From 5758d15988325466cd1ab78ab0f0b0135db42188 Mon Sep 17 00:00:00 2001
From: zengxianghuai <zengxianghuai@h-partners.com>
Date: Tue, 27 May 2025 19:31:36 +0800
Subject: [PATCH 1/2] Add configuration files for the script
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 test/tools/=1.21.6,      |    0
 test/tools/prompt.yaml   |   82 +
 test/tools/stopwords.txt | 4725 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 4807 insertions(+)
 create mode 100644 test/tools/=1.21.6,
 create mode 100644 test/tools/prompt.yaml
 create mode 100644 test/tools/stopwords.txt
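
Note on prompt.yaml: both templates use single braces for placeholders
({qa_count}, {chunk}, {text}, {file_name}, ...) and doubled braces around the
JSON examples, which suggests they are rendered with Python's str.format.
A minimal sketch under that assumption; the shortened template and the
render_prompt helper are hypothetical and not part of this patch:

```python
# Hypothetical, shortened stand-in for the GENERATE_QA template in prompt.yaml.
TEMPLATE = (
    "Generate {qa_count} question/answer pairs for the paragraph.\n"
    "Output format:\n"
    "[{{\"question\": \"...\", \"answer\": \"...\", \"type\": \"...\"}}]\n"
    "Paragraph:\n{chunk}\n"
    "Source file:\n{file_name}\n"
)


def render_prompt(template: str, **fields: object) -> str:
    """Fill single-brace placeholders; doubled braces become literal braces."""
    return template.format(**fields)


rendered = render_prompt(TEMPLATE, qa_count=2, chunk="some text", file_name="a.md")
print(rendered)
```

With this escaping, the JSON skeleton reaches the model with ordinary single
braces, while a missing placeholder value raises KeyError instead of passing
silently.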

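Note on stopwords.txt: the file holds one entry per line and repeats many
entries (an unordered block followed by a sorted one), so a set is a natural
container for it. A sketch of loading and applying such a list, assuming
tokens are already split; load_stopwords and filter_tokens are hypothetical
names, and how similar_cal_tool.py actually consumes the file is not shown
here:

```python
from typing import Iterable


def load_stopwords(lines: Iterable[str]) -> set[str]:
    """Build a stopword set from lines, dropping newlines and empty entries."""
    words = set()
    for line in lines:
        word = line.rstrip("\n")
        if word:  # whitespace-only entries (e.g. a full-width space) are kept
            words.add(word)
    return words


def filter_tokens(tokens: Iterable[str], stopwords: set[str]) -> list[str]:
    """Return the tokens that are not stopwords, preserving their order."""
    return [t for t in tokens if t not in stopwords]


# Illustrative sample; real input would come from open(path, encoding="utf-8").
sample = ["的\n", "the\n", "\n", "的\n", "，\n"]
stopwords = load_stopwords(sample)
kept = filter_tokens(["知识库", "的", "文档", "，", "the"], stopwords)
print(kept)
```
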
diff --git a/test/tools/=1.21.6, b/test/tools/=1.21.6,
new file mode 100644
index 0000000..e69de29
diff --git a/test/tools/prompt.yaml b/test/tools/prompt.yaml
new file mode 100644
index 0000000..7a3031b
--- /dev/null
+++ b/test/tools/prompt.yaml
@@ -0,0 +1,82 @@
+GENERATE_QA: "你是一个问答生成专家，你的任务是根据我提供的段落内容和已有的问题，生成{qa_count}个不重复的针对该段落内容的问题与回答，
+并判断这个问答对所属的领域，只输出问题、回答、领域。
+
+注意：
+
+1. 单个回答长度必须大于30字小于120字
+
+2. 问题不能出现重复
+
+3. 请指定明确的场景，如'xx公司', 'xx系统', 'xx项目', 'xx软件'等
+
+4. 问题中不要使用模糊的指代词, 如'这'、'那'
+
+5. 划分领域的时候请忽略上下文内容,领域大概可以分为（建筑、园林、摄影、戏剧、戏曲、舞蹈、音乐、书法、绘画、雕塑、美食、营养、健身、运动、旅游、地理、气象、海洋、地质、生态、天文、化学、物理、生物、数学、统计、逻辑、人工智能、大数据、云计算、网络、通信、自动化、机械、电子、材料、能源、化工、纺织、服装、美容、美发、礼仪、公关、广告、营销、管理、金融、证券、保险、期货、税务、审计、会计、法律实务、知识产权）
+
+6. 问题必须与段落内容有逻辑关系
+
+7. 问题与回答在不重复的前提下，应当尽可能多地包含段落内容
+
+8. 输出的格式为：
+[
+
+{{
+  \"question\": \"  问题  \", 
+  \"answer\": \"  回答  \",
+  \"type\": \" 领域 \"
+}}
+
+,
+
+{{
+  \"question\": \"  问题  \", 
+  \"answer\": \"  回答  \",
+  \"type\": \" 领域 \"
+}}
+
+]
+
+9. 不要输出多余内容
+
+下面是给出的段落内容：
+
+{chunk}
+
+下面是段落的上下文内容：
+
+{text}
+
+下面是段落的来源文件
+{file_name}
+"
+SCORE_QA: "你是一个打分专家，你的任务是根据我提供的问题、原始片段和检索到的片段以及标准答案和答案，判断答案在下面四项指标的分数,每个指标要精确到小数点后面2位，且每次需要进行客观评价
+
+1.context_relevancy 解释：（上下文相关性，越高表示检索到的片段中无用的信息越少 0-100）
+2.context_recall 解释：（召回率，越高表示检索出来的片段与标准答案越相关 0-100）
+3.faithfulness 解释：（忠实性，越高表示答案的生成使用了越多检索出来的片段 0-100）
+4.answer_relevancy 解释：（答案与问题的相关性 0-100）
+
+注意：
+请以下面格式输出
+{{
+  \"context_relevancy\": 分数, 
+  \"context_recall\": 分数,
+  \"faithfulness\": 分数,
+  \"answer_relevancy\": 分数
+}}
+
+下面是问题：
+{question}
+
+下面是原始片段：
+{meta_chunk}
+
+下面是检索到的片段：
+{chunk}
+
+下面是标准答案：
+{answer}
+
+下面是答案：
+{answer_text}
+"
diff --git a/test/tools/stopwords.txt b/test/tools/stopwords.txt
new file mode 100644
index 0000000..5784b44
--- /dev/null
+++ b/test/tools/stopwords.txt
@@ -0,0 +1,4725 @@
+　
+
+、
+老
+有时
+以前
+。
+一下
+要不然
+──
+者
+don't
+〈
+等到
+反过来说
+〉
+一一
+《
+》
+古来
+your
+准备
+往往
+而
+「
+」
+怎
+挨个
+without
+『
+』
+【
+these
+‐
+】
+逐渐
+再者
+–
+—
+would
+〔
+就是
+怕
+―
+〕
+‖
+〖
+甚至
+〗
+［⑤］
+倘
+‘
+与此同时
+’
+“
+几时
+ten
+”
+〝
+比照
+〞
+借
+该当
+!
+更有趣
+"
+逢
+•
+#
+一个
+$
+thus
+%
+meanwhile
+说真的
+特别是
+&
+…
+'
+(
+)
+*
+可是
+怪
+here’s
++
+,
+yourselves
+-
+.
+/
+［⑥］
+甚或
+集中
+‹
+:
+eleven
+›
+;
+<
+=
+>
+于是乎
+much
+?
+@
+第二单元
+A
+够瞧的
+wasn’t
+有喜欢
+又笑
+anybody
+I
+according to
+决定
+为着
+加以
+example
+串行
+除此之外
+咱们
+甚至于
+same
+只有
+［③］
+某个
+[
+after
+shouldn't
+you've
+\
+......
+第三产业
+]
+^
+_
+有问题吗
+`
+呼啦
+a
+怎麽
+凡是
+thanx
+有一期
+namely
+i
+且说
+过来
+日见
+the
+［④］
+问题
+fifth
+thank
+{
+|
+yours
+}
+一则通过
+~
+novel
+哪样
+处处
+难得
+包括
+诚然
+got
+第十届
+因此
+empty
+如此等等
+wish
+加强
+一些
+怎么办
+有的
+besides
+serious
+［①］
+什么样
+others
+¡
+失去
+或者
+那
+sans
+¦
+您
+从新
+«
+­
+转动
+ｎｇ昉
+onto
+¯
+gone
+共同
+仍旧
+第四单元
+´
+aside
+［②］
+·
+¸
+»
+¿
+避免
+<dquote>
+downwards
+某些
+不但…而且
+匆匆
+有一百
+得起
+，也
+像
+鄙人
+万一
+nowhere
+忽然
+provides
+you're
+×
+这会儿
+最后一派
+传说
+立刻
+来讲
+意思
+we'll
+确定
+上去--
+重大
+切切
+versus
+分别
+better
+with
+合理
+there
+并肩
+well
+屡次三番
+出现
+能
+都
+反之则
+不起
+竟而
+℃
+有一会了
+当时
+若非
+焉
+出去
+马上
+引起
+有一方
+不消
+不得不
+就地
+旁人
+大略
+afore
+per
+来说
+第四届
+went
+赶快
+断然
+considering
+方便
+注意
+*LRB*
+这时
+另行
+ever
+we've
+正值
+even
+然
+不得已
+现代
+陈年
+难怪
+当口儿
+儿
+thats
+又为什么
+hundred
+［⑤］］
+还是
+重要
+尽早
+难道
+若果
+上下
+save
+光
+respectively
+何时
+a's
+不足
+又小
+通常
+其后
+howbeit
+top
+too
+随时
+have
+必须
+有着
+一何
+accordingly
+Ⅲ
+particularly
+照
+八
+六
+兮
+看看
+共
+容易
+不巧
+哪天
+猛然
+其
+感兴趣
+who’s
+ain't
+腾
+近几年来
+＋＋
+com
+con
+_...
+内
+almost
+不仅...而且
+amoungst
+以及
+不已
+upon
+再
+高兴
+倒不如
+↑
+有意的
+冒
+除此
+→
+earlier
+whether
+不下
+如上所述
+quite
+深入
+不一
+beneath
+近来
+everyone
+由此可见
+怪不得
+lest
+抑或
+less
+不得了
+无宁
+对应
+冲
+一边
+看来
+were
+we're
+是不是
+try
+对于
+尔等
+－－
+以后
+became
+不常
+隔日
+&rdquo;
+得了
+举行
+cause
+嘎嘎
+极大
+第五课
+it’s
+不久
+切勿
+如次
+similarly
+无论
+动辄
+连日
+掌握
+第二波
+says
+所谓
+几
+凡
+it’d
+别人
+whence
+自
+cry
+凭
+臭
+despite
+followed
+具体说来
+至
+致
+×××
+第十次
+那个
+另外
+出
+迟早
+明显
+formerly
+转变
+shouldn’t
+１．
+gotten
+分
+切
+立即
+）、
+继后
+第四张
+风雨无阻
+［①⑤］
+wherein
+wasn't
+不了
+他是
+假如
+我
+按理
+那么
+从未
+∕
+或
+则
+分期分批
+刚
+let
+初
+welcome
+附近
+还有
+当真
+separately
+充其量
+保险
+再则
+嘎登
+漫说
+want
+一.
+㈧
+第四位
+如此
+云云
+喔唷
+［⑤ｂ］
+别
+最大
+藉以
+元／吨
+each
+［①⑥］
+到
+当地
+竟然
+must
+有效地
+所
+当着
+诸位
+probably
+川流不息
+≈
+第三遍
+那些
+当场
+［⑤ｅ］
+才
+two
+第四代
+趁便
+anyway
+［①⑦］
+第十二
+必要
+不仅
+打
+found
+综上所述
+does
+根据
+任凭
+从来
+gives
+２．３％
+think
+的确
+他的
+一转眼
+猛然间
+方能
+—　
+那麽
+［①⑧］
+沿着
+倘使
+entirely
+...
+到底
+［⑤ｄ］
+最好
+doesn’t
+犹且
+比及
+不满
+尽如人意
+won't
+维持
+随着
+till
+——
+非常
+什么意思
+把
+had
+尔尔
+”，
+切莫
+有一根
+好象
+需要
+〕〔
+has
+允许
+they'd
+起先
+given
+不会
+last
+对待
+&quot;
+借以
+主要
+这么样
+缕缕
+决不
+第十九
+［①①］
+显然
+照着
+倍感
+否则
+overall
+前此
+第五位
+联袂
+full
+away
+矣乎
+asking
+你是
+能否
+左右
+ˇ
+谁人
+［⑤ａ］
+ˉ
+ˊ
+ˋ
+第十三
+背靠背
+anything
+或则
+加入
+不但
+yesterday
+获得
+［①②］
+第十一
+５：０
+奋勇
+１２％
+˜
+只要
+多多益善
+若
+notwithstanding
+yes
+届时
+yet
+独
+［①④］
+全面
+要求
+inasmuch
+［①③］
+切不可
+况且
+若夫
+e.g.,
+无法
+进来
+第四年
+真是
+拿
+通过
+第五组
+知乎
+乘虚
+按
+以故
+three
+果真
+put
+岂但
+任务
+［①ｄ］
+her
+whoever
+’‘
+okay
+长期以来
+不得
+having
+而况
+结果
+凝神
+上述
+沙沙
+千万
+你的
+［①ｃ］
+hereupon
+应当
+待到
+千
+有一堆
+您们
+半
+乘隙
+多多
+真的
+就是了
+不过
+因为
+不必
+多年来
+［②Ｇ］
+［①ｆ］
+computer
+第五年
+单
+merely
+常言说
+相等
+同时
+归根结底
+那边
+可好
+unfortunately
+故而
+lately
+据
+这样
+［①ｅ］
+即
+却
+常言说得好
+刚才
+就要
+极端
+before
+历
+tell
+［①⑨］
+不迭
+中小
+him
+＝－
+.一
+his
+major
+＝（
+Δ
+丰富
+毫无例外
+顷刻间
+今天
+起初
+consider
+趁热
+keeps
+＜＜
+Ｒ．Ｌ．
+不怕
+＝［
+whither
+it's
+各地
+Ψ
+particular
+莫
+因了
+done
+［⑤ｆ］
+twice
+γ
+可见
+方才
+条件
+it'd
+也是
+非但
+去
+第三张
+μ
+进行
+＝｛
+它们
+第二任
+φ
+part
+又
+their
+及
+何须
+elsewhere
+行动
+［②Ｂ］
+［①ａ］
+最后一遍
+朝着
+扩大
+另一个
+并不是
+最高
+并排
+是否
+第五大道
+累次
+ltd
+第三件
+纯粹
+非徒
+另
+hereafter
+据我所知
+只
+消息
+叫
+乘机
+非得
+可
+尽管如此
+someone
+third
+mean
+neither
+further
+一致
+多少钱
+按时
+sometime
+been
+mostly
+各
+强调
+hasnt
+φ．
+couldn't
+同
+一切
+后
+相对
+一则
+向
+В
+吓
+反之
+倘然
+anent
+appreciate
+吗
+看见
+you
+一般
+going
+次第
+past
+吧
+bill
+明确
+whose
+绝非
+从头
+mill
+吱
+所幸
+人家
+trying
+倍加
+ [
+ ]
+当天
+呀
+截然
+范围
+呃
+何处
+反过来
+相对而言
+comes
+当头
+据称
+一片
+呐
+how
+呕
+won’t
+呗
+unlike
+呜
+放量
+mine
+①
+为止
+②
+呢
+③
+为此
+④
+⑤
+即如
+⑥
+不胜
+故意
+⑦
+⑧
+比方
+⑨
+⑩
+astride
+partly
+possible
+right
+反应
+<ldquo>
+第二把
+许多
+呵
+连袂
+代替
+呸
+具有
+不惟
+under
+必定
+did
+将近
+立时
+sometimes
+第三单元
+莫若
+咋
+和
+down
+later
+prior
+她们
+midst
+不能
+恰恰相反
+咚
+挨门挨户
+愤然
+人民
+出来
+ignored
+咦
+咧
+所以
+thereafter
+＝″
+regarding
+除了
+挨门逐户
+咱
+弹指之间
+take
+咳
+认识
+immediate
+还要
+relatively
+要不
+不然
+some
+如下
+连声
+如上
+……
+rather
+哇
+日渐
+哈
+哉
+这么些
+back
+哎
+以期
+余外
+不光
+哗
+大多
+<rsquo>
+第五部
+这一来
+局外
+just
+哟
+''
+哦
+哩
+不免
+哪
+必将
+大大
+那儿
+倘或
+although
+approximately
+要么
+fify
+那么样
+何妨
+哼
+如常
+良好
+知道
+he’s
+截至
+这种
+therein
+虽说
+唉
+＜±
+要不是
+除开
+thick
+soon
+总的来说
+最后一关
+第三册
+然後
+先不先
+的士高
+隔夜
+眨眼
+whereas
+usually
+后来
+从早到晚
+后面
+与其
+有笑
+近年来
+大体上
+made
+因而
+此后
+用
+不再
+以来
+甫
+being
+着呢
+甭
+大概
+尚且
+由
+而又
+绝顶
+按期
+傥然
+whereby
+第十一个
+故
+宁肯
+向着
+得出
+啊
+乃至于
+第二关
+多次
+whereupon
+大张旗鼓
+we’re
+you’ve
+趁势
+eight
+啐
+又一遍
+known
+就此
+不亦乐乎
+can't
+together
+接著
+twenty
+knows
+依照
+敢于
+敢
+ＬＩ
+may
+啥
+略
+within
+下列
+啦
+could
+′｜
+第四者
+数
+得到
+<rdquo>
+able
+适用
+总之
+略为
+喀
+吧哒
+喂
+如今
+使用
+presumably
+不但...而且
+use
+本地
+而后
+就算
+liked
+喏
+尽心竭力
+坚决
+find
+本着
+然而
+以至于
+那里
+insofar
+regardless
+--
+■
+同样
+不成
+seriously
+fill
+贼死
+ＺＸＦＩＴＬ
+becomes
+▲
+方
+据此
+倒不如说
+couldn’t
+喽
+since
+..
+./
+倘若
+we’ve
+更为
+立地
+best
+●
+也就是说
+既往
+分期
+宁愿
+反而
+显著
+的话
+hither
+个人
+基于
+无
+//
+嗡
+certainly
+造成
+既
+日
+嗬
+exactly
+嗯
+反倒
+单纯
+彼时
+concerning
+嗳
+总结
+限制
+due
+时
+请勿
+那般
+据实
+不特
+about
+［④ａ］
+＋ξ
+嘎
+并没有
+怎么样
+如何
+嘘
+above
+fire
+嘛
+根本
+顷刻之间
+并无
+不力
+myself
+herein
+则甚
+∪φ∈
+something
+由由
+是
+我是
+亲身
+thereby
+第二大节
+是的
+except
+巩固
+嘻
+sincere
+多少
+凑巧
+阿
+嘿
+紧接着
+老是
+nevertheless
+各种
+不仅仅
+中国知
+hasn't
+社会主义
+don’t
+mid
+据说
+穷年累月
+believe
+自个儿
+［④ｃ］
+into
+毫无保留地
+庶乎
+unless
+更重要的是
+第五卷
+－β
+打开天窗说亮话
+从此
+ought
+犹自
+不拘
+除
+though
+争取
+两者
+thorough
+many
+［④ｂ］
+actually
+差不多
+不若
+appear
+战斗
+长话短说
+definitely
+上升
+不独
+另一方面
+associated
+上午
+这次
+虽
+白
+we’d
+的
+seven
+哪个
+抽冷子
+取得
+inside
+到目前为止
+mainly
+随
+相应
+whenever
+下午
+似乎
+five
+beforehand
+我的
+赶早不赶晚
+从宽
+便于
+何止
+please
+换言之
+look
+-RRB-
+qua
+考虑
+哪年
+纵令
+allow
+que
+有没有
+非特
+宣布
+没奈何
+ain’t
+只消
+或是
+极为
+interest
+打从
+themselves
+忽地
+以外
+勃然
+he's
+wants
+突然
+四
+wonder
+存在
+every
+慢说
+不可抗拒
+因
+不单
+及其
+从古到今
+陡然
+略微
+again
+t’s
+indeed
+坚持
+蛮
+十分
+第三句
+更
+看上去
+安全
+零
+也好
+上去
+i’ll
+we’ll
+即将
+固
+快要
+哪些
+进步
+曾
+替
+最
+恰好
+认为
+②ｃ
+从小
+月
+有
+whole
+常常
+看
+during
+将才
+［①Ｂ］
+尽管
+由是
+［①Ｃ］
+didn’t
+再有
+c’s
+下去
+望
+自家
+朝
+此间
+恰如
+③］
+权时
+此时
+第四版
+正是
+still
+前进
+在
+来自
+极了
+累年
+本
+［①Ａ］
+［－
+underneath
+地
+itself
+toward
+用来
+呆呆地
+among
+anyone
+取道
+每天
+联系
+整个
+着
+::
+均
+为主
+极度
+人人
+相似
+ourselves
+specified
+先後
+有一对
+［①Ｅ］
+across
+前者
+相当
+moreover
+causes
+完全
+毫无
+非独
+wherever
+靠
+普通
+何尝
+不变
+第三卷
+及至
+alongside
+）÷（１－
+一番
+大家
+来
+纵使
+论说
+最后一班
+保管
+mrs
+［①Ｄ］
+不只
+难道说
+cannot
+hereby
+whereafter
+人们
+依据
+［］
+first
+什么
+极
+］［
+为了
+clearly
+不可
+［④ｅ］
+普遍
+［⑨］
+不同
+或曰
+突出
+既然
+之类
+from
+曾经
+啪达
+第十六
+并非
+you'll
+bottom
+而是
+原来
+［④ｄ］
+［⑩］
+然则
+第十八
+敢情
+唯有
+过于
+edu
+第二十
+愿意
+seems
+>>
+&ldquo;
+不止一次
+according
+替代
+二话没说
+能够
+大面儿上
+某
+与否
+［⑦］
+这就是说
+i’ve
+nighest
+value
+譬喻
+inc
+矣
+分头
+第四册
+扑通
+agin
+＜Δ
+相信
+嘿嘿
+instead
+一旦
+nigh
+练习
+［⑧］
+currently
+round
+不外乎
+什麽
+一方面
+常言道
+老老实实
+更进一步
+normally
+一时
+＜λ
+移动
+ａ］
+哪边
+来不及
+via
+完成
+假使
+′∈
+反手
+企图
+伙同
+because
+near
+第二盘
+unlikely
+孰料
+比如
+ｂ］
+比如说
+viz
+正在
+真正
+何乐而不为
+既...又
+砰
+contains
+巨大
+＜φ
+接着
+aboard
+已矣
+detail
+啊呀
+第二集
+如其
+特殊
+appropriate
+此地
+严重
+＞λ
+ｃ］
+least
+莫不然
+主张
+为何
+格外
+倒是
+才能
+we'd
+接下来
+哪怕
+其次
+wouldn't
+针对
+几乎
+多么
+挨家挨户
+促进
+顷
+顺
+最後
+nine
+宁可
+三番两次
+梆
+颇
+ｅ］
+hasn’t
+说明
+或多或少
+理该
+whats
+［②ｊ］
+到了儿
+趁早
+召开
+t's
+只当
+need
+接连不断
+来得及
+ｆ］
+仅仅
+its
+often
+被
+班开学
+volume
+绝对
+数/
+啊哈
+顷刻
+啷当
+［②ｉ］
+省得
+therefore
+日臻
+hardly
+that’s
+useful
+有些
+多亏
+第十名
+强烈
+方面
+其它
+看起来
+几度
+不仅仅是
+c's
+具体来说
+sorry
+可以
+最近
+更有效
+啊哟
+起来
+forty
+欢迎
+其实
+今年
+几经
+新华社
+对方
+迅速
+时候
+第二
+第三日
+从不
+诚如
+不敢
+不至于
+happens
+一直
+tries
+莫非
+called
+又及
+哈哈
+彻夜
+又又
+立马
+目前
+tried
+当下
+却不
+挨着
+从中
+多
+毋宁
+之一
+从严
+应用
+aslant
+不料
+大
+nothing
+anyhow
+specify
+介于
+forth
+纵然
+等等
+叮当
+当中
+长线
+变成
+system
+受到
+［③①］
+哎呀
+other
+indicated
+经常
+against
+奇
+奈
+个别
+矣哉
+老大
+不断
+不管怎样
+isn't
+hadn't
+不然的话
+後来
+asked
+indicates
+自己
+ere
+後面
+这个
+只怕
+率尔
+thoroughly
+有的是
+正确
+于是
+一面
+另悉
+充分
+一来
+很恐惧
+别是
+饱
+不日
+亲自
+从事
+awfully
+固然
+现在
+她
+好
+不时
+要
+从今以后
+如
+除却
+概
+不问
+乌乎
+从古至今
+latterly
+amongst
+敞开儿
+etc
+然后
+net
+这么
+哪儿
+all
+always
+new
+took
+那时
+already
+below
+毕竟
+didn't
+如若
+shall
+谁料
+当庭
+直到
+别的
+且不说
+交口
+...................
+离
+故步自封
+见
+除去
+叫做
+趁机
+般的
+恐怕
+不是
+有一起
+around
+种
+趁着
+亲手
+秒
+是什么意思
+略加
+and
+尤其
+哎哟
+即令
+saying
+说来
+fifteen
+庶几
+错误
+怎样
+不限
+偏偏
+充其极
+每每
+any
+这些
+越是
+until
+大举
+Lex
+从优
+长此下去
+日复一日
+全年
+按说
+第十集
+第四期
+除此而外
+今後
+第五元素
+anywhere
+某某
+自身
+where's
+这麽
+极其
+exp
+开外
+必然
+更加
+using
+达到
+containing
+哪里
+［③⑩］
+此处
+帮助
+specifying
+himself
+归齐
+第四场
+自从
+何以
+有一部
+一样
+第十四
+此外
+专门
+wouldn’t
+千万千万
+第二项
+最后一集
+记者
+maybe
+another
+规定
+虽然
+不能不
+［③ａ］
+大事
+二来
+大约
+偶尔
+are
+不尽然
+作为
+taken
+第三行
+came
+where
+又一村
+首先
+vice
+第三项
+心里
+即使
+不可开交
+从轻
+另方面
+有问题么
+从无到有
+call
+尔后
+such
+正如
+临到
+ask
+暗中
+describe
+孰知
+through
+anyways
+窃
+becoming
+广大
+而外
+恍然
+鉴于
+cant
+您是
+起头
+尽可能
+有一道
+weren’t
+上来
+either
+上面
+ours
+什么时候
+＝☆
+不曾
+很多
+yourself
+those
+seeming
+即便
+might
+let's
+之後
+刚好
+单单
+各个
+他人
+whatever
+第四集
+互相
+但愿
+间或
+下来
+第二行
+everywhere
+表明
+name
+它们的
+》），
+下面
+开始
+next
+如同
+nearly
+show
+you’re
+立
+non
+nor
+传闻
+not
+又一城
+急匆匆
+经过
+据悉
+now
+遇到
+hence
+有点
+最后一眼
+他们
+竟
+绝不
+全体
+unto
+与其说
+可能
+was
+至于
+屡次
+起首
+i'll
+way
+第五期
+can’t
+得天独厚
+怎奈
+what
+从而
+furthermore
+那末
+采取
+满足
+hadn’t
+构成
+.数
+第五集
+大体
+它是
+年复一年
+when
+｛－
+这边
+［③ｈ］
+far
+岂非
+成年累月
+何必
+从速
+truly
+it'll
+一天
+give
+欤
+惯常
+莫如
+至今
+各级
+归根到底
+第
+虽则
+［③ｇ］
+极力
+碰巧
+ZT
+起见
+各人
+再次
+直接
+其一
+理应
+ZZ
+-LRB-
+尽快
+拦腰
+noone
+等
+couldnt
+产生
+但凡
+转贴
+例如
+［②④
+那会儿
+防止
+彼此
+此
+而言
+哗啦
+more
+及时
+双方
+依靠
+举凡
+它的
+罢了
+前后
+总感觉
+关于
+～＋
+假若
+少数
+.日
+昂然
+亲口
+简直
+恰巧
+其中
+certain
+积极
+同一
+｝＞
+进而
+宁
+各式
+它
+再其次
+你们
+有关
+殆
+譬如
+处理
+used
+又喜欢
+［③ｃ］
+看样子
+设使
+looks
+few
+定
+you’ll
+described
+otherwise
+管
+you'd
+..._
+大多数
+话说
+让
+呜呼
+inner
+both
+［③ｂ］
+most
+地三鲜
+outside
+keep
+论
+第三期
+who
+各位
+组成
+why
+以上
+先后
+每
+以下
+连连
+第三集
+alone
+二话不说
+比
+along
+凭借
+不经意
+实现
+相反
+其二
+她是
+到头来
+出于
+更有甚者
+有一群
+这么点儿
+amount
+move
+该
+那样
+saw
+在下
+also
+say
+enough
+gets
+［③ｄ］
+瑟瑟
+［③ｅ］
+various
+诸
+清楚
+对
+反映
+第三回
+latter
+uses
+front
+以为
+仍然
+``
+谁
+决非
+理当
+将
+再说
+小
+这点
+迫于
+bar
+尔
+最后一页
+谁知
+了解
+乃至
+相同
+doesn't
+每时每刻
+免受
+她的
+afterwards
+sure
+nigher
+谨
+其他
+嗡嗡
+屡屡
+am
+an
+比起
+former
+此次
+就
+最后一题
+as
+at
+别处
+甚且
+更有意义
+每个
+they’ll
+looking
+it’ll
+尽
+i've
+看出
+］∧′＝［
+be
+精光
+兼之
+既…又
+当儿
+当然
+consequently
+来看
+继之
+有利
+they’d
+差一点
+牢牢
+see
+inward
+…………………………………………………③
+连日来
+by
+whom
+indicate
+有所
+汝
+由此
+赖以
+甚么
+屡
+sixty
+contain
+类如
+因着
+co
+在于
+或许
+独自
+来着
+第四声
+somewhat
+惟其
+是什么
+既是
+de
+岂
+每年
+全部
+do
+看到
+dr
+基本上
+尽然
+这儿
+粗
+［①ｈ］
+［②
+诸如
+有一片
+全都
+不外
+较比
+which
+needs
+没
+eg
+全身心
+其余
+反之亦然
+好的
+et
+never
+she
+不大
+ex
+从重
+具体
+［①ｇ］
+多多少少
+aren't
+不够
+大都
+有力
+沿
+little
+however
+尽心尽力
+全然
+所有
+过去
+恰似
+for
+greetings
+有一批
+getting
+perhaps
+总的说来
+自各儿
+大不了
+<lsquo>
+先生
+到处
+要是
+并没
+共总
+over
+不仅…而且
+six
+难说
+thence
+所在
+如是
+where’s
+go
+继续
+也罢
+obviously
+kept
+they’re
+let’s
+本身
+［①ｉ］
+挨次
+selves
+进入
+he
+isn’t
+暗地里
+very
+hi
+这里
+之所以
+本人
+最后
+placed
+豁然
+平素
+何况
+即或
+～±
+到头
+thanks
+果然
+else
+four
+beside
+不如
+ie
+做到
+不要
+if
+there's
+likely
+即刻
+in
+末##末
+一次
+is
+it
+you’d
+somebody
+weren't
+不妨
+尽量
+活
+hello
+secondly
+而论
+become
+公然
+好在
+逐步
+顿时
+最后一科
+eventually
+默然
+以後
+当前
+theres
+总是
+hopefully
+everything
+开展
+amidst
+side
+这般
+due to
+seemed
+除非
+每当
+they’ve
+之前
+中间
+off
+特点
+第二首
+－［＊］－
+［②①］
+以便
+赶
+起
+趁
+很少
+theirs
+大量
+向使
+several
+更远的
+日益
+while
+乘胜
+second
+大凡
+that
+重新
+i’d
+一定
+０：２
+than
+me
+i’m
+居然
+策略地
+different
+NULL
+mr
+大致
+ms
+follows
+多年前
+除此以外
+my
+反倒是
+plus
+最后一颗子弹
+第三大
+nd
+自打
+后者
+恰逢
+athwart
+［①ｏ］
+behind
+no
+表示
+换句话说
+遵循
+what’s
+第二声
+如期
+of
+即若
+oh
+somehow
+ok
+距
+跟
+on
+allows
+brief
+伟大
+or
+———
+第三声
+有及
+c'mon
+&nbsp;
+己
+已
+巴
+达旦
+属于
+一
+七
+what's
+三
+设或
+继而
+如前所述
+上
+下
+光是
+恰恰
+不
+somewhere
+与
+［②⑦］
+八成
+haven't
+部分
+on to
+且
+顺着
+they
+here's
+比较
+qv
+带
+old
+成为
+总的来看
+皆可
+个
+them
+简言之
+then
+［②⑧］
+将要
+︰
+rd
+re
+︳
+［＊］
+临
+︴
+︵
+︶
+∈［
+twelve
+︷
+广泛
+常
+︸
+全力
+︹
+大批
+为
+︺
+俺们
+何苦
+︻
+甚而
+︼
+︽
+每逢
+︾
+︿
+暗自
+﹀
+minus
+﹁
+sub
+﹂
+乃
+﹃
+第二类
+﹄
+么
+betwixt
+﹉
+﹊
+之
+﹋
+﹌
+﹍
+乎
+﹎
+seen
+seem
+﹏
+sup
+﹐
+如果
+﹑
+乒
+﹔
+并且
+﹕
+﹖
+默默地
+乘
+第五单元
+偶而
+so
+并不
+九
+﹝
+第三层
+财新网
+﹞
+也
+﹟
+apart
+大力
+不由得
+﹠
+﹡
+﹢
+有著
+﹤
+necessary
+大抵
+叮咚
+﹦
+第三类
+one
+用于
+成年
+姑且
+﹨
+﹩
+［②⑩］
+amid
+﹪
+aren’t
+﹫
+各自
+实际
+为什么
+彻底
+th
+年
+三番五次
+并
+基本
+to
+&nbsp
+they've
+but
+率然
+没有
+了
+willing
+available
+当即
+巴巴
+总而言之
+二
+今后
+于
+zero
+说说
+［②②］
+互
+五
+为什麽
+un
+第三课
+是以
+up
+些
+us
+because of
+亦
+this
+呵呵
+reasonably
+纯
+thin
+处在
+［②③］
+召唤
+故此
+especially
+纵
+once
+know
+人
+不择手段
+具体地说
+vs
+严格
+前面
+似的
+doing
+亲眼
+适应
+仅
+pending
+changes
+今
+that's
+［②⑤］
+仍
+从
+we
+保持
+经
+路经
+第三篇
+他
+throughout
+给
+别管
+绝
+满
+they're
+以
+形成
+正巧
+们
+［②⑥］
+就是说
+对比
+设若
+我们
+ones
+任
+不止
+觉得
+以免
+三天两头
+！
+＂
+＃
+＄
+里面
+％
+＆
+较为
+刚巧
+＇
+（
+）
+＊
+密切
+第二讲
+＋
+none
+，
+beyond
+－
+．
+／
+０
+［②ｆ］
+１
+２
+３
+４
+５
+６
+已经
+７
+弗
+８
+９
+：
+会
+；
+＜
+nobody
+＝
+那么些
+＞
+即是说
+between
+？
+＠
+除外
+传
+Ａ
+别说
+不定
+究竟
+come
+之后
+岂止
+they'll
+借此
+［②ｅ］
+［③Ｆ］
+following
+正常
+较之
+zt
+［
+不管
+］
+c’mon
+＿
+zz
+至若
+不论
+此中
+但
+i'd
+运用
+our
+随后
+there’s
+i'm
+out
+齐
+进去
+归
+当
+seeing
+有效
+何
+［②ｈ］
+get
+course
+｛
+｜
+｝
+～
+dare
+sensible
+你
+存心
+加上
+高低
+而已
+不比
+乘势
+help
+［②ｇ］
+按照
+遭到
+由于
+自后
+顶多
+self
+彼
+一起
+行为
+使
+往
+几番
+适当
+thru
+较
+遵照
+待
+不对
+背地里
+周围
+第二款
+而且
+own
+很
+circa
+只是
+毫不
+［②ａ］
+不知不觉
+得
+only
+should
+结合
+://
+依
+多数
+再者说
+a’s
+但是
+加之
+动不动
+以至
+以致
+like
+goes
+第四种
+云尔
+始而
+towards
+只限
+不少
+regards
+sent
+白白
+哼唷
+任何
+边
+随著
+~~~~
+herself
+thereupon
+便
+成心
+here
+haven’t
+简而言之
+everybody
+迄
+第三站
+必
+过
+［②ｃ］
+hers
+近
+can
+第四套
+莫不
+［②ｄ］
+轰然
+who's
+还
+这
+不尽
+应该
+said
+连
+复杂
+呼哧
+￣
+*RRB*
+￥
+will
+out of
+认真
+快
+［②ｂ］
+第十天
+really
+从此以后
+使得
+怎么
+corresponding
+不怎么
+俺
+若是
+tends
+连同
+傻傻分
+!
+"
+#
+$
+%
+&
+'
+(
+)
+*
++
+,
+-
+--
+.
+..
+...
+......
+...................
+./
+.一
+.数
+.日
+/
+//
+:
+://
+::
+;
+<
+=
+>
+>>
+?
+@
+A
+Lex
+[
+\
+]
+^
+_
+`
+exp
+sub
+sup
+|
+}
+~
+~~~~
+·
+×
+×××
+Δ
+Ψ
+γ
+μ
+φ
+φ．
+В
+—
+——
+———
+‘
+’
+’‘
+“
+”
+”，
+…
+……
+…………………………………………………③
+′∈
+′｜
+℃
+Ⅲ
+↑
+→
+∈［
+∪φ∈
+≈
+①
+②
+②ｃ
+③
+③］
+④
+⑤
+⑥
+⑦
+⑧
+⑨
+⑩
+──
+■
+▲
+
+、
+。
+〈
+〉
+《
+》
+》），
+」
+『
+』
+【
+】
+〔
+〕
+〕〔
+㈧
+一
+一.
+一一
+一下
+一个
+一些
+一何
+一切
+一则
+一则通过
+一天
+一定
+一方面
+一旦
+一时
+一来
+一样
+一次
+一片
+一番
+一直
+一致
+一般
+一起
+一转眼
+一边
+一面
+七
+万一
+三
+三天两头
+三番两次
+三番五次
+上
+上下
+上升
+上去
+上来
+上述
+上面
+下
+下列
+下去
+下来
+下面
+不
+不一
+不下
+不久
+不了
+不亦乐乎
+不仅
+不仅...而且
+不仅仅
+不仅仅是
+不会
+不但
+不但...而且
+不光
+不免
+不再
+不力
+不单
+不变
+不只
+不可
+不可开交
+不可抗拒
+不同
+不外
+不外乎
+不够
+不大
+不如
+不妨
+不定
+不对
+不少
+不尽
+不尽然
+不巧
+不已
+不常
+不得
+不得不
+不得了
+不得已
+不必
+不怎么
+不怕
+不惟
+不成
+不拘
+不择手段
+不敢
+不料
+不断
+不日
+不时
+不是
+不曾
+不止
+不止一次
+不比
+不消
+不满
+不然
+不然的话
+不特
+不独
+不由得
+不知不觉
+不管
+不管怎样
+不经意
+不胜
+不能
+不能不
+不至于
+不若
+不要
+不论
+不起
+不足
+不过
+不迭
+不问
+不限
+与
+与其
+与其说
+与否
+与此同时
+专门
+且
+且不说
+且说
+两者
+严格
+严重
+个
+个人
+个别
+中小
+中间
+丰富
+串行
+临
+临到
+为
+为主
+为了
+为什么
+为什麽
+为何
+为止
+为此
+为着
+主张
+主要
+举凡
+举行
+乃
+乃至
+乃至于
+么
+之
+之一
+之前
+之后
+之後
+之所以
+之类
+乌乎
+乎
+乒
+乘
+乘势
+乘机
+乘胜
+乘虚
+乘隙
+九
+也
+也好
+也就是说
+也是
+也罢
+了
+了解
+争取
+二
+二来
+二话不说
+二话没说
+于
+于是
+于是乎
+云云
+云尔
+互
+互相
+五
+些
+交口
+亦
+产生
+亲口
+亲手
+亲眼
+亲自
+亲身
+人
+人人
+人们
+人家
+人民
+什么
+什么样
+什麽
+仅
+仅仅
+今
+今后
+今天
+今年
+今後
+介于
+仍
+仍旧
+仍然
+从
+从不
+从严
+从中
+从事
+从今以后
+从优
+从古到今
+从古至今
+从头
+从宽
+从小
+从新
+从无到有
+从早到晚
+从未
+从来
+从此
+从此以后
+从而
+从轻
+从速
+从重
+他
+他人
+他们
+他是
+他的
+代替
+以
+以上
+以下
+以为
+以便
+以免
+以前
+以及
+以后
+以外
+以後
+以故
+以期
+以来
+以至
+以至于
+以致
+们
+任
+任何
+任凭
+任务
+企图
+伙同
+会
+伟大
+传
+传说
+传闻
+似乎
+似的
+但
+但凡
+但愿
+但是
+何
+何乐而不为
+何以
+何况
+何处
+何妨
+何尝
+何必
+何时
+何止
+何苦
+何须
+余外
+作为
+你
+你们
+你是
+你的
+使
+使得
+使用
+例如
+依
+依据
+依照
+依靠
+便
+便于
+促进
+保持
+保管
+保险
+俺
+俺们
+倍加
+倍感
+倒不如
+倒不如说
+倒是
+倘
+倘使
+倘或
+倘然
+倘若
+借
+借以
+借此
+假使
+假如
+假若
+偏偏
+做到
+偶尔
+偶而
+傥然
+像
+儿
+允许
+元／吨
+充其极
+充其量
+充分
+先不先
+先后
+先後
+先生
+光
+光是
+全体
+全力
+全年
+全然
+全身心
+全部
+全都
+全面
+八
+八成
+公然
+六
+兮
+共
+共同
+共总
+关于
+其
+其一
+其中
+其二
+其他
+其余
+其后
+其它
+其实
+其次
+具体
+具体地说
+具体来说
+具体说来
+具有
+兼之
+内
+再
+再其次
+再则
+再有
+再次
+再者
+再者说
+再说
+冒
+冲
+决不
+决定
+决非
+况且
+准备
+凑巧
+凝神
+几
+几乎
+几度
+几时
+几番
+几经
+凡
+凡是
+凭
+凭借
+出
+出于
+出去
+出来
+出现
+分别
+分头
+分期
+分期分批
+切
+切不可
+切切
+切勿
+切莫
+则
+则甚
+刚
+刚好
+刚巧
+刚才
+初
+别
+别人
+别处
+别是
+别的
+别管
+别说
+到
+到了儿
+到处
+到头
+到头来
+到底
+到目前为止
+前后
+前此
+前者
+前进
+前面
+加上
+加之
+加以
+加入
+加强
+动不动
+动辄
+勃然
+匆匆
+十分
+千
+千万
+千万千万
+半
+单
+单单
+单纯
+即
+即令
+即使
+即便
+即刻
+即如
+即将
+即或
+即是说
+即若
+却
+却不
+历
+原来
+去
+又
+又及
+及
+及其
+及时
+及至
+双方
+反之
+反之亦然
+反之则
+反倒
+反倒是
+反应
+反手
+反映
+反而
+反过来
+反过来说
+取得
+取道
+受到
+变成
+古来
+另
+另一个
+另一方面
+另外
+另悉
+另方面
+另行
+只
+只当
+只怕
+只是
+只有
+只消
+只要
+只限
+叫
+叫做
+召开
+叮咚
+叮当
+可
+可以
+可好
+可是
+可能
+可见
+各
+各个
+各人
+各位
+各地
+各式
+各种
+各级
+各自
+合理
+同
+同一
+同时
+同样
+后
+后来
+后者
+后面
+向
+向使
+向着
+吓
+吗
+否则
+吧
+吧哒
+吱
+呀
+呃
+呆呆地
+呐
+呕
+呗
+呜
+呜呼
+呢
+周围
+呵
+呵呵
+呸
+呼哧
+呼啦
+咋
+和
+咚
+咦
+咧
+咱
+咱们
+咳
+哇
+哈
+哈哈
+哉
+哎
+哎呀
+哎哟
+哗
+哗啦
+哟
+哦
+哩
+哪
+哪个
+哪些
+哪儿
+哪天
+哪年
+哪怕
+哪样
+哪边
+哪里
+哼
+哼唷
+唉
+唯有
+啊
+啊呀
+啊哈
+啊哟
+啐
+啥
+啦
+啪达
+啷当
+喀
+喂
+喏
+喔唷
+喽
+嗡
+嗡嗡
+嗬
+嗯
+嗳
+嘎
+嘎嘎
+嘎登
+嘘
+嘛
+嘻
+嘿
+嘿嘿
+四
+因
+因为
+因了
+因此
+因着
+因而
+固
+固然
+在
+在下
+在于
+地
+均
+坚决
+坚持
+基于
+基本
+基本上
+处在
+处处
+处理
+复杂
+多
+多么
+多亏
+多多
+多多少少
+多多益善
+多少
+多年前
+多年来
+多数
+多次
+够瞧的
+大
+大不了
+大举
+大事
+大体
+大体上
+大凡
+大力
+大多
+大多数
+大大
+大家
+大张旗鼓
+大批
+大抵
+大概
+大略
+大约
+大致
+大都
+大量
+大面儿上
+失去
+奇
+奈
+奋勇
+她
+她们
+她是
+她的
+好
+好在
+好的
+好象
+如
+如上
+如上所述
+如下
+如今
+如何
+如其
+如前所述
+如同
+如常
+如是
+如期
+如果
+如次
+如此
+如此等等
+如若
+始而
+姑且
+存在
+存心
+孰料
+孰知
+宁
+宁可
+宁愿
+宁肯
+它
+它们
+它们的
+它是
+它的
+安全
+完全
+完成
+定
+实现
+实际
+宣布
+容易
+密切
+对
+对于
+对应
+对待
+对方
+对比
+将
+将才
+将要
+将近
+小
+少数
+尔
+尔后
+尔尔
+尔等
+尚且
+尤其
+就
+就地
+就是
+就是了
+就是说
+就此
+就算
+就要
+尽
+尽可能
+尽如人意
+尽心尽力
+尽心竭力
+尽快
+尽早
+尽然
+尽管
+尽管如此
+尽量
+局外
+居然
+届时
+属于
+屡
+屡屡
+屡次
+屡次三番
+岂
+岂但
+岂止
+岂非
+川流不息
+左右
+巨大
+巩固
+差一点
+差不多
+己
+已
+已矣
+已经
+巴
+巴巴
+带
+帮助
+常
+常常
+常言说
+常言说得好
+常言道
+平素
+年复一年
+并
+并不
+并不是
+并且
+并排
+并无
+并没
+并没有
+并肩
+并非
+广大
+广泛
+应当
+应用
+应该
+庶乎
+庶几
+开外
+开始
+开展
+引起
+弗
+弹指之间
+强烈
+强调
+归
+归根到底
+归根结底
+归齐
+当
+当下
+当中
+当儿
+当前
+当即
+当口儿
+当地
+当场
+当头
+当庭
+当时
+当然
+当真
+当着
+形成
+彻夜
+彻底
+彼
+彼时
+彼此
+往
+往往
+待
+待到
+很
+很多
+很少
+後来
+後面
+得
+得了
+得出
+得到
+得天独厚
+得起
+心里
+必
+必定
+必将
+必然
+必要
+必须
+快
+快要
+忽地
+忽然
+怎
+怎么
+怎么办
+怎么样
+怎奈
+怎样
+怎麽
+怕
+急匆匆
+怪
+怪不得
+总之
+总是
+总的来看
+总的来说
+总的说来
+总结
+总而言之
+恍然
+恐怕
+恰似
+恰好
+恰如
+恰巧
+恰恰
+恰恰相反
+恰逢
+您
+您们
+您是
+惟其
+惯常
+意思
+愤然
+愿意
+慢说
+成为
+成年
+成年累月
+成心
+我
+我们
+我是
+我的
+或
+或则
+或多或少
+或是
+或曰
+或者
+或许
+战斗
+截然
+截至
+所
+所以
+所在
+所幸
+所有
+所谓
+才
+才能
+扑通
+打
+打从
+打开天窗说亮话
+扩大
+把
+抑或
+抽冷子
+拦腰
+拿
+按
+按时
+按期
+按照
+按理
+按说
+挨个
+挨家挨户
+挨次
+挨着
+挨门挨户
+挨门逐户
+换句话说
+换言之
+据
+据实
+据悉
+据我所知
+据此
+据称
+据说
+掌握
+接下来
+接着
+接著
+接连不断
+放量
+故
+故意
+故此
+故而
+敞开儿
+敢
+敢于
+敢情
+数/
+整个
+断然
+方
+方便
+方才
+方能
+方面
+旁人
+无
+无宁
+无法
+无论
+既
+既...又
+既往
+既是
+既然
+日复一日
+日渐
+日益
+日臻
+日见
+时候
+昂然
+明显
+明确
+是
+是不是
+是以
+是否
+是的
+显然
+显著
+普通
+普遍
+暗中
+暗地里
+暗自
+更
+更为
+更加
+更进一步
+曾
+曾经
+替
+替代
+最
+最后
+最大
+最好
+最後
+最近
+最高
+有
+有些
+有关
+有利
+有力
+有及
+有所
+有效
+有时
+有点
+有的
+有的是
+有着
+有著
+望
+朝
+朝着
+末##末
+本
+本人
+本地
+本着
+本身
+权时
+来
+来不及
+来得及
+来看
+来着
+来自
+来讲
+来说
+极
+极为
+极了
+极其
+极力
+极大
+极度
+极端
+构成
+果然
+果真
+某
+某个
+某些
+某某
+根据
+根本
+格外
+梆
+概
+次第
+欢迎
+欤
+正值
+正在
+正如
+正巧
+正常
+正是
+此
+此中
+此后
+此地
+此处
+此外
+此时
+此次
+此间
+殆
+毋宁
+每
+每个
+每天
+每年
+每当
+每时每刻
+每每
+每逢
+比
+比及
+比如
+比如说
+比方
+比照
+比起
+比较
+毕竟
+毫不
+毫无
+毫无例外
+毫无保留地
+汝
+沙沙
+没
+没奈何
+没有
+沿
+沿着
+注意
+活
+深入
+清楚
+满
+满足
+漫说
+焉
+然
+然则
+然后
+然後
+然而
+照
+照着
+牢牢
+特别是
+特殊
+特点
+犹且
+犹自
+独
+独自
+猛然
+猛然间
+率尔
+率然
+现代
+现在
+理应
+理当
+理该
+瑟瑟
+甚且
+甚么
+甚或
+甚而
+甚至
+甚至于
+用
+用来
+甫
+甭
+由
+由于
+由是
+由此
+由此可见
+略
+略为
+略加
+略微
+白
+白白
+的
+的确
+的话
+皆可
+目前
+直到
+直接
+相似
+相信
+相反
+相同
+相对
+相对而言
+相应
+相当
+相等
+省得
+看
+看上去
+看出
+看到
+看来
+看样子
+看看
+看见
+看起来
+真是
+真正
+眨眼
+着
+着呢
+矣
+矣乎
+矣哉
+知道
+砰
+确定
+碰巧
+社会主义
+离
+种
+积极
+移动
+究竟
+穷年累月
+突出
+突然
+窃
+立
+立刻
+立即
+立地
+立时
+立马
+竟
+竟然
+竟而
+第
+第二
+等
+等到
+等等
+策略地
+简直
+简而言之
+简言之
+管
+类如
+粗
+精光
+紧接着
+累年
+累次
+纯
+纯粹
+纵
+纵令
+纵使
+纵然
+练习
+组成
+经
+经常
+经过
+结合
+结果
+给
+绝
+绝不
+绝对
+绝非
+绝顶
+继之
+继后
+继续
+继而
+维持
+综上所述
+缕缕
+罢了
+老
+老大
+老是
+老老实实
+考虑
+者
+而
+而且
+而况
+而又
+而后
+而外
+而已
+而是
+而言
+而论
+联系
+联袂
+背地里
+背靠背
+能
+能否
+能够
+腾
+自
+自个儿
+自从
+自各儿
+自后
+自家
+自己
+自打
+自身
+臭
+至
+至于
+至今
+至若
+致
+般的
+良好
+若
+若夫
+若是
+若果
+若非
+范围
+莫
+莫不
+莫不然
+莫如
+莫若
+莫非
+获得
+藉以
+虽
+虽则
+虽然
+虽说
+蛮
+行为
+行动
+表明
+表示
+被
+要
+要不
+要不是
+要不然
+要么
+要是
+要求
+见
+规定
+觉得
+譬喻
+譬如
+认为
+认真
+认识
+让
+许多
+论
+论说
+设使
+设或
+设若
+诚如
+诚然
+话说
+该
+该当
+说明
+说来
+说说
+请勿
+诸
+诸位
+诸如
+谁
+谁人
+谁料
+谁知
+谨
+豁然
+贼死
+赖以
+赶
+赶快
+赶早不赶晚
+起
+起先
+起初
+起头
+起来
+起见
+起首
+趁
+趁便
+趁势
+趁早
+趁机
+趁热
+趁着
+越是
+距
+跟
+路经
+转动
+转变
+转贴
+轰然
+较
+较为
+较之
+较比
+边
+达到
+达旦
+迄
+迅速
+过
+过于
+过去
+过来
+运用
+近
+近几年来
+近年来
+近来
+还
+还是
+还有
+还要
+这
+这一来
+这个
+这么
+这么些
+这么样
+这么点儿
+这些
+这会儿
+这儿
+这就是说
+这时
+这样
+这次
+这点
+这种
+这般
+这边
+这里
+这麽
+进入
+进去
+进来
+进步
+进而
+进行
+连
+连同
+连声
+连日
+连日来
+连袂
+连连
+迟早
+迫于
+适应
+适当
+适用
+逐步
+逐渐
+通常
+通过
+造成
+逢
+遇到
+遭到
+遵循
+遵照
+避免
+那
+那个
+那么
+那么些
+那么样
+那些
+那会儿
+那儿
+那时
+那末
+那样
+那般
+那边
+那里
+那麽
+部分
+都
+鄙人
+采取
+里面
+重大
+重新
+重要
+鉴于
+针对
+长期以来
+长此下去
+长线
+长话短说
+问题
+间或
+防止
+阿
+附近
+陈年
+限制
+陡然
+除
+除了
+除却
+除去
+除外
+除开
+除此
+除此之外
+除此以外
+除此而外
+除非
+随
+随后
+随时
+随着
+随著
+隔夜
+隔日
+难得
+难怪
+难说
+难道
+难道说
+集中
+零
+需要
+非但
+非常
+非徒
+非得
+非特
+非独
+靠
+顶多
+顷
+顷刻
+顷刻之间
+顷刻间
+顺
+顺着
+顿时
+颇
+风雨无阻
+饱
+首先
+马上
+高低
+高兴
+默然
+默默地
+齐
+︿
+！
+＃
+＄
+％
+＆
+＇
+（
+）
+）÷（１－
+）、
+＊
+＋
+＋ξ
+＋＋
+，
+，也
+－
+－β
+－－
+－［＊］－
+．
+／
+０
+０：２
+１
+１．
+１２％
+２
+２．３％
+３
+４
+５
+５：０
+６
+７
+８
+９
+：
+；
+＜
+＜±
+＜Δ
+＜λ
+＜φ
+＜＜
+＝
+＝″
+＝☆
+＝（
+＝－
+＝［
+＝｛
+＞
+＞λ
+？
+＠
+Ａ
+ＬＩ
+Ｒ．Ｌ．
+ＺＸＦＩＴＬ
+［
+［①①］
+［①②］
+［①③］
+［①④］
+［①⑤］
+［①⑥］
+［①⑦］
+［①⑧］
+［①⑨］
+［①Ａ］
+［①Ｂ］
+［①Ｃ］
+［①Ｄ］
+［①Ｅ］
+［①］
+［①ａ］
+［①ｃ］
+［①ｄ］
+［①ｅ］
+［①ｆ］
+［①ｇ］
+［①ｈ］
+［①ｉ］
+［①ｏ］
+［②
+［②①］
+［②②］
+［②③］
+［②④
+［②⑤］
+［②⑥］
+［②⑦］
+［②⑧］
+［②⑩］
+［②Ｂ］
+［②Ｇ］
+［②］
+［②ａ］
+［②ｂ］
+［②ｃ］
+［②ｄ］
+［②ｅ］
+［②ｆ］
+［②ｇ］
+［②ｈ］
+［②ｉ］
+［②ｊ］
+［③①］
+［③⑩］
+［③Ｆ］
+［③］
+［③ａ］
+［③ｂ］
+［③ｃ］
+［③ｄ］
+［③ｅ］
+［③ｇ］
+［③ｈ］
+［④］
+［④ａ］
+［④ｂ］
+［④ｃ］
+［④ｄ］
+［④ｅ］
+［⑤］
+［⑤］］
+［⑤ａ］
+［⑤ｂ］
+［⑤ｄ］
+［⑤ｅ］
+［⑤ｆ］
+［⑥］
+［⑦］
+［⑧］
+［⑨］
+［⑩］
+［＊］
+［－
+［］
+］
+］∧′＝［
+］［
+＿
+ａ］
+ｂ］
+ｃ］
+ｅ］
+ｆ］
+ｎｇ昉
+｛
+｛－
+｜
+｝
+｝＞
+～
+～±
+～＋
+￥
-- 
Gitee


From 8d698d13b488d78945ab5f0446cfbbb8dd73deb5 Mon Sep 17 00:00:00 2001
From: zengxianghuai <zengxianghuai@h-partners.com>
Date: Tue, 27 May 2025 19:33:58 +0800
Subject: [PATCH 2/2] Add test script
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 test/requirements.txt          |   6 +
 test/test_qa.py                | 664 +++++++++++++++++++++++++++++++++
 test/tools/config.py           |  90 +++++
 test/tools/llm.py              |  60 +++
 test/tools/similar_cal_tool.py | 158 ++++++++
 5 files changed, 978 insertions(+)
 create mode 100644 test/requirements.txt
 create mode 100644 test/test_qa.py
 create mode 100644 test/tools/config.py
 create mode 100644 test/tools/llm.py
 create mode 100644 test/tools/similar_cal_tool.py
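
Note on test_qa.py: each LLM call is retried up to five times, and a
placeholder row is recorded when every attempt fails. A sketch of that
bounded-retry pattern in isolation; retry_async and flaky are hypothetical
stand-ins, not names used in this patch:

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_async(call: Callable[[], Awaitable[list]],
                      max_attempts: int = 5) -> Optional[list]:
    """Await call() until it succeeds or max_attempts is reached.

    Returns the result of the first successful attempt (even an empty list),
    or None when every attempt raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            print("error:", exc, "retry times", attempt)
    return None


attempts = {"n": 0}


async def flaky() -> list:
    """Hypothetical stand-in for the LLM request: fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("temporary failure")
    return [{"question": "q", "answer": "a", "type": "t"}]


result = asyncio.run(retry_async(flaky))
print(result)
```

Returning on the first successful await keeps the loop finite even when the
call succeeds with an empty list.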

diff --git a/test/requirements.txt b/test/requirements.txt
new file mode 100644
index 0000000..e4b18f8
--- /dev/null
+++ b/test/requirements.txt
@@ -0,0 +1,6 @@
+jieba==0.42.1
+pandas==2.1.4
+pydantic==2.10.2
+langchain==0.1.16
+langchain-openai==0.1.7
+synonyms==3.23.5
\ No newline at end of file
diff --git a/test/test_qa.py b/test/test_qa.py
new file mode 100644
index 0000000..369a69e
--- /dev/null
+++ b/test/test_qa.py
@@ -0,0 +1,664 @@
+import subprocess
+import argparse
+import asyncio
+import json
+import os
+import random
+import time
+from pathlib import Path
+import jieba
+import pandas as pd
+
+import yaml
+import requests
+from typing import Optional, List
+from pydantic import BaseModel, Field
+from tools.config import config
+from tools.llm import LLM
+from tools.similar_cal_tool import Similar_cal_tool
+current_dir = Path(__file__).resolve().parent
+
+
+def login_and_get_tokens(account, password, base_url):
+    """
+    Log in and fetch a new session ID and CSRF token.
+
+    :param account: user account
+    :param password: user password
+    :param base_url: base URL of the service; the login endpoint is {base_url}/user/login
+    :return: (response JSON, dict with the new session ID and CSRF token), or None on failure
+    """
+    # Build the request headers
+    headers = {
+        'Content-Type': 'application/x-www-form-urlencoded',
+    }
+
+    # Build the query parameters
+    params = {
+        'account': account,
+        'password': password
+    }
+    # Send the login request (a GET with query parameters)
+    url = f"{base_url}/user/login"
+    response = requests.get(url, headers=headers, params=params)
+    # A 200 status code indicates success
+    if response.status_code == 200:
+        # On success, read the new session ID and CSRF token from the cookies
+        new_session = response.cookies.get("WD_ECSESSION")
+        new_csrf_token = response.cookies.get("wd_csrf_tk")
+        if new_session and new_csrf_token:
+            return response.json(), {
+                'ECSESSION': new_session,
+                'csrf_token': new_csrf_token
+            }
+        else:
+            print("Failed to get new session or CSRF token.")
+            return None
+    else:
+        print(f"Failed to login, status code: {response.status_code}")
+        return None
+
+
+def tokenize(text):
+    return len(list(jieba.cut(str(text))))
+
+
+class DictionaryBaseModel(BaseModel):
+    pass
+
+
+class ListChunkRequest(DictionaryBaseModel):
+    document_id: str
+    text: Optional[str] = None
+    page_number: int = 1
+    page_size: int = 50
+    type: Optional[list[str]] = None
+
+
+def list_chunks(session_cookie: str, csrf_cookie: str, document_id: str,
+                text: Optional[str] = None, page_number: int = 1, page_size: int = 50,
+                base_url="http://0.0.0.0:9910") -> dict:
+    """
+    Request the list of chunks for a document.
+
+    :param session_cookie: user session cookie
+    :param csrf_cookie: CSRF protection cookie
+    :param document_id: document ID
+    :param text: optional search text
+    :param page_number: page number, defaults to 1
+    :param page_size: page size, defaults to 50
+    :param base_url: API base URL, defaults to the local test server address
+    :return: JSON response data
+    """
+    # Build the request cookies
+    # print(document_id)
+    cookies = {
+        "WD_ECSESSION": session_cookie,
+        "wd_csrf_tk": csrf_cookie
+    }
+
+    # Build the request body
+    payload = ListChunkRequest(
+        document_id=document_id,
+        text=text,
+        page_number=page_number,
+        page_size=page_size,
+    ).model_dump()
+
+    # Send the POST request
+    url = f"{base_url}/chunk/list"
+    response = requests.post(url, cookies=cookies, json=payload)
+
+    # The first response reports the total; use it to fetch every chunk in one request
+    # print(response.json())
+    page_size = response.json()['data']['total']
+
+    # Rebuild the request body with page_size set to the total
+    payload = ListChunkRequest(
+        document_id=document_id,
+        text=text,
+        page_number=page_number,
+        page_size=page_size,
+    ).model_dump()
+
+    # Send the POST request again
+    url = f"{base_url}/chunk/list"
+    response = requests.post(url, cookies=cookies, json=payload)
+
+    # Return the JSON response data
+    return response.json()
+
+
+def parser():
+    # Create the ArgumentParser object
+    parser = argparse.ArgumentParser(description="Script to process document and generate QA pairs.")
+
+    # Required arguments
+    parser.add_argument('-n', '--name', type=str, required=True, help='User name')
+    parser.add_argument('-p', '--password', type=str, required=True, help='User password')
+    parser.add_argument('-k', '--kb_id', type=str, required=True, help='KnowledgeBase ID')
+    parser.add_argument('-u', '--url', type=str, required=True, help='URL for witChainD')
+
+    # Optional argument with a default value
+    parser.add_argument('-q', '--qa_count', type=int, default=1,
+                        help='Number of QA pairs to generate per document (default: 1)')
+
+    # List of document names
+    parser.add_argument('-d', '--doc_names', nargs='+', required=False, default=[], help='List of document names')
+
+    # Parse the command-line arguments
+    args = parser.parse_args()
+    return args
+
+
+def get_prompt_dict():
+    """
+    Load the prompt table from the YAML file at config['PROMPT_PATH'].
+    """
+    try:
+        with open(config['PROMPT_PATH'], 'r', encoding='utf-8') as f:
+            prompt_dict = yaml.load(f, Loader=yaml.SafeLoader)
+        return prompt_dict
+    except Exception as e:
+        print(f"open {config['PROMPT_PATH']} error {e}")
+        raise e
+
+
+prompt_dict = get_prompt_dict()
+llm = LLM(model_name=config['MODEL_NAME'],
+          openai_api_base=config['OPENAI_API_BASE'],
+          openai_api_key=config['OPENAI_API_KEY'],
+          max_tokens=config['MAX_TOKENS'],
+          request_timeout=60,
+          temperature=0.35)
+
+
+def get_random_number(l, r):
+    return random.randint(l, r-1)
+
+
+class QAgenerator:
+
+    async def qa_generate(self, chunks, file):
+        """
+        Generate QA pairs concurrently (asyncio tasks, not threads).
+        """
+        start_time = time.time()
+        results = []
+        prev_texts = []
+        ans = 0
+        # Collect one task per selected chunk; asyncio.gather runs them concurrently
+        tasks = []
+        # Number of chunks
+        num_chunks = len(chunks)
+        image_sum = 0
+        for chunk in chunks:
+            chunk['count'] = 0
+        #     if chunk['type'] == 'image':
+        #         chunk['count'] = chunk['count'] + 1
+        #         image_sum = image_sum + 1
+        for i in range(args.qa_count):
+            x = get_random_number(min(3, num_chunks-1), num_chunks)
+            print(x)
+            chunks[x]['count'] = chunks[x]['count'] + 1
+
+        now_text = ""
+        for chunk in chunks:
+            now_text = now_text + chunk['text'] + '\n'
+            # if chunk['type'] == 'table' and len(now_text) < (config['MAX_TOKENS'] // 8):
+            #     continue
+            prev_text = '\n'.join(prev_texts)
+            while tokenize(prev_text) > (config['MAX_TOKENS'] / 4):
+                prev_texts.pop(0)
+                prev_text = '\n'.join(prev_texts)
+            if chunk['count'] > 0:
+                tasks.append(self.generate(now_text, prev_text, results, file, chunk['count'], chunk['type']))
+            prev_texts.append(now_text)
+            now_text = ''
+            ans = ans + chunk['count'] + image_sum
+
+        # Wait for all tasks to finish
+        await asyncio.gather(*tasks)
+        print('Sample QA pairs:', results[:50])
+        print("Total time spent generating QA pairs:", time.time() - start_time)
+        print(f"Requested {ans} QA pairs in total")
+        return results
+
+    async def generate(self, now_text, prev_text, results, file, qa_count, type_text):
+        """
+        Generate QA pairs for one chunk, retrying up to five times.
+        """
+        prev_text = prev_text[-(config['MAX_TOKENS'] // 8):]
+        prompt = prompt_dict.get('GENERATE_QA')
+        count = 0
+        while count < 5:
+            try:
+                # Await the asynchronous chat_with_llm call
+                result_temp = await self.chat_with_llm(llm, prompt, now_text, prev_text,
+                                                       qa_count, file)
+
+                for result in result_temp:
+                    result['text'] = prev_text + now_text
+                    result['type_text'] = type_text
+                    results.append(result)
+                break  # leave the retry loop even if the model returned an empty list
+            except Exception as e:
+                count += 1
+                print('error:', e, 'retry times', count)
+                if count == 5:
+                    results.append({'text': now_text, 'question': 'failed to generate QA pair',
+                                   'answer': 'failed to generate QA pair', 'type': 'error', 'type_text': 'error'})
+
+    @staticmethod
+    async def chat_with_llm(llm, prompt, text, prev_text, qa_count, file_name) -> list:
+        """
+        Ask the LLM to generate question-answer-domain triples for the given text.
+        params:
+        - llm: LLM
+        - text: str
+        - prompt: str
+        return:
+        - qa_pairs: list[dict]
+
+        """
+        text = text.replace("\"", "\\\"")
+        user_call = (f"文本内容来自于{file_name},请以JSON格式输出{qa_count}对不同的问题-答案-领域，格式为["
+                     "{"
+                     "\"question\": \"  问题  \", "
+                     "\"answer\": \"  回答  \","
+                     "\"type\": \" 领域 \""
+                     "}\n"
+                     "]，并且必须将问题和回答中和未被转义的双引号转义，元素标签请用双引号括起来")
+        prompt = prompt.format(chunk=text, qa_count=qa_count, text=prev_text, file_name=file_name)
+        # print(prompt)
+        qa_pair = await llm.nostream([], prompt, user_call)
+        # Parse the reply, which is expected to be a JSON array of question/answer/domain objects
+        print(qa_pair)
+        # print("原文：", text)
+        qa_pair = json.loads(qa_pair)
+        return qa_pair
+
+
+class QueryRequest(BaseModel):
+    question: str
+    kb_sn: Optional[str] = None
+    top_k: int = Field(5, ge=0, le=10)
+    fetch_source: bool = False
+    history: Optional[List] = []
+
+
+def call_get_answer(text, kb_id, session_cookie, csrf_cookie, base_url="http://0.0.0.0:9910"):
+    # Build the request cookies
+    cookies = {
+        "WD_ECSESSION": session_cookie,
+        "wd_csrf_tk": csrf_cookie
+    }
+
+    # Build the request body
+    req = QueryRequest(
+        question=text,
+        kb_sn=kb_id,
+        top_k=3,
+        fetch_source=True,
+        history=[]
+    )
+
+    url = f"{base_url}/kb/get_answer"
+    print(url)
+    headers = {
+        "Content-Type": "application/json",
+        "Accept": "application/json"
+    }
+    data = req.json().encode("utf-8")
+
+    for i in range(5):
+        try:
+            response = requests.post(url, headers=headers, cookies=cookies, data=data,
+                                     timeout=config['REQUEST_TIMEOUT'])
+            if response.status_code == 200:
+                result = response.json()
+                # print("answer fetched successfully")
+                return result
+            print(f"Request failed, status code: {response.status_code}, response body: {response.text}")
+            time.sleep(1)
+        except Exception as e:
+            print(f"Answer request failed, reason: {e}, attempt: {i+1}")
+            time.sleep(1)
+
+
+async def get_answers(QA, kb_id, session_cookie, csrf_cookie, base_url):
+    text = QA['question']
+    print(f"Source text: {QA['text'][:40]}...")
+    result = call_get_answer(text, kb_id, session_cookie, csrf_cookie, base_url)
+    if result is None:
+        return None
+    else:
+        QA['witChainD_answer'] = result['data']['answer']
+        QA['witChainD_source'] = result['data']['source']
+        QA['time_cost'] = result['data']['time_cost']
+        print(f"Source text: {QA['text'][:40] + '...'}\nQuestion: {text}\nAnswer: {result['data']['answer'][:40]}\n\n")
+        return QA
+
+
+async def get_QAs_answers(QAs, kb_id, session_cookie, csrf_cookie, base_url):
+    results = []
+    tasks = []
+    for QA in QAs:
+        tasks.append(get_answers(QA, kb_id, session_cookie, csrf_cookie, base_url))
+    response = await asyncio.gather(*tasks)
+    for idx, result in enumerate(response):
+        if result is not None:
+            results.append(result)
+    return results
+
+
+class QAScore:
+    async def get_score(self, QA):
+        prompt = prompt_dict['SCORE_QA']
+        llm_score_dict = await self.chat_with_llm(llm, prompt, QA['question'], QA['text'], QA['witChainD_source'], QA['answer'], QA['witChainD_answer'])
+        print(llm_score_dict)
+        QA['context_relevancy'] = llm_score_dict['context_relevancy']
+        QA['context_recall'] = llm_score_dict['context_recall']
+        QA['faithfulness'] = llm_score_dict['faithfulness']
+        QA['answer_relevancy'] = llm_score_dict['answer_relevancy']
+        print(QA)
+        try:
+            lcs_score = Similar_cal_tool.longest_common_subsequence(QA['answer'], QA['witChainD_answer'])
+        except Exception:
+            lcs_score = 0
+        QA['lcs_score'] = lcs_score
+        try:
+            jac_score = Similar_cal_tool.jaccard_distance(QA['answer'], QA['witChainD_answer'])
+        except Exception:
+            jac_score = 0
+        QA['jac_score'] = jac_score
+        try:
+            leve_score = Similar_cal_tool.levenshtein_distance(QA['answer'], QA['witChainD_answer'])
+        except Exception:
+            leve_score = 0
+        QA['leve_score'] = leve_score
+        return QA
+        return QA
+
+    async def get_scores(self, QAs):
+        tasks = []
+        results = []
+        for QA in QAs:
+            tasks.append(self.get_score(QA))
+        response = await asyncio.gather(*tasks)
+        for idx, result in enumerate(response):
+            if result is not None:
+                results.append(result)
+        return results
+
+    @staticmethod
+    async def chat_with_llm(llm, prompt, question, meta_chunk, chunk, answer, answer_text) -> dict:
+        """
+        Ask the LLM to score an answer against the question, chunks and reference answer.
+        params:
+        - llm: LLM
+        - prompt: str
+        - question, meta_chunk, chunk, answer, answer_text: str
+        return:
+        - score_dict: dict
+
+        """
+        for i in range(5):
+            try:
+                user_call = '''请对答案打分，并以下面形式返回结果{
+  \"context_relevancy\": 分数,
+  \"context_recall\": 分数,
+  \"faithfulness\": 分数,
+  \"answer_relevancy\": 分数
+}
+'''
+                system_call = prompt.format(question=question, meta_chunk=meta_chunk,
+                                            chunk=chunk, answer=answer, answer_text=answer_text)
+                # print(system_call)
+                score_dict = await llm.nostream([], system_call, user_call)
+                st = score_dict.find('{')
+                en = score_dict.rfind('}')
+                if st != -1 and en != -1:
+                    score_dict = score_dict[st:en+1]
+                print(score_dict)
+                score_dict = json.loads(score_dict)
+                # score_dict now holds the four metric scores as a dict
+                # print(score)
+                return score_dict
+            except Exception as e:
+                print(f"scoring attempt {i + 1} failed: {e}")
+        return {
+            "context_relevancy": 0,
+            "context_recall": 0,
+            "faithfulness": 0,
+            "answer_relevancy": 0,
+        }
+
+
+def list_documents(session_cookie, csrf_cookie, kb_id, base_url="http://0.0.0.0:9910"):
+    # Build the request cookies
+    cookies = {
+        "WD_ECSESSION": session_cookie,
+        "wd_csrf_tk": csrf_cookie
+    }
+
+    # Build the request URL
+    url = f"{base_url}/doc/list"
+
+    # Build the request body
+    payload = {
+        "kb_id": str(kb_id),  # convert the UUID object to a string
+        "page_number": 1,
+        "page_size": 50,
+    }
+
+    # Send the POST request
+    response = requests.post(url, cookies=cookies, json=payload)
+    # print(response.text)
+
+    # Read the total count, then fetch every page of documents
+    total = response.json()['data']['total']
+    documents = []
+    for i in range(1, (total + 49) // 50 + 1):
+        # Build the request body for page i
+        print(f"page {i} gets")
+        payload = {
+            "kb_id": str(kb_id),  # convert the UUID object to a string
+            "page_number": i,
+            "page_size": 50,
+        }
+
+        response = requests.post(url, cookies=cookies, json=payload)
+        js = response.json()
+        now_documents = js['data']['data_list']
+        documents.extend(now_documents)
+    # Return the collected documents
+    return documents
+
+
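+if __name__ == '__main__':
+    """
+    Script arguments: name, password, kb_id, qa_count, url, doc_names
+    - name: read from -n or --name, required
+    - password: read from -p or --password, required
+    - kb_id: read from -k or --kb_id, required
+    - qa_count: read from -q or --qa_count, optional, default 1, number of QA pairs generated per document
+    - url: read from -u or --url, required, the base URL of witChainD
+    - doc_names: read from -d or --doc_names, optional, default None, meaning all documents are processed
+    The LLM and witChainD settings and the prompt path must be configured in .env
+    """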
+    args = parser()
+    js, tmp_dict = login_and_get_tokens(args.name, args.password, args.url)
+    session_cookie = tmp_dict['ECSESSION']
+    csrf_cookie = tmp_dict['csrf_token']
+    print('login success')
+    documents = list_documents(session_cookie, csrf_cookie, args.kb_id, args.url)
+    print('get document success')
+    QAs = []
+    # print(documents)
+    for document in documents:
+        # print('refresh tokens')
+        # print(json.dumps(document, indent=4, ensure_ascii=False))
+        if args.doc_names != [] and document['name'] not in args.doc_names:
+            # args.doc_names = []
+            continue
+        else:
+            args.doc_names = []
+        js, tmp_dict = login_and_get_tokens(args.name, args.password, args.url)
+        session_cookie = tmp_dict['ECSESSION']
+        csrf_cookie = tmp_dict['csrf_token']
+        args.doc_id = document['id']
+        args.doc_name = document['name']
+        count = 0
+        while count < 5:
+            try:
+                js = list_chunks(session_cookie, csrf_cookie, str(args.doc_id), base_url=args.url)
+                count = 10
+            except Exception as e:
+                print(f"document {args.doc_name} check failed {e} with retry {count}")
+                count = count + 1
+                time.sleep(1)
+                continue
+        if count == 5:
+            print(f"document {args.doc_name} check failed")
+            continue
+        chunks = js['data']['data_list']
+        new_chunks = []
+        for chunk in chunks:
+            new_chunk = {
+                'text': chunk['text'],
+                'type': chunk['type'],
+            }
+            new_chunks.append(new_chunk)
+        chunks = new_chunks
+        model = QAgenerator()
+        try:
+            print('Generating QA pairs...')
+            t_QAs = asyncio.run(model.qa_generate(chunks=chunks, file=args.doc_name))
+            print("QA pairs generated, fetching answers...")
+            tt_QAs = asyncio.run(get_QAs_answers(t_QAs, args.kb_id, session_cookie, csrf_cookie, args.url))
+            print("Answers fetched, scoring answers...")
+            ttt_QAs = asyncio.run(QAScore().get_scores(tt_QAs))
+            for QA in ttt_QAs:
+                QAs.append(QA)
+            df = pd.DataFrame(QAs)
+            df = df.astype(str)
+            print(document['name'], 'done')
+            print('sample:', t_QAs[0]['question'][:40])
+            df.to_excel(current_dir / 'temp_answer.xlsx', index=False)
+            print(f'Intermediate Excel results written to {current_dir}/temp_answer.xlsx')
+        except Exception as e:
+            import traceback
+            traceback.print_exc()
+            print(f"document {args.doc_name} failed {e}")
+            continue
+    #
+    # # 输出QAs到xlsx中
+    # exit(0)
+    newQAs = []
+    total = {
+        "context_relevancy(上下文相关性)": [],
+        "context_recall(召回率)": [],
+        "faithfulness(忠实性)": [],
+        "answer_relevancy(答案的相关性)": [],
+        "lcs_score(最大公共子串)": [],
+        "jac_score(杰卡德距离)": [],
+        "leve_score(编辑距离)": [],
+        "time_cost": {
+            "keyword_searching": [],
+            "text_to_vector": [],
+            "vector_searching": [],
+            "vectors_related_texts": [],
+            "text_expanding": [],
+            "llm_answer": [],
+        },
+    }
+
+    time_cost_metrics = list(total["time_cost"].keys())
+
+    for QA in QAs:
+        try:
+            if 'time_cost' in QA.keys():
+                ReOrderedQA = {
+                    '领域': str(QA['type']),
+                    '问题': str(QA['question']),
+                    '标准答案': str(QA['answer']),
+                    'witChainD 回答': str(QA['witChainD_answer']),
+                    'context_relevancy(上下文相关性)': str(QA['context_relevancy']),
+                    'context_recall(召回率)': str(QA['context_recall']),
+                    'faithfulness(忠实性)': str(QA['faithfulness']),
+                    'answer_relevancy(答案的相关性)': str(QA['answer_relevancy']),
+                    'lcs_score(最大公共子串)': str(QA['lcs_score']),
+                    'jac_score(杰卡德距离)': str(QA['jac_score']),
+                    'leve_score(编辑距离)': str(QA['leve_score']),
+                    '原始片段': str(QA['text']),
+                    '检索片段': str(QA['witChainD_source']),
+                    'keyword_searching_cost(关键字搜索时间消耗)': str(QA['time_cost']['keyword_searching'])+'s',
+                    'query_to_vector_cost(query向量化时间消耗)': str(QA['time_cost']['text_to_vector'])+'s',
+                    'vector_searching_cost(向量化检索时间消耗)': str(QA['time_cost']['vector_searching'])+'s',
+                    'vectors_related_texts_cost(向量关联文档时间消耗)': str(QA['time_cost']['vectors_related_texts'])+'s',
+                    'text_expanding_cost(上下文关联时间消耗)': str(QA['time_cost']['text_expanding'])+'s',
+                    'llm_answer_cost(大模型回答时间消耗)': str(QA['time_cost']['llm_answer'])+'s'
+                }
+            else:
+                ReOrderedQA = {
+                    '领域': str(QA['type']),
+                    '问题': str(QA['question']),
+                    '标准答案': str(QA['answer']),
+                    'witChainD 回答': str(QA['witChainD_answer']),
+                    'context_relevancy(上下文相关性)': str(QA['context_relevancy']),
+                    'context_recall(召回率)': str(QA['context_recall']),
+                    'faithfulness(忠实性)': str(QA['faithfulness']),
+                    'answer_relevancy(答案的相关性)': str(QA['answer_relevancy']),
+                    'lcs_score(最大公共子串)': str(QA['lcs_score']),
+                    'jac_score(杰卡德距离)': str(QA['jac_score']),
+                    'leve_score(编辑距离)': str(QA['leve_score']),
+                    '原始片段': str(QA['text']),
+                    '检索片段': str(QA['witChainD_source'])
+                }
+            print(ReOrderedQA)
+            newQAs.append(ReOrderedQA)
+
+            for metric in total.keys():
+                if metric != "time_cost":  # skip time_cost (handled separately below)
+                    value = ReOrderedQA.get(metric)
+                    if value is not None:
+                        total[metric].append(float(value))
+
+            if "time_cost" in QA:
+                for sub_metric in time_cost_metrics:
+                    value = QA["time_cost"].get(sub_metric)
+                    if value is not None:
+                        total["time_cost"][sub_metric].append(float(value))
+        except Exception as e:
+            print(f"QA {QA} error {e}")
+
+    # Compute the averages
+    avg = {}
+    for metric, values in total.items():
+        if metric != "time_cost":
+            avg[metric] = sum(values) / len(values) if values else 0.0
+        else:  # time_cost holds nested per-stage lists
+            avg_time_cost = {}
+            for sub_metric, sub_values in values.items():
+                avg_time_cost[sub_metric] = (
+                    sum(sub_values) / len(sub_values) if sub_values else 0.0
+                )
+            avg[metric] = avg_time_cost
+
+    print(f"Test results: {avg}")
+
+    excel_path = current_dir / "answer.xlsx"
+    with pd.ExcelWriter(excel_path, engine="xlsxwriter") as writer:
+        # Write the first sheet (test samples)
+        df = pd.DataFrame(newQAs).astype(str)
+        df.to_excel(writer, sheet_name="测试样例", index=False)
+
+        # Write the second sheet (test results)
+        flat_avg = {
+            **{k: v for k, v in avg.items() if k != "time_cost"},
+            **{f"time_cost_{k}": v for k, v in avg["time_cost"].items()},
+        }
+        avg_df = pd.DataFrame([flat_avg])  # convert to a single-row DataFrame
+        avg_df.to_excel(writer, sheet_name="测试结果", index=False)
+
+    print(f"Test samples and results written to {excel_path}")
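Review note on QAgenerator.chat_with_llm: the generation path passes the raw model reply straight to json.loads, whereas the scoring path first trims the reply to the outermost braces. The sketch below shows the same trimming idea applied to a JSON array. It is only an illustration: `extract_json_array` is a hypothetical helper that does not exist in this patch, and it assumes the reply contains exactly one top-level array.

```python
import json


def extract_json_array(reply: str) -> list:
    """Return the JSON array embedded in an LLM reply (hypothetical helper)."""
    # Locate the outermost square brackets so that surrounding prose or
    # Markdown code fences around the array are ignored.
    start = reply.find('[')
    end = reply.rfind(']')
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both "not found" and "malformed" cases.
    return json.loads(reply[start:end + 1])
```

Because both failure modes surface as ValueError, the existing retry loop in generate() would treat a malformed reply like any other failed attempt.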
diff --git a/test/tools/config.py b/test/tools/config.py
new file mode 100644
index 0000000..caf547f
--- /dev/null
+++ b/test/tools/config.py
@@ -0,0 +1,90 @@
+# Copyright (c) Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+# import os
+# from pathlib import Path
+
+# from dotenv import dotenv_values
+# from pydantic import BaseModel, Field
+
+
+# class ConfigModel(BaseModel):
+#     # FastAPI
+#     UVICORN_IP: str = Field(None, description="FastAPI 服务的IP地址")
+#     UVICORN_PORT: int = Field(None, description="FastAPI 服务的端口号")
+#     SSL_CERTFILE: str = Field(None, description="SSL证书文件的路径")
+#     SSL_KEYFILE: str = Field(None, description="SSL密钥文件的路径")
+#     SSL_ENABLE: bool = Field(None, description="是否启用SSL连接")
+#     # LOG METHOD
+#     LOG_METHOD:str = Field('stdout', description="日志记录方式")
+#     # Postgres
+#     DATABASE_URL: str = Field(None, description="Postgres数据库链接url")
+
+#     # MinIO
+#     MINIO_ENDPOINT: str = Field(None, description="MinIO连接地址")
+#     MINIO_ACCESS_KEY: str = Field(None, description="Minio认证ak")
+#     MINIO_SECRET_KEY: str = Field(None, description="MinIO认证sk")
+#     MINIO_SECURE: bool = Field(None, description="MinIO安全连接")
+#     # Redis
+#     REDIS_HOST: str = Field(None, description="redis地址")
+#     REDIS_PORT: int = Field(None, description="redis端口")
+#     REDIS_PWD:  str = Field(None, description="redis密码")
+#     REDIS_PENDING_TASK_QUEUE_NAME: str = Field(default='rag_pending_task_queue', description="redis等待开始任务队列名称")
+#     REDIS_SUCCESS_TASK_QUEUE_NAME: str = Field(default='rag_success_task_queue', description="redis已经完成任务队列名称")
+#     REDIS_RESTART_TASK_QUEUE_NAME: str = Field(default='rag_restart_task_queue', description="redis等待重启任务队列名称")
+#     # Task
+#     TASK_RETRY_TIME: int = Field(None, description="任务重试次数")
+#     # Embedding
+#     REMOTE_EMBEDDING_ENDPOINT: str = Field(None, description="远程embedding服务url地址")
+
+#     # Token
+#     SESSION_TTL: int = Field(None, description="用户session过期时间")
+#     CSRF_KEY: str = Field(None, description="csrf的密钥")
+
+#     # Security
+#     HALF_KEY1: str = Field(None, description="两层密钥管理组件1")
+#     HALF_KEY2: str = Field(None, description="两层密钥管理组件2")
+#     HALF_KEY3: str = Field(None, description="两层密钥管理组件3")
+
+#     # Prompt file
+#     PROMPT_PATH: str = Field(None, description="prompt路径")
+
+#     # PATH
+#     STOP_WORDS_PATH: str = Field(None, description="停用词表存放位置")
+#     SENSITIVE_WORDS_PATH: str = Field(None, description="敏感词表存放位置")
+#     TERM_REPLACEMENTS_PATH: str = Field(None, description="术语替换表存放位置")
+#     SENSITIVE_PATTERNS_PATH: str = Field(None, description="敏感词匹配表存放位置")
+#     # LLM my_tools
+#     MODEL_NAME: str = Field(None, description="使用的语言模型名称或版本")
+#     OPENAI_API_BASE: str = Field(None, description="语言模型服务的基础URL")
+#     OPENAI_API_KEY: str = Field(None, description="语言模型访问密钥")
+#     REQUEST_TIMEOUT: int = Field(None, description="大模型请求超时时间")
+#     MAX_TOKENS: int = Field(None, description="单次请求中允许的最大Token数")
+#     MODEL_ENH: bool = Field(None, description="是否使用大模型能力增强")
+# class Config:
+#     config: ConfigModel
+
+#     def __init__(self):
+#         current_dir = Path(__file__).resolve().parent
+#         if os.getenv("CONFIG"):
+#             config_file = os.getenv("CONFIG")
+#         else:
+#             config_file = current_dir / ".env"
+#         self.config = ConfigModel(**(dotenv_values(config_file)))
+#         if os.getenv("PROD"):
+#             os.remove(config_file)
+
+#     def __getitem__(self, key):
+#         if key in self.config.__dict__:
+#             return self.config.__dict__[key]
+#         return None
+
+
+# config = Config()
+config = {
+    'PROMPT_PATH': './tools/prompt.yaml',
+    'MODEL_NAME': 'Qwen2.5-32B-Instruct-GPTQ-Int4',
+    'OPENAI_API_BASE': "http://120.46.78.178:8000/v1",
+    'OPENAI_API_KEY': 'sk-EulerCopilot1bT1WtG2ssG92pvOPTkpT3BlbkFJVruTv8oUe',
+    'REQUEST_TIMEOUT': 120,
+    'MAX_TOKENS': 8096,
+    'MODEL_ENH': False,
+}
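Review note on config.py: the dictionary above hard-codes the endpoint and API key. One possible alternative is to read them from environment variables. The sketch below is an assumption-laden illustration, not part of the patch: `load_config` and `DEFAULTS` are hypothetical names, and the environment variable names are assumed to mirror the dictionary keys.

```python
import os

# Non-secret defaults copied from the dictionary in this file.
DEFAULTS = {
    'PROMPT_PATH': './tools/prompt.yaml',
    'REQUEST_TIMEOUT': 120,
    'MAX_TOKENS': 8096,
}


def load_config(env=None) -> dict:
    """Build the config dict from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    # String-valued settings are copied as-is when present.
    for key in ('PROMPT_PATH', 'MODEL_NAME', 'OPENAI_API_BASE', 'OPENAI_API_KEY'):
        if key in env:
            cfg[key] = env[key]
    # Integer-valued settings are converted explicitly.
    for key in ('REQUEST_TIMEOUT', 'MAX_TOKENS'):
        if key in env:
            cfg[key] = int(env[key])
    # Parse the flag into a real bool; the non-empty string 'false' would be truthy.
    cfg['MODEL_ENH'] = env.get('MODEL_ENH', 'false').strip().lower() in ('1', 'true', 'yes')
    return cfg
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real process environment.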
diff --git a/test/tools/llm.py b/test/tools/llm.py
new file mode 100644
index 0000000..103f4ff
--- /dev/null
+++ b/test/tools/llm.py
@@ -0,0 +1,60 @@
+# Copyright (c) Huawei Technologies Co., Ltd. 2023-2024. All rights reserved.
+import asyncio
+import time
+import json
+from langchain_openai import ChatOpenAI
+from langchain.schema import SystemMessage, HumanMessage
+
+
+class LLM:
+    def __init__(self, openai_api_key, openai_api_base, model_name, max_tokens, request_timeout=60, temperature=0.1):
+        self.client = ChatOpenAI(model_name=model_name,
+                                 openai_api_base=openai_api_base,
+                                 openai_api_key=openai_api_key,
+                                 request_timeout=request_timeout,
+                                 max_tokens=max_tokens,
+                                 temperature=temperature)
+        print(model_name)
+    def assemble_chat(self, chat=None, system_call='', user_call=''):
+        if chat is None:
+            chat = []
+        chat.append(SystemMessage(content=system_call))
+        chat.append(HumanMessage(content=user_call))
+        return chat
+
+    async def nostream(self, chat, system_call, user_call):
+        chat = self.assemble_chat(chat, system_call, user_call)
+        response = await self.client.ainvoke(chat)
+        return response.content
+
+    async def data_producer(self, q: asyncio.Queue, history, system_call, user_call):
+        message = self.assemble_chat(history, system_call, user_call)
+        try:
+            async for frame in self.client.astream(message):
+                await q.put(frame.content)
+        except Exception as e:
+            await q.put(None)
+            print(f"Error in data producer due to: {e}")
+            return
+        await q.put(None)
+
+    async def stream(self, chat, system_call, user_call):
+        st = time.time()
+        q = asyncio.Queue(maxsize=10)
+
+        # Start the producer task
+        producer_task = asyncio.create_task(self.data_producer(q, chat, system_call, user_call))
+        first_token_reach = False
+        while True:
+            data = await q.get()
+            if data is None:
+                break
+            if not first_token_reach:
+                first_token_reach = True
+                print(f"Time to first token = {time.time() - st}")
+            for char in data:
+                yield "data: " + json.dumps({'content': char}, ensure_ascii=False) + '\n\n'
+                await asyncio.sleep(0.03)  # asynchronous sleep, does not block the event loop
+
+        yield "data: [DONE]"
+        print(f"Total LLM response time = {time.time() - st}")
diff --git a/test/tools/similar_cal_tool.py b/test/tools/similar_cal_tool.py
new file mode 100644
index 0000000..56319a4
--- /dev/null
+++ b/test/tools/similar_cal_tool.py
@@ -0,0 +1,158 @@
+import jieba
+import jieba.analyse
+import synonyms
+
+class Similar_cal_tool:
+    with open('./tools/stopwords.txt', 'r', encoding='utf-8') as f:
+        stopwords = set(f.read().splitlines())
+
+    @staticmethod
+    def normalized_scores(scores):
+        min_score = None
+        max_score = None
+        for score in scores:
+            if min_score is None:
+                min_score = score
+            else:
+                min_score = min(min_score, score)
+            if max_score is None:
+                max_score = score
+            else:
+                max_score = max(max_score, score)
+        if min_score == max_score:
+            for i in range(len(scores)):
+                scores[i] = 1
+        else:
+            for i in range(len(scores)):
+                scores[i] = (scores[i]-min_score)/(max_score-min_score)
+        return scores
+
+    @staticmethod
+    def filter_stop_words(text):
+        words = jieba.lcut(text)
+        filtered_words = [word for word in words if word not in Similar_cal_tool.stopwords]
+        text = ''.join(filtered_words)
+        return text
+
+    @staticmethod
+    def extract_keywords_sorted(text, topK=10):
+        keywords = jieba.analyse.textrank(text, topK=topK, withWeight=False)
+        return keywords
+
+    @staticmethod
+    def get_synonyms_score_dict(word):
+        try:
+            syns, scores = synonyms.nearby(word)
+            scores = Similar_cal_tool.normalized_scores(scores)
+            syns_scores_dict = {}
+            for syn, score in zip(syns, scores):
+                syns_scores_dict[syn] = score
+            return syns_scores_dict
+        except Exception:
+            return {word: 1}
+
+    @staticmethod
+    def text_to_keywords(text):
+        words = jieba.lcut(text)
+        if len(set(words)) <64:
+            return words
+        topK = 5
+        lv = 64
+        while lv < len(words):
+            topK *= 2
+            lv *= 2
+        keywords_sorted = Similar_cal_tool.extract_keywords_sorted(text, topK)
+        keywords_sorted_set = set(keywords_sorted)
+        new_words = []
+        for word in words:
+            if word in keywords_sorted_set:
+                new_words.append(word)
+        return new_words
+    @staticmethod
+    def cal_syns_word_score(word, syns_scores_dict):
+        if word not in syns_scores_dict:
+            return 0
+        return syns_scores_dict[word]
+    @staticmethod
+    def longest_common_subsequence(str1, str2):
+        words1 = Similar_cal_tool.text_to_keywords(str1)
+        words2 = Similar_cal_tool.text_to_keywords(str2)
+        m, n = len(words1), len(words2)
+        if m == 0 and n == 0:
+            return 1
+        if m == 0:
+            return 0
+        if n == 0:
+            return 0
+        dp = [[0]*(n+1) for _ in range(m+1)]
+        syns_scores_dicts_1 = []
+        syns_scores_dicts_2 = []
+        for word in words1:
+            syns_scores_dicts_1.append(Similar_cal_tool.get_synonyms_score_dict(word))
+        for word in words2:
+            syns_scores_dicts_2.append(Similar_cal_tool.get_synonyms_score_dict(word))
+
+        for i in range(1, m+1):
+            for j in range(1, n+1):
+                pair_score = (Similar_cal_tool.cal_syns_word_score(words1[i-1], syns_scores_dicts_2[j-1])
+                              + Similar_cal_tool.cal_syns_word_score(words2[j-1], syns_scores_dicts_1[i-1]))
+                dp[i][j] = max(dp[i-1][j], dp[i][j-1], dp[i-1][j-1] + pair_score)
+
+        return dp[m][n]/(2*min(m,n))
+
+    def jaccard_distance(str1, str2):
+        words1 = set(Similar_cal_tool.text_to_keywords(str1))
+        words2 = set(Similar_cal_tool.text_to_keywords(str2))
+        m, n = len(words1), len(words2)
+        if m == 0 and n == 0:
+            return 1
+        if m == 0:
+            return 0
+        if n == 0:
+            return 0
+        syns_scores_dict_1 = {}
+        syns_scores_dict_2 = {}
+        for word in words1:
+            tmp_dict=Similar_cal_tool.get_synonyms_score_dict(word)
+            for key,val in tmp_dict.items():
+                syns_scores_dict_1[key]=max(syns_scores_dict_1.get(key,0),val)
+        for word in words2:
+            tmp_dict=Similar_cal_tool.get_synonyms_score_dict(word)
+            for key,val in tmp_dict.items():
+                syns_scores_dict_2[key]=max(syns_scores_dict_2.get(key,0),val)
+        total = 0
+        for word in words1:
+            total += Similar_cal_tool.cal_syns_word_score(word, syns_scores_dict_2)
+        for word in words2:
+            total += Similar_cal_tool.cal_syns_word_score(word, syns_scores_dict_1)
+        return total / (len(words1) + len(words2))
+    def levenshtein_distance(str1, str2):
+        words1 = Similar_cal_tool.text_to_keywords(str1)
+        words2 = Similar_cal_tool.text_to_keywords(str2)
+        m, n = len(words1), len(words2)
+        if m == 0 and n == 0:
+            return 1
+        if m == 0:
+            return 0
+        if n == 0:
+            return 0
+        # Build a synonym-score dictionary for every keyword on each side
+        syns_scores_dicts_1 = []
+        syns_scores_dicts_2 = []
+        for word in words1:
+            syns_scores_dicts_1.append(Similar_cal_tool.get_synonyms_score_dict(word))
+        for word in words2:
+            syns_scores_dicts_2.append(Similar_cal_tool.get_synonyms_score_dict(word))
+        dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
+
+        for i in range(m + 1):
+            dp[i][0] = i
+        for j in range(n + 1):
+            dp[0][j] = j
+
+        for i in range(1, m + 1):
+            for j in range(1, n + 1):
+                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
+                dp[i][j] = min(dp[i][j],dp[i - 1][j - 1]+1-((Similar_cal_tool.cal_syns_word_score(words1[i-1], syns_scores_dicts_2[j-1]
+                                                          )+Similar_cal_tool.cal_syns_word_score(words2[j-1], syns_scores_dicts_1[i-1])))/2)
+        return 1-dp[m][n]/(m+n)
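Review note on longest_common_subsequence: the recurrence intended there is a weighted LCS, where each cell takes the best of skipping a word on either side or pairing the two current words with a similarity score. The sketch below isolates that recurrence. It is an illustration under assumptions: `weighted_lcs` and the `sim` callback are hypothetical names, `sim` is assumed to return a value in [0, 1], and it stands in for the synonym scores used in similar_cal_tool.py.

```python
from typing import Callable, Sequence


def weighted_lcs(a: Sequence[str], b: Sequence[str],
                 sim: Callable[[str, str], float]) -> float:
    """Similarity-weighted LCS score, normalised by the shorter sequence length."""
    m, n = len(a), len(b)
    if m == 0 and n == 0:
        return 1.0
    if m == 0 or n == 0:
        return 0.0
    # dp[i][j] is the best total similarity over a[:i] and b[:j].
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],                                # skip a[i-1]
                dp[i][j - 1],                                # skip b[j-1]
                dp[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair the two words
            )
    return dp[m][n] / min(m, n)


def exact(x: str, y: str) -> float:
    """Exact-match similarity, used here only to exercise the recurrence."""
    return 1.0 if x == y else 0.0
```

With `exact` as the similarity the function reduces to the classic LCS length divided by the shorter length; a synonym-based `sim` gives partial credit for near matches.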
-- 
Gitee