Too good to be true – Page 16

Life Circumstances in Word Frequencies

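Using the texts in the Quan Tangshi (全唐诗, Complete Tang Poems), I ran a frequency analysis of single characters and two-character words for Li Bai and Du Fu. The two corpora differ in size: with punctuation removed, Li Bai has 86,300 characters and Du Fu 120,000. Li Bai uses 3,416 distinct characters and Du Fu 4,210. To normalize the single-character counts to a common 100,000 characters, Li Bai's counts are multiplied by about 1.2 (100,000/86,300 ≈ 1.16) and Du Fu's are divided by 1.2. Du Fu also uses 4210/3416 = 1.23 times as many distinct characters as Li Bai, which dilutes the frequency of each one; to compensate, Du Fu's frequencies are further multiplied by 1.23 before they are compared with Li Bai's. The comparison covers characters that are both frequent (normalized frequency above 50) and clearly different between the two poets (differing by more than a factor of 2). The choice of characters alone already shows each poet's life circumstances: Li Bai, Du Fu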
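The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the post: the function names, the rule of keeping only Unicode letter characters, and the exact form of the threshold test are all hypothetical choices.

```python
import unicodedata
from collections import Counter

def char_counts(text):
    """Count characters, dropping punctuation and whitespace.
    Assumption: anything whose Unicode category is not a letter is dropped."""
    return Counter(ch for ch in text if unicodedata.category(ch).startswith("L"))

def per_100k(counts):
    """Scale raw counts to occurrences per 100,000 characters."""
    total = sum(counts.values())
    return {ch: n * 100_000 / total for ch, n in counts.items()}

def contrasting_chars(counts_a, counts_b, min_freq=50, min_ratio=2.0):
    """Return characters that are frequent in one corpus and at least
    min_ratio times more frequent there than in the other corpus."""
    freq_a = per_100k(counts_a)
    freq_b = per_100k(counts_b)
    # Compensate corpus B for dilution by its number of distinct characters,
    # as the post describes (4210 / 3416 = 1.23 for Du Fu versus Li Bai).
    comp = len(counts_b) / len(counts_a)
    freq_b = {ch: f * comp for ch, f in freq_b.items()}
    result = {}
    for ch in set(freq_a) | set(freq_b):
        fa, fb = freq_a.get(ch, 0.0), freq_b.get(ch, 0.0)
        hi, lo = max(fa, fb), min(fa, fb)
        if hi > min_freq and hi >= min_ratio * lo:
            result[ch] = (fa, fb)
    return result
```

Calling `contrasting_chars(char_counts(li_bai_text), char_counts(du_fu_text))` on the two corpora (the variable names here are placeholders) would return each contrasting character together with its two normalized frequencies.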

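Two-character words were obtained by having the computer read every pair of adjacent characters: Li Bai has 41,461 distinct pairs and Du Fu 63,470. The normalization coefficient is 0.93, meaning Du Fu's frequencies are divided by 0.93 before comparison with Li Bai's. Some words Li Bai uses heavily while Du Fu uses them at most once: 相思 (longing), 60 times; 金陵 (Jinling), 58; 黄鹤 (yellow crane), 33; 秋浦 (Qiupu), 30; 夜郎 (Yelang), 27; 绿水 (green water), 23; 谢公 (Lord Xie) and 敬亭 (Jingting), 19 each; 蛾眉 (moth eyebrows, a beauty), 17; 友人 (friend), 16; 荷花 (lotus), 天门 (Tianmen) and 秋霜 (autumn frost), 15 each; 紫霞 (purple clouds), 14; 海月 (sea moon), 山月 (mountain moon), 紫烟 (purple mist), 阳春 (warm spring), 朱颜 (rosy face), 浔阳 (Xunyang), 宣城 (Xuancheng), 渌水 (clear water), 会稽 (Kuaiji), 谢朓 (Xie Tiao) and 送别 (farewell), 13 each. Conversely, some words Du Fu uses heavily while Li Bai uses them at most once: 乾坤 (heaven and earth), 46 times; 巫峡 (Wu Gorge), 37; 干戈 (arms, warfare), 35; 老夫 (this old man), 34; 朝廷 (the court), 30; 草堂 (thatched cottage), 29; 老翁 (old man), 多病 (often ill) and 呜呼 (alas), 24 each; 戎马 (war horses), 23; 丈人 (elder), 22; 成都 (Chengdu), 21; 茅屋 (thatched hut), 老病 (old and sick) and 孤城 (lone city), 19 each; 丧乱 (death and turmoil), 18; 梓州 (Zizhou), 英雄 (hero), 使者 (envoy), 至尊 (the sovereign) and 西南 (southwest), 17 each; 阆州 (Langzhou), 迟暮 (late in life) and 衰年 (declining years), 16 each; 故国 (homeland) and 寂寥 (desolate), 15 each; 他乡 (a strange land), 头白 (white-haired), 老去 (growing old) and 暮春 (late spring), 14 each; 此生 (this life), 13.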
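The adjacent-pair counting and the "heavy for one poet, at most once for the other" filter might look roughly like the Python sketch below. It rests on assumptions the post does not state: pairs are not allowed to span punctuation or line breaks, only characters in the basic CJK range U+4E00 to U+9FFF are counted, and the default threshold of 13 is simply the lowest count that appears in the lists above. The function names are hypothetical.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count every pair of adjacent CJK characters.
    Assumption: pairs never span punctuation or line breaks."""
    counts = Counter()
    # Split on any run of non-CJK characters so each segment is unbroken text.
    for segment in re.split(r"[^\u4e00-\u9fff]+", text):
        for i in range(len(segment) - 1):
            counts[segment[i:i + 2]] += 1
    return counts

def signature_words(counts_a, counts_b, min_count=13, max_other=1):
    """Pairs used at least min_count times in corpus A and at most
    max_other times in corpus B, sorted by descending count."""
    return sorted(
        ((w, n) for w, n in counts_a.items()
         if n >= min_count and counts_b.get(w, 0) <= max_other),
        key=lambda item: (-item[1], item[0]),
    )
```

Swapping the two arguments of `signature_words` gives the reverse list. Note that raw adjacent pairs include many combinations that are not real words, so the output still needs a manual pass to pick out meaningful ones.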