A Complete English Word-Frequency Count Example Using File Input
Published: 2019-06-14

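You can download a long English novel and analyze its word frequencies.

1. Read in the text to be analyzed.

2. Split the text and extract the words.

3. Build a dictionary of counts.

4. Exclude grammatical (function) words.

5. Sort.

6. Print the top 20.

7. Briefly explain the output.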

 

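# Read in the text to be analyzed
with open('C:/Uscer/ben/test.txt', 'r') as fo:
    text = fo.read()

# Convert all uppercase letters to lowercase
text = text.lower()

# Replace commas and periods with spaces
for ch in ',.':
    text = text.replace(ch, ' ')

# Split into individual words (on any whitespace, including newlines)
words = text.split()

# Exclude grammatical (function) words
exp = {'', 'the', 'and', 'to', 'on', 'of', 's', 'a', 'is', 'u', 'as', 'also'}
keys = set(words) - exp

# Build the dictionary of counts
dic = {}
for w in keys:
    dic[w] = words.count(w)

# Sort by count, highest first
wc = list(dic.items())
wc.sort(key=lambda x: x[1], reverse=True)

# Print the top 20 (slicing is safe even if there are fewer than 20 words)
for item in wc[:20]:
    print(item)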
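The counting loop above calls words.count(w) once per distinct word, so the word list is rescanned each time. On a full-length novel this gets slow. A minimal sketch of the same steps using collections.Counter from the standard library, which counts in a single pass; the function name top_words and the sample sentence are made up for illustration and are not part of the original script:

```python
from collections import Counter

# Same stop-word set as in the script above
STOPWORDS = {'', 'the', 'and', 'to', 'on', 'of', 's', 'a', 'is', 'u', 'as', 'also'}

def top_words(text, stopwords=STOPWORDS, n=20):
    """Return the n most frequent words in text, skipping stopwords.

    Hypothetical helper: lowercases, turns commas and periods into
    spaces, splits on whitespace, then counts in one pass.
    """
    text = text.lower()
    for ch in ',.':
        text = text.replace(ch, ' ')
    counts = Counter(w for w in text.split() if w not in stopwords)
    return counts.most_common(n)

# Illustrative input only; in practice pass the contents of the file
sample = "The conference opened Monday. The conference was in Toronto, Canada."
for word, count in top_words(sample, n=3):
    print(word, count)
```

Counter.most_common(n) already returns the pairs sorted by count in descending order, so the separate sort step is not needed.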

  test.txt:

Canadian Prime Minister Justin Trudeau (central) and Jack Ma (right), executive chairman and founder of the Alibaba Group, attend the Alibaba Group's Gateway'17 Canada conference in Toronto on Sept 25, 2017. [Photo/Xinhua]

The Alibaba Group's Gateway'17 Canada conference opened Monday in Canada's largest city Toronto.

Jack Ma, executive chairman and founder of the Alibaba Group, and Canadian Prime Minister Justin Trudeau delivered key speeches at the conference, which was attended by more than 3,600 people, at the Toronto Exhibition Place.

The event, along with a trade show, attracted a variety of organizations and businesses, covering such sectors as manufacturing, retail, professional services, agribusiness, and travel and tourism.

Data showed 68 percent of the participants were from small businesses with fewer than 50 employees.

The one-day conference featured presentations and breakout sessions aimed at educating enterprises about what and how to sell to China, especially through e-commerce platforms.

For example, people learned about how Alibaba's online travel marketplace and payment solutions can help Canadian businesses serve the rapidly expanding outbound Chinese travel and tourism market.

Judging from the most frequent words in the output, this article is about travel and business in Canada.
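One caveat about the extraction step: test.txt contains punctuation other than commas and periods, such as parentheses, square brackets, slashes and apostrophes, so stripping only ',.' leaves tokens like "(central)" or "[photo/xinhua]" in the counts. A rough sketch of a regex-based alternative, assuming it is acceptable to drop numbers and keep only runs of letters (with inner apostrophes and hyphens preserved); the function name tokenize is hypothetical:

```python
import re

def tokenize(text):
    """Lowercase text and return its words.

    A word here is a run of ASCII letters, optionally joined by single
    apostrophes or hyphens (e.g. "group's", "e-commerce"). Digits and
    all other punctuation are discarded.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

# A line adapted from test.txt, for illustration
line = "Justin Trudeau (central) and Jack Ma (right) attend the Alibaba Group's conference. [Photo/Xinhua]"
print(tokenize(line))
```

Because "group's" stays in one piece under this scheme, the 's' entry in the stop-word set would no longer be needed. Whether possessives should count as separate words is a design choice that depends on the analysis.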

Reposted from: https://www.cnblogs.com/0055sun/p/7602223.html
