
Python Natural Language Processing (5): WordNet

Date: 2017-02-19 21:10:01

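WordNet is a semantically oriented English dictionary. It resembles a traditional thesaurus but has a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets (synsets).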

1. Finding synonyms

Here we take motorcar as an example and look up its synsets.

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')                  # find the synsets that contain the word
[Synset('car.n.01')]
>>> wn.synset('car.n.01').lemma_names       # without parentheses this only shows the bound method
<bound method Synset.lemma_names of Synset('car.n.01')>
>>> wn.synset('car.n.01').lemma_names()     # list the synonyms in the synset
['car', 'auto', 'automobile', 'machine', 'motorcar']
>>> wn.synset('car.n.01').definition()      # definition of this synset
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> wn.synset('car.n.01').examples()        # example sentences for this synset
['he needs a car to get to work']
>>> wn.synset('car.n.01').lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> wn.lemma('car.n.01.automobile')
Lemma('car.n.01.automobile')
>>> wn.lemma('car.n.01.automobile').synset()
Synset('car.n.01')
>>> wn.lemma('car.n.01.automobile').name()
'automobile'
>>> wn.synsets('car')
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
>>> for synset in wn.synsets('car'):
...     print(synset.lemma_names())
...
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
>>> wn.lemmas('car')                        # all lemmas that involve the word 'car'
[Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'), Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]
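The session above shows that one word can belong to several synsets and one synset holds several words. As a rough illustration of that many-to-many mapping, here is a minimal sketch that does not touch NLTK at all: it hard-codes the lemma lists printed above and inverts them. The names SYNSET_LEMMAS, build_word_index and toy_synsets are made up for this example and are not part of the NLTK API.

```python
from collections import defaultdict

# Lemma names copied from the session above. A real lookup would go through
# nltk.corpus.wordnet; this hand-written dict is only a stand-in.
SYNSET_LEMMAS = {
    "car.n.01": ["car", "auto", "automobile", "machine", "motorcar"],
    "car.n.02": ["car", "railcar", "railway_car", "railroad_car"],
    "car.n.03": ["car", "gondola"],
    "car.n.04": ["car", "elevator_car"],
    "cable_car.n.01": ["cable_car", "car"],
}


def build_word_index(synset_lemmas):
    """Invert synset -> lemma names into word -> synset names."""
    index = defaultdict(list)
    for synset_name, lemma_names in synset_lemmas.items():
        for word in lemma_names:
            index[word].append(synset_name)
    return dict(index)


WORD_INDEX = build_word_index(SYNSET_LEMMAS)


def toy_synsets(word):
    """Return the names of all toy synsets that contain `word` (hypothetical helper)."""
    return WORD_INDEX.get(word, [])


print(toy_synsets("motorcar"))   # a word with a single sense
print(len(toy_synsets("car")))   # an ambiguous word appears in several synsets
```

An unambiguous word such as motorcar maps to one synset, while car maps to all five, which mirrors what wn.synsets() returned above.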

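2. The WordNet hierarchy

WordNet synsets correspond to abstract concepts, and they do not always have corresponding English words. These concepts are linked to one another in a hierarchy.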

[Figure: a fragment of the WordNet concept hierarchy]

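The figure above shows a fragment of the WordNet concept hierarchy. Each node corresponds to a synset, and each edge represents the hypernym/hyponym relation, that is, the relation between a superordinate concept and a subordinate one.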

>>> motorcar = wn.synset('car.n.01')
>>> types_of_motorcar = motorcar.hyponyms()      # more specific concepts
>>> types_of_motorcar[26]
Synset('stanley_steamer.n.01')
>>> sorted(
... [lemma.name()
... for synset in types_of_motorcar
... for lemma in synset.lemmas()])
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact',
'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car',
'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep',
'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car',
'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car',
'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon',
'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater',
'used-car', 'waggon', 'wagon']
>>> motorcar.hypernyms()                         # more general concepts
[Synset('motor_vehicle.n.01')]
>>> paths = motorcar.hypernym_paths()
>>> len(paths)
2
>>> [synset.name for synset in paths[0]]         # name is a method; without () we get bound methods
[<bound method Synset.name of Synset('entity.n.01')>, <bound method Synset.name of Synset('physical_entity.n.01')>,
<bound method Synset.name of Synset('object.n.01')>, <bound method Synset.name of Synset('whole.n.02')>,
<bound method Synset.name of Synset('artifact.n.01')>, <bound method Synset.name of Synset('instrumentality.n.03')>,
<bound method Synset.name of Synset('container.n.01')>, <bound method Synset.name of Synset('wheeled_vehicle.n.01')>,
<bound method Synset.name of Synset('self-propelled_vehicle.n.01')>, <bound method Synset.name of Synset('motor_vehicle.n.01')>,
<bound method Synset.name of Synset('car.n.01')>]
>>> [synset.name() for synset in paths[0]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03',
'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> [synset.name() for synset in paths[1]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03',
'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> motorcar.root_hypernyms()
[Synset('entity.n.01')]
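Why does car.n.01 have two hypernym paths? The two lists above differ only around wheeled_vehicle.n.01, which can be reached through container.n.01 or through vehicle.n.01. The following sketch rebuilds that part of the hierarchy as a plain dict and enumerates the paths recursively. It is a toy reconstruction from the output above, not how NLTK implements hypernym_paths(); HYPERNYMS and this hypernym_paths function are names invented for the example.

```python
# Child -> list of direct hypernyms, reconstructed from the two paths printed above.
# This is a hand-made fragment, not the full WordNet graph.
HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],  # two parents
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],  # root: no hypernyms
}


def hypernym_paths(name):
    """Return every root-to-`name` path as a list of synset names."""
    parents = HYPERNYMS[name]
    if not parents:
        return [[name]]
    paths = []
    for parent in parents:
        for path in hypernym_paths(parent):
            paths.append(path + [name])
    return paths


paths = hypernym_paths("car.n.01")
print(len(paths))
for path in paths:
    print(len(path), " > ".join(path))
```

A node with two parents doubles the number of paths for everything below it, which is why a single synset can report more than one path to the same root.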

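3. More lexical relations

Hypernyms and hyponyms are called lexical relations because they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things that contain them (holonyms).

1) Part-whole relations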

>>> wn.synset('tree.n.01').part_meronyms()        # parts of a tree
[Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]
>>> wn.synset('tree.n.01').substance_meronyms()   # substances a tree is made of
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
>>> wn.synset('tree.n.01').member_holonyms()      # groups a tree is a member of
[Synset('forest.n.01')]
>>> for synset in wn.synsets('mint', wn.NOUN):
...     print("%s : %s" % (synset.name(), synset.definition()))
...
batch.n.02 : (often followed by `of') a large number or amount or extent
mint.n.02 : any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
mint.n.03 : any member of the mint family of plants
mint.n.04 : the leaves of a mint plant used fresh or candied
mint.n.05 : a candy that is flavored with a mint oil
mint.n.06 : a plant where money is coined by authority of the government
>>> wn.synset('mint.n.04').part_holonyms()        # mint leaves are part of the mint plant
[Synset('mint.n.02')]
>>> wn.synset('mint.n.04').substance_holonyms()   # mint leaves are a substance of mint candy
[Synset('mint.n.05')]
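Meronym and holonym relations are two directions of the same link: if a trunk is a part meronym of a tree, the tree is a part holonym of the trunk. The sketch below models a few of the tree.n.01 results above as typed edges and answers queries in either direction. It is only a toy model under that assumption; EDGES, INVERSE and related() are hypothetical names, and only some of the synsets listed above are included.

```python
# (source, relation, target) triples taken from the tree.n.01 output above.
# Only a subset is listed; this is a toy graph, not WordNet itself.
EDGES = [
    ("tree.n.01", "part_meronym", "trunk.n.01"),
    ("tree.n.01", "part_meronym", "crown.n.07"),
    ("tree.n.01", "substance_meronym", "heartwood.n.01"),
    ("tree.n.01", "substance_meronym", "sapwood.n.01"),
    ("tree.n.01", "member_holonym", "forest.n.01"),
]

# Name of the relation seen from the other end of an edge.
INVERSE = {
    "part_meronym": "part_holonym",
    "substance_meronym": "substance_holonym",
    "member_holonym": "member_meronym",
}


def related(name, relation):
    """Return synset names linked to `name` by `relation`, in either edge direction."""
    results = []
    for source, edge_relation, target in EDGES:
        if source == name and edge_relation == relation:
            results.append(target)
        elif target == name and INVERSE[edge_relation] == relation:
            results.append(source)
    return results


print(related("tree.n.01", "substance_meronym"))
print(related("trunk.n.01", "part_holonym"))
print(related("forest.n.01", "member_meronym"))
```

Storing each link once and deriving the reverse direction keeps the two views consistent, which is the main point of the sketch.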

2) Entailment

>>> wn.synset('walk.v.01').entailments()    # walking entails stepping
[Synset('step.v.01')]
>>> wn.synset('eat.v.01').entailments()
[Synset('chew.v.01'), Synset('swallow.v.01')]
>>> wn.synset('tease.v.03').entailments()
[Synset('arouse.v.07'), Synset('disappoint.v.01')]

3) Antonyms

>>> wn.lemma('supply.n.02.supply').antonyms()    # antonymy holds between lemmas, not synsets
[Lemma('demand.n.02.demand')]
>>> wn.lemma('rush.v.01.rush').antonyms()
[Lemma('linger.v.04.linger')]
>>> wn.lemma('horizontal.a.01.horizontal').antonyms()
[Lemma('inclined.a.02.inclined'), Lemma('vertical.a.01.vertical')]
>>> wn.lemma('staccato.r.01.staccato').antonyms()
[Lemma('legato.r.01.legato')]

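4. Semantic similarity

Synsets are linked by a complex network of lexical relations. Given a particular synset, we can traverse the WordNet network to find synsets with related meanings. Each synset has one or more hypernym paths that link it to a root hypernym. Two synsets linked to the same root may have several hypernyms in common. If two synsets share a very specific hypernym, one that sits low down in the hypernym hierarchy, they must be closely related.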
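In NLTK this idea is exposed through Synset methods such as lowest_common_hypernyms(), min_depth() and path_similarity(). To show the reasoning without needing the WordNet data, here is a minimal sketch over a tiny hand-made taxonomy. The taxonomy, its plain-word labels and the function names are all assumptions for illustration; they are not WordNet's actual structure or NLTK's implementation, and each node has a single parent to keep the code short.

```python
# Toy taxonomy: child -> parent (None marks the root). Invented for this example.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
    "ambulance": "car",
    "taxi": "car",
    "bicycle": "vehicle",
    "plant": "entity",
    "tree": "plant",
}


def path_to_root(name):
    """Return [name, parent, ..., root]."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return path


def depth(name):
    """Number of edges between `name` and the root."""
    return len(path_to_root(name)) - 1


def lowest_common_hypernym(a, b):
    """Deepest node that lies on both paths to the root."""
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):  # walks upward, so the first hit is the deepest
        if candidate in ancestors_of_a:
            return candidate
    return None


def path_similarity(a, b):
    """1 / (1 + edges on the shortest path), going through the common hypernym."""
    common = lowest_common_hypernym(a, b)
    distance = depth(a) + depth(b) - 2 * depth(common)
    return 1 / (1 + distance)


for other in ["taxi", "bicycle", "tree"]:
    common = lowest_common_hypernym("ambulance", other)
    print(other, common, depth(common), round(path_similarity("ambulance", other), 3))
```

The deeper the shared hypernym, the shorter the path between the two concepts and the higher the score, which is the intuition behind the paragraph above.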


Source: http://www.cnblogs.com/no-tears-girl/p/6416765.html
