Unsupervised Chinese Extractive Summarization

Date: 2021-06-20 00:35:19

GitHub: https://github.com/dmmiller612/bert-extractive-summarizer

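This repository provides an unsupervised method for extracting key sentences. The main idea is to represent each sentence as a BERT vector, cluster the vectors, and use distances within the clusters to choose the key sentences. This post shows how to set it up for Chinese text.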
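The clustering step can be illustrated without BERT. The following is a minimal sketch, assuming k-means as the clustering algorithm and "closest sentence to each centroid" as the selection rule; the 2-D vectors stand in for real sentence embeddings, and the names `kmeans` and `select_sentences` are hypothetical helpers for illustration, not part of the library.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are evenly spaced input points."""
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_sentences(sentences, embeddings, k):
    """Keep the sentence closest to each centroid, in original order."""
    chosen = set()
    for centroid in kmeans(embeddings, k):
        best = min(range(len(embeddings)),
                   key=lambda i: math.dist(embeddings[i], centroid))
        chosen.add(best)
    return [sentences[i] for i in sorted(chosen)]


# Toy data: six sentences whose made-up embeddings form two groups.
sentences = ["天气晴朗。", "今天阳光很好。", "今天天气不错。",
             "股市下跌。", "市场大幅波动。", "今天股市走低。"]
embeddings = [[0.0, 0.0], [0.3, 0.0], [0.1, 0.1],
              [5.0, 5.0], [5.4, 5.0], [5.1, 5.2]]

summary = select_sentences(sentences, embeddings, k=2)
print(summary)
```

With real data, the embeddings would be high-dimensional vectors produced by the BERT model, and the number of clusters would be the number of sentences wanted in the summary.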

pip install bert-extractive-summarizer
pip install spacy==2.3.1
pip install transformers
pip install neuralcoref
python -m spacy download zh_core_web_lg  # Chinese spaCy model

import spacy
import zh_core_web_lg
import neuralcoref

nlp = zh_core_web_lg.load()
neuralcoref.add_to_pipe(nlp)

# Summarizer with a Chinese model
from summarizer import Summarizer
from summarizer.sentence_handler import SentenceHandler
from spacy.lang.zh import Chinese
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load model, model config and tokenizer via Transformers
modelName = "bert-base-chinese" 
custom_config = AutoConfig.from_pretrained(modelName)
custom_config.output_hidden_states=True
custom_tokenizer = AutoTokenizer.from_pretrained(modelName)
custom_model = AutoModel.from_pretrained(modelName, config=custom_config)

model = Summarizer(
    custom_model=custom_model, 
    custom_tokenizer=custom_tokenizer,
    sentence_handler = SentenceHandler(language=Chinese)
    )
body = "要摘要的文章"  # placeholder: the Chinese article text to summarize

result = model(body)
full = ''.join(result)
print(full)  # the extracted summary sentences
Function parameters
model(
    body: str  # The string body that you want to summarize
    ratio: float  # The ratio of sentences that you want for the final summary
    min_length: int  # Removes sentences shorter than this many characters (40 by default)
    max_length: int  # Removes sentences longer than this many characters
    num_sentences: int  # Number of sentences to use. Overrides ratio if supplied.
)
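To make the interaction between these parameters concrete, here is a minimal sketch of how the length filter and the sentence count could combine. `plan_summary` is a hypothetical helper written for illustration, not the library's own code; the default values (0.2, 40, 600), the strict comparisons, and the rounding rule are assumptions.

```python
def plan_summary(sentences, ratio=0.2, min_length=40, max_length=600,
                 num_sentences=None):
    """Hypothetical helper: filter by length, then decide how many to keep."""
    # Drop sentences that are too short or too long (length in characters).
    candidates = [s for s in sentences if min_length < len(s) < max_length]
    if num_sentences is not None:
        k = num_sentences  # an explicit count overrides ratio
    else:
        k = max(int(len(candidates) * ratio), 1)  # assumed rounding: floor, at least 1
    return candidates, min(k, len(candidates))


# Seven dummy sentences with lengths 10, 45, 50, 60, 70, 80 and 700.
docs = ["短" * n for n in (10, 45, 50, 60, 70, 80, 700)]

kept, k_ratio = plan_summary(docs, ratio=0.5)
_, k_fixed = plan_summary(docs, ratio=0.5, num_sentences=3)
print(len(kept), k_ratio, k_fixed)
```

Because `len()` counts characters, and Chinese sentences are often shorter than 40 characters, a lower `min_length` may be worth trying if too few sentences survive the filter.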

Original post: https://www.cnblogs.com/amazement/p/14905143.html
