Abstract: Research on Tibetan text summarization is still in its infancy, especially with respect to dataset construction. To help fill this gap, we constructed a dataset specifically for Tibetan text summarization. To build it, we adopted a greedy selection strategy. First, each Tibetan text is segmented into sentences (clauses), the ROUGE score between each sentence and the title is calculated, and the sentence with the highest ROUGE score is selected. Next, each remaining sentence is concatenated with the already selected sentences, the ROUGE score against the title is recalculated, and the sentence yielding the highest score is added. This process is repeated until three sentences have been selected as the summary. Through this stepwise screening and refinement, the method aims to produce summaries that reflect the main information of a text more accurately and comprehensively, thereby improving the quality and usefulness of Tibetan text summaries. We hope this work provides solid data support for research on Tibetan text summarization and promotes the development of Tibetan information processing technology.
Keywords: text summarization; dataset; Tibetan news; greedy strategy; ROUGE score
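The greedy sentence-selection procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which ROUGE variant or which tokenization was used, so the sketch assumes syllable-level ROUGE-1 F1, and the helper `tokenize` (splitting on the Tibetan tsheg U+0F0B and dropping the shad U+0F0D) is hypothetical. Sentences are assumed to be pre-segmented, and the selected sentences are returned in original document order, which is also an assumption.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """ROUGE-1 F1 (unigram overlap) between two token lists."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def tokenize(text):
    """Hypothetical tokenizer (assumption): split Tibetan text into
    syllables on the tsheg (U+0F0B) and whitespace, dropping shad (U+0F0D)."""
    text = text.replace("\u0f0d", " ").replace("\u0f0b", " ")
    return text.split()


def greedy_oracle(sentences, title, k=3, score=rouge1_f1, tok=tokenize):
    """Greedily pick k sentences whose concatenation best matches the title."""
    title_tokens = tok(title)
    selected = []        # indices of chosen sentences, in selection order
    summary_tokens = []  # tokens of the concatenated summary so far
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -1.0
        for i in remaining:
            # Score the current summary concatenated with candidate sentence i
            s = score(summary_tokens + tok(sentences[i]), title_tokens)
            if s > best_score:  # ties keep the earliest sentence
                best_idx, best_score = i, s
        selected.append(best_idx)
        summary_tokens += tok(sentences[best_idx])
        remaining.remove(best_idx)
    # Assumption: output the summary in original document order
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    sents = ["a b c", "d e", "a d", "x y"]
    print(greedy_oracle(sents, "a b d e"))  # ['a b c', 'd e', 'a d']
```

Following the abstract, the loop always stops at three sentences rather than stopping early when the score no longer improves; placeholder Latin tokens are used in the demo only because the hypothetical tokenizer also splits on whitespace.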