A Survey of Research on Chinese Text Spelling Correction

CLC number: TP391.1; TP301.1 Document code: A Article ID: 2096-4706(2025)08-0138-08
Abstract: Chinese Spelling Correction (CSC) is a crucial foundational task in Natural Language Processing (NLP) and provides support for downstream tasks and research. Research on CSC continues to develop, and existing approaches fall mainly into error correction methods based on N-Gram language models, Deep Learning, and Large Language Models (LLMs). Firstly, the characteristics of the N-Gram language model and its application in CSC are analyzed, revealing its advantages in capturing contextual information. Secondly, methods based on Deep Learning improve the accuracy of error correction through deep neural networks and are widely used in Chinese text processing. At the same time, the rise of LLMs provides new ideas for spelling correction, demonstrating their enormous potential in dealing with complex language phenomena. This review provides a detailed overview of the current research status in the CSC field, providing a reference for scholars engaged in related research.
Keywords: Chinese text; spelling correction; N-Gram language model; Deep Learning; Large Language Model
0 Introduction
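Chinese Spelling Correction (CSC) is an important foundational research direction in Natural Language Processing (NLP). Its goal is to detect and correct spelling errors that appear in text, providing clean and accurate input data for subsequent tasks such as text analysis, information retrieval, and text generation.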