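DefExtractor: Automatic Extraction and Visualization of Mathematical Identifier-Definition Pairs via Bidirectional Interaction with Large Language Models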

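Understanding mathematical formulas is an essential task when reading scientific articles. To read a formula, readers need to find the definition of each symbol in it, and good visualization combined with structured text explanations can make formulas more readable. However, such designs depend on editing by the article's authors, a process that is both tedious and time-consuming. To address this, we built DefExtractor, a large language model (LLM)-based authoring tool that helps authors extract and visualize mathematical identifiers and their corresponding definitions. Given a formula written in LaTeX together with its explanatory text, DefExtractor uses an LLM to identify the semantics and recommend a suitable coloring design. Users can then give corrective feedback on the recommended design for the model to revise, enabling bidirectional interaction and collaboration between humans and the AI model. Our technical evaluation found that DefExtractor's automatic identifier-definition pair extraction pipeline outperforms previous models in the target usage scenario. A user study with twelve participants showed that the AI assistance in DefExtractor effectively reduces user workload and shortens editing time.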

Keywords

Mathematical formulas; Mathematical notation; Authoring tools; Visualization; Large language models; LaTeX

Parallel Abstract

Following mathematical formulas is a critical task in reading scientific papers. Readers must trace the definitions of, and relations among, the identifiers in a formula. Enhancing readability relies on paper authors creating effective visualizations and structured text explanations, which is especially challenging for long formulas. We propose DefExtractor, an LLM-based tool that interactively assists authors in extracting and visualizing identifier-definition pairs. Given a LaTeX input, DefExtractor identifies the semantics and automatically suggests colored identifiers and definitions based on the LLM response. Users can revise the suggestions through text prompts or syntax, and the AI adapts to these edits iteratively. A technical evaluation showed that our pair extraction pipeline outperforms previous models in our target scenario, and a usability study with 12 participants showed that DefExtractor effectively reduced authors' workload and shortened editing time compared with a baseline tool.
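To make the coloring step concrete, here is a minimal Python sketch of how identifier-definition pairs might be turned into a colored LaTeX formula and a matching legend. It is an illustration under stated assumptions, not DefExtractor's implementation: the `colorize` function, the `PALETTE` list, and the hand-written `pairs` dictionary (standing in for an LLM response) are all hypothetical, and the output assumes the LaTeX `xcolor` package is loaded.

```python
import re

# Hypothetical palette; these are base color names from the xcolor package.
PALETTE = ["red", "blue", "teal", "orange", "violet"]

# Split a formula into LaTeX commands (\alpha), escaped characters (\,),
# stray backslashes, single letters, and runs of everything else.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\\|[A-Za-z]|[^A-Za-z\\]+", re.DOTALL)


def colorize(formula, pairs):
    """Wrap each known identifier in \\textcolor and build a colored legend.

    `pairs` maps an identifier (a single letter or a command such as
    \\alpha) to its definition. In a real tool it would come from an
    LLM; here it is supplied by hand (assumption).
    """
    colors = {ident: PALETTE[i % len(PALETTE)] for i, ident in enumerate(pairs)}
    out = []
    for tok in TOKEN.findall(formula):
        if tok in colors:
            out.append(r"\textcolor{%s}{%s}" % (colors[tok], tok))
        else:
            out.append(tok)
    legend = [r"\textcolor{%s}{%s}" % (colors[i], d) for i, d in pairs.items()]
    return "".join(out), legend


if __name__ == "__main__":
    pairs = {"E": "energy", "m": "mass", "c": "speed of light"}
    colored, legend = colorize("E = mc^2", pairs)
    print(colored)
    for entry in legend:
        print(entry)
```

Tokenizing before matching keeps the letters inside a command such as `\alpha` from being colored as if they were the identifier `a`. Multi-character identifiers, subscripted symbols, and user corrections would need a richer representation than this sketch provides.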

Parallel Keywords

Mathematical formulas; Mathematical notation; Authoring tools; Visualization; LLM; LaTeX
