Automated extraction and structuring of menus from PDF files using machine learning and NLP techniques



Authors

DOI:

https://doi.org/10.32523/2616-7263-2025-153-4-257-267

Keywords:

PDF document processing, text analysis automation, weakly structured data, restaurant menus, Natural Language Processing (NLP), machine learning, data extraction, semantic analysis, food service digitalization

Abstract

This study explores state-of-the-art approaches for processing PDF documents, with a focus on analyzing poorly structured restaurant menus. Successful automated processing typically requires well-structured documents, which often means sacrificing aesthetic design for machine readability. For restaurants, however, the design of a menu matters more than its structure, so menus tend to be poorly structured and correspondingly harder to process. Once poorly structured PDF documents can be processed reliably, handling documents from other business domains should become much easier. The study also includes a comparative analysis of structural features in different types of PDF documents, including legislative acts and academic publications.

The research aims to apply machine learning methods to overcome the challenges of automating data extraction, analysis, and structuring. The solution described in the study was developed to address the problems posed by poorly structured PDF documents.


Published

2025-12-22

How to Cite

Mashkanov, A., Akhayeva, Z., & Zakirova, A. (2025). Automated extraction and structuring of menus from PDF files using machine learning and NLP techniques. Bulletin of L.N. Gumilyov Eurasian National University Technical Science and Technology Series, 153(4), 257–267. https://doi.org/10.32523/2616-7263-2025-153-4-257-267

