A robust PDF parsing pipeline that extracts text, tables, and images from PDF documents into structured JSON format. Designed as the first stage in a multimodal RAG (Retrieval-Augmented Generation) ...