PDF table extraction is different than expected

Hello there, I am trying to extract tables from a PDF using the PDF Services Python SDK, but I am running into a problem with the extraction results: the extracted text looks encoded or corrupted. I have attached a screenshot of the extracted table below. Additionally, the following JSON sample highlights the issue I am stuck on.

{"version": {"json_export": "218", "page_segmentation": "55", "schema": "1.1.0", "structure": "1.1136.0", "table_structure": "5"}, "extended_metadata": {"ID_instance": "44 DC 8C 73 ED BB B2 11 0A 00 0D B1 8B 55 A7 7F ", "ID_permanent": "AD 6A 2D 61 E0 97 2E 6F E3 26 3A 24 3C CA F3 20 ", "has_acroform": false, "has_embedded_files": false, "is_XFA": false, "is_certified": false, "is_encrypted": false, "is_digitally_signed": false, "language": "en", "page_count": 9, "pdf_version": "1.6", "pdfa_compliance_level": "", "pdfua_compliance_level": ""}, "elements": [{"Bounds": [30.080001831054688, 36.42799377441406, 47.07200622558594, 477.36900329589844], "Font": {"alt_family_name": "Arial Narrow", "embedded": true, "encoding": "Identity-H", "family_name": "Arial Narrow", "font_type": "CIDFontType2", "italic": false, "monospaced": false, "name": "MBGINL+ArialNarrow", "subset": true, "weight": 400}, "HasClip": false, "Lang": "en", "ObjectID": 188, "Page": 0, "Path": "//Document/P", "Rotation": 90.0, "Text": "\ue003\ue004\ue005\ue005\ue006\ue007\ue008\ue001 \ue001 \ue00e\ue00f\ue010\ue001\ue006 \ue011\ue001\ue012 \ue013 \ue007\ue006\ue014 \ue015 \ue003\ue004\ue005\ue006\ue001\ue006\ue004\ue007\ue008\ue001 \ue005 \ue001 \ue00e\ue00f\ue010\ue008\ue001\ue011\ue001\ue003\ue004\ue005\ue006\ue001\ue012 \ue013\ue001 \ue005\ue014\ue001\ue015 \ue010\ue001 \ue00e\ue00f\ue010\ue00f\ue016\ue001\ue017\ue00f\ue010\ue00e\ue007\ue018\ue00f\ue008 ", "TextSize": 12.0, "attributes": {"LineHeight": 14.375, "SpaceAfter": 5}},

Do you have any idea why I am getting the above result? I have followed the documentation. Thank you.
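
For anyone trying to reproduce this: every character in the "Text" value above falls in the Unicode Private Use Area (U+E000 to U+F8FF), which is a range with no standard meaning. A minimal sketch for finding which elements and fonts are affected is below. It assumes only the key names visible in the JSON sample ("elements", "Text", "Font", "name", "encoding", "Path", "Page"); the helper names `pua_ratio` and `find_unmapped_elements` and the 0.5 threshold are my own and not part of the SDK. My guess, not confirmed, is that the embedded subset font (Identity-H encoding) lacks a usable mapping back to Unicode, so this only locates the problem and does not recover the text.

```python
import json


def pua_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the BMP Private Use Area."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(0xE000 <= ord(c) <= 0xF8FF for c in chars) / len(chars)


def find_unmapped_elements(structured_data: dict, threshold: float = 0.5) -> list:
    """List elements whose text is mostly Private Use Area code points.

    Hypothetical helper: relies only on keys seen in the sample output.
    """
    flagged = []
    for element in structured_data.get("elements", []):
        text = element.get("Text", "")
        if text and pua_ratio(text) >= threshold:
            font = element.get("Font", {})
            flagged.append({
                "path": element.get("Path"),
                "page": element.get("Page"),
                "font_name": font.get("name"),
                "encoding": font.get("encoding"),
            })
    return flagged


# Shortened stand-in for the extraction output shown above.
sample = json.loads(r'''
{"elements": [
  {"Font": {"encoding": "Identity-H", "name": "MBGINL+ArialNarrow"},
   "Page": 0, "Path": "//Document/P",
   "Text": "\ue003\ue004\ue005\ue005\ue006\ue007\ue008\ue001 "}
]}
''')

for hit in find_unmapped_elements(sample):
    print(hit)
```

If the flagged elements all share one font, that would point at the font's character mapping inside the PDF rather than at the SDK call itself.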