I am trying to extract both text and tables from a PDF document using the Python SDK, but some of the extracted text appears corrupted or differently encoded. I have included a sample of the output below. Do you have any idea how to fix such issues during extraction? I tested a similar document with the Adobe PDF to Excel service, and it works as expected. Thank you.
JSON output sample:
{"Bounds": [52.51200866699219, 36.227996826171875, 72.38600158691406, 145.63192749023438], "Font": {"alt_family_name": "Arial Narrow", "embedded": false, "encoding": "WinAnsiEncoding", "family_name": "Arial Narrow", "font_type": "TrueType", "italic": false, "monospaced": false, "name": "ArialNarrow", "subset": false, "weight": 400}, "HasClip": false, "Lang": "en", "ObjectID": 190, "Page": 0, "Path": "//Document/P[3]", "Rotation": 90.0, "Text": "New Growth Horizon ", "TextSize": 14.0, "attributes": {"LineHeight": 16.75, "SpaceAfter": 5.5}}, {"Bounds": [54.444000244140625, 530.0039978027344, 71.43600463867188, 758.7480316162109], "Font": {"alt_family_name": "Arial Narrow", "embedded": true, "encoding": "Identity-H", "family_name": "Arial Narrow", "font_type": "CIDFontType2", "italic": false, "monospaced": false, "name": "MBGINL+ArialNarrow", "subset": true, "weight": 400}, "HasClip": false, "Lang": "en", "ObjectID": 191, "Page": 0, "Path": "//Document/P[4]", "Rotation": 90.0, "Text": "\ue012 \ue013 \ue007\ue006\ue014 \ue001 \ue007\ue015 \ue01e\ue01f \ue014\ue00f\ue00f!\ue01c\ue005\ue01f\ue007 \ue014\ue001\" \ue016!\ue006 \ue001\"\ue008# \ue015 PO ", "TextSize": 12.0, "attributes": {"LineHeight": 14.375, "SpaceAfter": 11, "TextAlign": "End"}}, {"Bounds": [80.54400634765625, 34.798004150390625, 81.74400329589844, 66.17799377441406], "ObjectID": 193, "Page": 0, "Path": "//Document/P[5]/Figure", "attributes": {"BBox": [80.51669999999649, 35.43079999999827, 81.83669999999984, 65.66959999999744]}}, {"Bounds": [79.94400024414062, 755.1799926757812, 106.33399963378906, 756.3800048828125], "ObjectID": 195, "Page": 0, "Path": "//Document/P[5]/Figure[2]", "attributes": {"BBox": [80.51669999999649, 755.1609999999928, 105.83599999999933, 756.4809999999998], "Placement": "Block"}}, {"Bounds": [77.54400634765625, 34.798004150390625, 106.53599548339844, 756.3800048828125], "Font": {"alt_family_name": "Arial Narrow", "embedded": true, "encoding": "Identity-H", "family_name": "Arial Narrow", "font_type": "CIDFontType2", "italic": false, "monospaced": false, "name": "MBGINO+ArialNarrow-Bold", "subset": true, "weight": 700}, "HasClip": false, "Lang": "en", "ObjectID": 192, "Page": 0, "Path": "//Document/P[5]", "Rotation": 90.0, "Text": "\"$\ue00c\ue001\ue003\ue004\ue005\ue005\ue006\ue007\ue008\ue001\t\n\ue001\ue00b\ue00c\ue00d\ue00c\n\ue00e\ue00f\ue010\ue001\ue006\ue00d\ue011\ue001\ue012\t\ue013\ue00c\ue007\ue006\ue014\ue00c\ue001%\ue003\ue00b\ue012&\ue001\ue011\t'\ue004\ue005\ue00c\ue00d\ue00f\ue001(\ue00e!!\ue001$\ue00c!#\ue001\ue008 \t\ue004\ue001'$\t\t\ue010\ue00c\ue001\ue006\ue001$\ue00c\ue006!\ue00f$\ue001#!\ue006\ue00d )\ue001\"$\ue00c\ue001\ue003\ue00b\ue012\ue001\ue010$\t(\ue010\ue001\ue008\t\ue004\ue001$\t(\ue001\ue008\t\ue004\ue001\ue006\ue00d\ue011\ue001\ue00f$\ue00c\ue001#!\ue006\ue00d \ue001(\t\ue004!\ue011 \ue010$\ue006\ue007 \ue001\ue00f$ \ue001' \ue010\ue00f\ue001 \ue007\ue001' \ue013 \ue007 \ue011\ue001$ \ue006!\ue00f$\ue001'\ue006\ue007 \ue001\ue010 \ue007\ue013\ue00e' \ue010)\ue001*+\",\ue015\ue001- \ue007\ue005\ue006\ue00f\ue00e \ue001\ue006. \ue004\ue00f\ue001\ue00f$ \ue001' \ue010\ue00f\ue001 \ue001\ue00f$\ue00e\ue010\ue001#!\ue006 %'\ue006!! \ue011\ue001\ue00f$ \ue001#\ue007 \ue005\ue00e\ue004\ue005&\ue001(\ue00e!!\ue001. \ue001#\ue007 \ue013\ue00e\ue011 \ue011\ue001\ue010 #\ue006\ue007\ue006\ue00f !\ue008) ", "TextSize": 12.0, "attributes": {"LineHeight": 12, "SpaceAfter": 8.5, "TextAlign": "Justify"}}, {"Bounds": [148.218994140625, 165.58900451660156, 149.03900146484375, 168.82899475097656], "ObjectID": 206, "Page": 0, "Path": "//Document/P[6]/Figure", "attributes": {"BBox": [148.43399999999383, 165.86499999999796, 148.91399999999703, 168.74499999999534]}},
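One pattern is visible in the sample: the garbled characters (`\ue001`, `\ue012`, ...) fall in the Unicode Private Use Area (U+E000 to U+F8FF), and both garbled elements use embedded subset fonts with `Identity-H` encoding (`MBGINL+ArialNarrow`, `MBGINO+ArialNarrow-Bold`). The element that extracted cleanly ("New Growth Horizon") uses a non-embedded `WinAnsiEncoding` font. A plausible but unconfirmed explanation is that those subset fonts do not carry a usable glyph-to-Unicode mapping, so the extractor cannot recover the real characters. The sketch below is a diagnostic, not a fix: it scans the extracted elements and reports which fonts produce mostly Private Use Area text. The file name `structuredData.json`, the top-level `"elements"` key, the 0.2 threshold, and all function names are assumptions for illustration.

```python
import json
from collections import Counter

# Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF).
PUA_START, PUA_END = 0xE000, 0xF8FF


def pua_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Private Use Area."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    pua = sum(1 for c in chars if PUA_START <= ord(c) <= PUA_END)
    return pua / len(chars)


def find_garbled(elements, threshold=0.2):
    """Yield (path, font name, encoding, ratio) for elements whose Text looks garbled.

    The threshold is an arbitrary, hypothetical cut-off.
    """
    for el in elements:
        text = el.get("Text")
        if not text:
            continue  # figures and other non-text elements have no "Text"
        ratio = pua_ratio(text)
        if ratio >= threshold:
            font = el.get("Font", {})
            yield el.get("Path"), font.get("name"), font.get("encoding"), ratio


def garbled_fonts(elements, threshold=0.2):
    """Count garbled elements per (font name, encoding) pair."""
    return Counter((name, enc) for _, name, enc, _ in find_garbled(elements, threshold))


def report_file(path="structuredData.json", threshold=0.2):
    """Hypothetical helper: load an Extract JSON file and summarize garbled fonts.

    Assumes the element list sits under the top-level "elements" key.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return garbled_fonts(data.get("elements", []), threshold)
```

If every flagged font is an embedded `Identity-H` subset, that would support the font-mapping explanation. In that case the issue is likely in the source PDF rather than in the SDK call, and regenerating the PDF with fonts that include Unicode mappings may help.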