This conversation has been locked due to inactivity. Please create a new post.
Hi,
Recently I've been working on search functionality in AEM 6.3. I have no problem searching site content or asset metadata, but I can't get search over the content of PDF files configured properly. From what I've read so far, indexing of PDF file content should work out of the box; most of the articles I found are actually about disabling it. I would be very grateful for a snippet or any other example of an index configuration that enables PDF content indexing.
Thanks in advance!
You can also use other Java APIs that are meant for searching PDF content. This would require a custom service. See, for example, PDF Text Search And PDF Text Extraction Using PDFOne (for Java).
Thank you for your reply, but I'd rather try to get indexing working without any custom services. If that fails, I'll just create my own using the Tika API. If anyone here who has PDF indexing working out of the box could send me their oak:index config, I would be much obliged.
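For reference, here is a rough sketch of what a Lucene full-text index definition that aggregates an asset's original binary could look like, written in the node-tree notation used by the Oak documentation. Nobody in this thread has confirmed this config: the index name pdfFulltext is hypothetical, the aggregate path assumes the usual DAM layout for the original rendition, and the property names (type, async, compatVersion, aggregates, indexRules, includePropertyTypes) are taken from the general Oak Lucene index definition format. Whether the indexes shipped with AEM 6.3 already cover this case is not established here.

```
// Hypothetical index definition; the node name "pdfFulltext" is made up for this sketch.
/oak:index/pdfFulltext
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"
  - async = "async"
  - compatVersion = 2
  + aggregates
    + dam:Asset
      // Assumed location of the original binary under a DAM asset.
      + include0
        - path = "jcr:content/renditions/original/jcr:content"
  + indexRules
    + dam:Asset
      // Binary has to be included so that jcr:data goes through text extraction.
      - includePropertyTypes = ["String", "Binary"]
```

A definition like this would normally take effect only after a reindex (for example, by setting reindex to true on the definition node), and it does nothing if text extraction itself is failing.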
Update:
After inspecting the indexed data with Luke, I noticed that the :fulltext field contains the value TextExtractionError instead of the extracted text. I then reindexed the data using oak-run.jar with Tika added to the classpath:
java -cp oak-run-1.7.4.jar;tika-app-1.17.jar org.apache.jackrabbit.oak.run.Main index --reindex --index-paths=/oak:index/lucene --read-write --fds-path="path-to-aem\crx-quickstart\repository\datastore" "path-to-aem\crx-quickstart\repository\segmentstore"
This time the text was extracted successfully.
The question is: why is the text not extracted when reindexing through the default AEM Oak Index Manager? I'm using a clean, pristine installation of AEM 6.3 with the newest service pack.
I need to implement the same thing. However, it looks like the AEM 6.3 OOB search already indexes the content of document assets: I can find assets by searching for a word that appears in their content (I tried Excel, PowerPoint, Word, and PDF files). Are there any specific cases in which the OOB search fails? What additional advantages would AEM integrated with Solr and Apache Tika offer? Is it better in terms of performance? Any help is greatly appreciated. Thanks!