Apache Tika config in Lucene Index and Query Flow Summary
Vijayalakshmi_S
Level 10
June 23, 2020


Abstract:

Apache Tika detects and extracts text from a variety of file formats. It does this with a Detector and a Parser: as the names suggest, the former detects the content type of a file and the latter parses its text content. Oak uses a default Tika config (an XML file defining the Detector and Parser to use).
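
To make the detect-then-parse split concrete, here is a minimal toy sketch in plain Java. It is not Tika's API and not Oak's code: the class name, the extension-based detection and the parser table are all hypothetical, chosen only to show how a detected content type can select the parser (including a "do nothing" parser for types a config chooses to skip).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy illustration only: every name here is hypothetical, none of it is Tika's API.
final class DetectThenParseSketch {

    // Parser table: content type -> function that turns raw bytes into text.
    // This plays the role that the config file plays in the abstract above.
    private static final Map<String, Function<byte[], String>> PARSERS = new LinkedHashMap<>();

    static {
        // Plain text: decode the bytes as UTF-8.
        PARSERS.put("text/plain", data -> new String(data, StandardCharsets.UTF_8));
        // Assumed example of a type the config chooses not to extract text from.
        PARSERS.put("application/zip", data -> "");
    }

    // Detector step: guess the content type (here, naively, from the file extension).
    static String detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".txt")) {
            return "text/plain";
        }
        if (lower.endsWith(".zip")) {
            return "application/zip";
        }
        return "application/octet-stream";
    }

    // Parser step: pick the parser registered for the detected type and run it.
    // Unknown types fall back to a parser that extracts nothing.
    static String extract(String fileName, byte[] data) {
        String contentType = detect(fileName);
        Function<byte[], String> parser = PARSERS.getOrDefault(contentType, d -> "");
        return parser.apply(data);
    }

    public static void main(String[] args) {
        byte[] body = "hello index".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect("notes.txt") + " -> " + extract("notes.txt", body));
    }
}
```

In this sketch, changing what gets extracted for a given type only means changing the parser table, which is the same kind of adjustment a custom config would make.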

This post illustrates:

  • The default config, and one simple use case that shows the need for a custom config.
  • A high-level summary of the flow for query-based functionality.

Blog content: 

https://myaemlearnings.blogspot.com/2020/06/apache-tika-config-in-lucene-index-and.html

1 reply

kautuk_sahni
Community Manager
June 23, 2020

Making this a featured post.

Kautuk Sahni
Vijayalakshmi_S
Level 10
June 25, 2020

Thanks Kautuk