Apache Tika config in Lucene Index and Query Flow Summary

Vijayalakshmi_S
MVP

22-06-2020

Abstract:

Apache Tika detects and extracts text from a variety of file formats. It does this with a Detector and a Parser: as the names suggest, the Detector identifies the content type of a file and the Parser extracts its text content. Oak uses a default Tika config (an XML file that defines which Detector and Parser are used).
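
The detect-then-parse split described above can be sketched in a few lines. This is only a conceptual illustration in Python and not Tika's or Oak's actual API: the names `detect`, `extract_text`, `MAGIC_PREFIXES`, and `PARSERS` are hypothetical, and the magic-byte table is deliberately tiny.

```python
from typing import Callable, Dict

# Hypothetical magic-byte table for illustration only;
# a real detector consults a much larger registry of types.
MAGIC_PREFIXES: Dict[bytes, str] = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
}


def detect(data: bytes) -> str:
    """Detector step: guess the content type from the leading bytes."""
    for prefix, content_type in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return content_type
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def parse_plain_text(data: bytes) -> str:
    """Parser step for text/plain: the text is the decoded content."""
    return data.decode("utf-8")


def parse_nothing(data: bytes) -> str:
    """Fallback for types with no registered parser: no text is extracted."""
    return ""


# Assumed registry mapping content types to parsers (the role a config plays).
PARSERS: Dict[str, Callable[[bytes], str]] = {
    "text/plain": parse_plain_text,
}


def extract_text(data: bytes) -> str:
    """Run the detector first, then hand the bytes to the matching parser."""
    parser = PARSERS.get(detect(data), parse_nothing)
    return parser(data)
```

In this sketch, a content type with no entry in `PARSERS` yields an empty string, so the choice of registered detector and parser decides which files contribute searchable text.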

This post illustrates:

  • The default config and one simple use case to showcase the need for custom config.
  • A high-level summary of the flow for query-based functionality.

Blog content: 

https://myaemlearnings.blogspot.com/2020/06/apache-tika-config-in-lucene-index-and.html

Replies

kautuk_sahni
Community Manager

23-06-2020

Making this a featured post.

Vijayalakshmi_S
MVP

25-06-2020

Thanks, Kautuk.