AEMaaCS customized damAssetLucene-11 index doesn't return results from PDFs
Hi everyone,
I had to customize the OOTB index /oak:index/damAssetLucene-11 for AEMaaCS, adding stopwords, some new indexRules, and suggestions. I also added a Tika config (tika/config.xml). My index definition is below:
<damAssetLucene-11-custom-2
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
compatVersion="{Long}2"
evaluatePathRestrictions="{Boolean}true"
excludedPaths="[/some/path]"
includedPaths="[/content/dam]"
maxFieldLength="{Long}100000"
tags="[visualSimilaritySearch,assetsOmnisearch]"
type="lucene">
<aggregates jcr:primaryType="nt:unstructured">
...
</aggregates>
<analyzers jcr:primaryType="nt:unstructured">
<default jcr:primaryType="nt:unstructured">
<tokenizer jcr:primaryType="nt:unstructured" name="Standard"/>
<filters jcr:primaryType="nt:unstructured">
<LowerCase jcr:primaryType="nt:unstructured"/>
<Stop jcr:primaryType="nt:unstructured" words="[stopwords.txt]">
<stopwords.txt jcr:primaryType="nt:file">
<jcr:content jcr:primaryType="nt:unstructured"/>
</stopwords.txt>
</Stop>
</filters>
</default>
</analyzers>
<indexRules jcr:primaryType="nt:unstructured">
...
</indexRules>
<suggestion
jcr:primaryType="nt:unstructured"
suggestAnalyzed="{Boolean}true"
suggestUpdateFrequencyMinutes="{Long}5"/>
<tika jcr:primaryType="nt:unstructured">
<config.xml jcr:primaryType="nt:file">
<jcr:content jcr:primaryType="nt:unstructured"/>
</config.xml>
</tika>
</damAssetLucene-11-custom-2>
However, with this index I can't search by PDF content: queries return no results.
Locally, the index returns results if I remove tika/config.xml. After deployment to AEMaaCS, the index doesn't return PDF documents in the results.
Query example: /jcr:root/content/dam/project/en/sitecontent/documents//element(*, dam:Asset)[(jcr:contains(., 'some text in the pdf*'))]/rep:excerpt(.)
By the way, after the deployment to AEMaaCS, the /oak:index/damAssetLucene-11-custom-1 and /oak:index/damAssetLucene-11 indexes are still enabled.
Do you have any ideas about the potential root cause?
