AEMaaCS upload author index step failing with stopwords file | Community
konstantyn_diachenko
Community Advisor
August 28, 2024
Solved


Hi everyone,

 

I customized the OOTB index /oak:index/damAssetLucene-11 and added stopword support to it:

<damAssetLucene-11-custom-2
    jcr:primaryType="oak:QueryIndexDefinition"
    async="[async,nrt]"
    compatVersion="{Long}2"
    evaluatePathRestrictions="{Boolean}true"
    excludedPaths="[/some/path]"
    includedPaths="[/content/dam]"
    maxFieldLength="{Long}100000"
    tags="[visualSimilaritySearch,assetsOmnisearch]"
    type="lucene">
    <aggregates jcr:primaryType="nt:unstructured">
        ...
    </aggregates>
    <analyzers jcr:primaryType="nt:unstructured">
        <default
            jcr:primaryType="nt:unstructured"
            luceneMatchVersion="LUCENE_47"
            class="org.apache.lucene.analysis.standard.StandardAnalyzer">
            <stopwords jcr:primaryType="nt:file">
                <jcr:content jcr:primaryType="nt:unstructured" jcr:mimeType="text/plain"/>
            </stopwords>
        </default>
    </analyzers>
    <indexRules jcr:primaryType="nt:unstructured">
        ...
    </indexRules>
    <suggestion
        jcr:primaryType="nt:unstructured"
        suggestAnalyzed="{Boolean}true"
        suggestUpdateFrequencyMinutes="{Long}5"/>
    <tika jcr:primaryType="nt:unstructured">
        <config.xml jcr:primaryType="nt:file">
            <jcr:content jcr:primaryType="nt:unstructured"/>
        </config.xml>
    </tika>
</damAssetLucene-11-custom-2>

I have the following structure in the project:
_oak_index
|__damAssetLucene-11-custom-2
     |__analyzers
     |    |__default
     |         |__stopwords.dir
     |         |    |__.content.xml
     |         |__stopwords
     |__tika
          |__config.xml

 

stopwords.dir/.content.xml

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="nt:file">
    <jcr:content
        jcr:encoding="utf-8"
        jcr:mimeType="text/plain"
        jcr:primaryType="nt:resource"/>
</jcr:root>

 

tika/config.xml is uploaded to the AEMaaCS instance without any problem, but stopwords isn't.

In the Cloud Manager Pipeline logs I see the following error:

23:31:55.820 [main] INFO o.a.j.o.p.i.i.IndexDefinitionUpdater - Adding new index definition at path [/oak:index/damAssetLucene-11-custom-2]
23:31:55.853 [main] INFO o.a.j.oak.index.IndexerSupport - Switched the async lane for indexes at [/oak:index/cqPageLucene-0-custom-2, /oak:index/damAssetLucene-11-custom-2] to offline-reindex-async and marked them for reindex
23:31:55.879 [main] INFO o.a.j.oak.index.LuceneIndexHelper - Setting RAMBufferSize for LuceneIndexWriter (configurable via system property 'oak.index.ramBufferSizeMB') to 32 MB
23:31:55.917 [main] INFO o.a.j.o.p.i.s.s.e.FulltextIndexEditorContext - Stored the cloned index definition for [/oak:index/cqPageLucene-0-custom-2]. Changes in index definition would now only be effective post reindexing
23:31:55.917 [main] INFO o.a.j.o.p.i.s.s.e.FulltextIndexEditorContext - IndexDefinition creation timestamp added for [/oak:index/cqPageLucene-0-custom-2]
23:31:56.078 [main] ERROR c.adobe.granite.indexing.tool.Main - Can't perform operation
java.lang.NullPointerException: Cannot invoke "org.apache.jackrabbit.oak.api.Blob.getNewStream()" because "blob" is null
    at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadStopwordSet(NodeStateAnalyzerFactory.java:247)
    at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createAnalyzerViaReflection(NodeStateAnalyzerFactory.java:161)
    at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:97)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:167)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:76)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.createInstance(LuceneIndexDefinition.java:102)
    at org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition$Builder.build(IndexDefinition.java:410)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.build(LuceneIndexDefinition.java:91)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.build(LuceneIndexDefinition.java:88)
    at org.apache.jackrabbit.oak.plugins.index.search.spi.editor.FulltextIndexEditorContext.createIndexDefinition(FulltextIndexEditorContext.java:310)
    at org.apache.jackrabbit.oak.plugins.index.search.spi.editor.FulltextIndexEditorContext.<init>(FulltextIndexEditorContext.java:107)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.<init>(LuceneIndexEditorContext.java:48)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:236)
    at org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:73)
    at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:322)
    at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:178)
    at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
    at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
    at org.apache.jackrabbit.oak.index.OutOfBandIndexerBase.preformIndexUpdate(OutOfBandIndexerBase.java:126)
    at org.apache.jackrabbit.oak.index.OutOfBandIndexerBase.reindex(OutOfBandIndexerBase.java:77)
    at com.adobe.granite.indexing.tool.ReindexCmd.index(ReindexCmd.java:244)
    at com.adobe.granite.indexing.tool.ReindexCmd.run(ReindexCmd.java:141)
    at com.adobe.granite.indexing.tool.Main.execute(Main.java:174)
    at com.adobe.granite.indexing.tool.Main.main(Main.java:77)

This happens because the index definition stored for this index doesn't contain a jcr:data property for the stopwords file, whereas for tika/config.xml it is present.

{
  "analyzers": {
    "jcr:primaryType": "nam:nt:unstructured",
    "default": {
      "jcr:primaryType": "nam:nt:unstructured",
      "luceneMatchVersion": "LUCENE_47",
      "class": "org.apache.lucene.analysis.standard.StandardAnalyzer",
      "stopwords": {
        "jcr:primaryType": "nam:nt:file",
        "jcr:content": {
          "jcr:encoding": "utf-8",
          "jcr:mimeType": "text/plain",
          "jcr:primaryType": "nam:nt:resource"
        }
      }
    }
  }
}
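For anyone who wants to catch this before the pipeline does, here is a minimal sketch that scans a JSON export of an index definition for nt:file nodes whose jcr:content has no jcr:data. The function name files_missing_data is hypothetical, and the sketch assumes the export has the shape shown above (type values prefixed with "nam:"); it is not part of any Adobe or Oak tooling.

```python
import json


def files_missing_data(node, path=""):
    """Return paths of nt:file nodes whose jcr:content lacks jcr:data.

    Hypothetical helper: assumes a JSON export in which child nodes are
    nested objects and jcr:primaryType may carry a "nam:" prefix.
    """
    missing = []
    if not isinstance(node, dict):
        return missing
    primary_type = str(node.get("jcr:primaryType", ""))
    if primary_type.endswith("nt:file"):
        content = node.get("jcr:content")
        if not isinstance(content, dict) or "jcr:data" not in content:
            missing.append(path or "/")
    # Recurse into child nodes (nested objects); plain properties are skipped.
    for name, child in node.items():
        if isinstance(child, dict):
            missing.extend(files_missing_data(child, f"{path}/{name}"))
    return missing


# The analyzers fragment from the post, reproduced as a JSON string.
definition = json.loads("""
{
  "analyzers": {
    "jcr:primaryType": "nam:nt:unstructured",
    "default": {
      "jcr:primaryType": "nam:nt:unstructured",
      "luceneMatchVersion": "LUCENE_47",
      "class": "org.apache.lucene.analysis.standard.StandardAnalyzer",
      "stopwords": {
        "jcr:primaryType": "nam:nt:file",
        "jcr:content": {
          "jcr:encoding": "utf-8",
          "jcr:mimeType": "text/plain",
          "jcr:primaryType": "nam:nt:resource"
        }
      }
    }
  }
}
""")

print(files_missing_data(definition))
```

Run against the fragment above, the sketch reports the stopwords node, which matches the node the NullPointerException points to.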

 

Could you please help me fix the FileVault representation of the index nodes so that it is compatible with AEMaaCS pipelines? Locally it works fine.


1 reply

Himanshu_Jain
Community Advisor
Accepted solution
August 29, 2024

konstantyn_diachenko
Community Advisor
August 29, 2024

Hi @himanshu_jain, thank you for your answer. I came to this solution as well and it fixed my problem.

The result:

<damAssetLucene-11-custom-2
    jcr:primaryType="oak:QueryIndexDefinition"
    async="[async,nrt]"
    compatVersion="{Long}2"
    evaluatePathRestrictions="{Boolean}true"
    excludedPaths="[/some/path]"
    includedPaths="[/content/dam]"
    maxFieldLength="{Long}100000"
    tags="[visualSimilaritySearch,assetsOmnisearch]"
    type="lucene">
    <aggregates jcr:primaryType="nt:unstructured">
        ...
    </aggregates>
    <analyzers jcr:primaryType="nt:unstructured">
        <default jcr:primaryType="nt:unstructured">
            <tokenizer jcr:primaryType="nt:unstructured" name="Standard"/>
            <filters jcr:primaryType="nt:unstructured">
                <LowerCase jcr:primaryType="nt:unstructured"/>
                <Stop jcr:primaryType="nt:unstructured" words="[stopwords.txt]">
                    <stopwords.txt jcr:primaryType="nt:file">
                        <jcr:content jcr:primaryType="nt:unstructured"/>
                    </stopwords.txt>
                </Stop>
            </filters>
        </default>
    </analyzers>
    <indexRules jcr:primaryType="nt:unstructured">
        ...
    </indexRules>
    <suggestion
        jcr:primaryType="nt:unstructured"
        suggestAnalyzed="{Boolean}true"
        suggestUpdateFrequencyMinutes="{Long}5"/>
    <tika jcr:primaryType="nt:unstructured">
        <config.xml jcr:primaryType="nt:file">
            <jcr:content jcr:primaryType="nt:unstructured"/>
        </config.xml>
    </tika>
</damAssetLucene-11-custom-2>
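To illustrate what the composed analyzer above does conceptually, here is a rough Python simulation of the chain: tokenize, lowercase, then drop stopwords. It is only a sketch. The regex is a crude stand-in for Lucene's Standard tokenizer, and the analyze function and the sample stopword set are made up for illustration (the real list lives in stopwords.txt, whose content isn't shown in this thread).

```python
import re


def analyze(text, stopwords):
    """Approximate the Standard tokenizer -> LowerCase -> Stop chain."""
    # Crude stand-in for the Standard tokenizer: split on non-word characters.
    tokens = re.findall(r"\w+", text)
    # LowerCase filter: normalise case before the stop filter runs.
    lowered = [token.lower() for token in tokens]
    # Stop filter: drop every token that appears in the stopword set.
    return [token for token in lowered if token not in stopwords]


# Hypothetical stopword list, standing in for the content of stopwords.txt.
sample_stopwords = {"the", "of", "a"}

print(analyze("The Annual Report of a Company", sample_stopwords))
# prints ['annual', 'report', 'company']
```

Because LowerCase runs before Stop in this chain, the entries in stopwords.txt would presumably need to be lowercase to match.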

The file structure:

_oak_index
|__damAssetLucene-11-custom-2
     |__analyzers
     |    |__default
     |         |__filters
     |              |__Stop
     |                   |__stopwords.txt
     |__tika
          |__config.xml

Kostiantyn Diachenko, Community Advisor, Certified Senior AEM Developer, creator of free AEM VLT Tool, maintainer of AEM Tools plugin.