AEMaaCS upload author index step failing with stopwords file
Hi everyone,
I customized the OOTB index /oak:index/damAssetLucene-11 and added stopwords support to it:
<damAssetLucene-11-custom-2
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
compatVersion="{Long}2"
evaluatePathRestrictions="{Boolean}true"
excludedPaths="[/some/path]"
includedPaths="[/content/dam]"
maxFieldLength="{Long}100000"
tags="[visualSimilaritySearch,assetsOmnisearch]"
type="lucene">
<aggregates jcr:primaryType="nt:unstructured">
...
</aggregates>
<analyzers jcr:primaryType="nt:unstructured">
<default
jcr:primaryType="nt:unstructured"
luceneMatchVersion="LUCENE_47"
class="org.apache.lucene.analysis.standard.StandardAnalyzer">
<stopwords jcr:primaryType="nt:file">
<jcr:content jcr:primaryType="nt:unstructured" jcr:mimeType="text/plain"/>
</stopwords>
</default>
</analyzers>
<indexRules jcr:primaryType="nt:unstructured">
...
</indexRules>
<suggestion
jcr:primaryType="nt:unstructured"
suggestAnalyzed="{Boolean}true"
suggestUpdateFrequencyMinutes="{Long}5"/>
<tika jcr:primaryType="nt:unstructured">
<config.xml jcr:primaryType="nt:file">
<jcr:content jcr:primaryType="nt:unstructured"/>
</config.xml>
</tika>
</damAssetLucene-11-custom-2>
I have the following structure in the project:
_oak_index
|__damAssetLucene-11-custom-2
   |__analyzers
   |  |__default
   |     |__stopwords.dir
   |     |  |__.content.xml
   |     |__stopwords
   |__tika
      |__config.xml
stopwords.dir/.content.xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
jcr:primaryType="nt:file">
<jcr:content
jcr:encoding="utf-8"
jcr:mimeType="text/plain"
jcr:primaryType="nt:resource"/>
</jcr:root>
tika/config.xml is uploaded to the AEMaaCS instance without any problem, but stopwords isn't.
In the Cloud Manager pipeline logs I see the following error:
23:31:55.820 [main] INFO o.a.j.o.p.i.i.IndexDefinitionUpdater - Adding new index definition at path [/oak:index/damAssetLucene-11-custom-2]
23:31:55.853 [main] INFO o.a.j.oak.index.IndexerSupport - Switched the async lane for indexes at [/oak:index/cqPageLucene-0-custom-2, /oak:index/damAssetLucene-11-custom-2] to offline-reindex-async and marked them for reindex
23:31:55.879 [main] INFO o.a.j.oak.index.LuceneIndexHelper - Setting RAMBufferSize for LuceneIndexWriter (configurable via system property 'oak.index.ramBufferSizeMB') to 32 MB
23:31:55.917 [main] INFO o.a.j.o.p.i.s.s.e.FulltextIndexEditorContext - Stored the cloned index definition for [/oak:index/cqPageLucene-0-custom-2]. Changes in index definition would now only be effective post reindexing
23:31:55.917 [main] INFO o.a.j.o.p.i.s.s.e.FulltextIndexEditorContext - IndexDefinition creation timestamp added for [/oak:index/cqPageLucene-0-custom-2]
23:31:56.078 [main] ERROR c.adobe.granite.indexing.tool.Main - Can't perform operation
java.lang.NullPointerException: Cannot invoke "org.apache.jackrabbit.oak.api.Blob.getNewStream()" because "blob" is null
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadStopwordSet(NodeStateAnalyzerFactory.java:247)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createAnalyzerViaReflection(NodeStateAnalyzerFactory.java:161)
at org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:97)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:167)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:76)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.createInstance(LuceneIndexDefinition.java:102)
at org.apache.jackrabbit.oak.plugins.index.search.IndexDefinition$Builder.build(IndexDefinition.java:410)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.build(LuceneIndexDefinition.java:91)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition$Builder.build(LuceneIndexDefinition.java:88)
at org.apache.jackrabbit.oak.plugins.index.search.spi.editor.FulltextIndexEditorContext.createIndexDefinition(FulltextIndexEditorContext.java:310)
at org.apache.jackrabbit.oak.plugins.index.search.spi.editor.FulltextIndexEditorContext.<init>(FulltextIndexEditorContext.java:107)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorContext.<init>(LuceneIndexEditorContext.java:48)
at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:236)
at org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:73)
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:322)
at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:178)
at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
at org.apache.jackrabbit.oak.index.OutOfBandIndexerBase.preformIndexUpdate(OutOfBandIndexerBase.java:126)
at org.apache.jackrabbit.oak.index.OutOfBandIndexerBase.reindex(OutOfBandIndexerBase.java:77)
at com.adobe.granite.indexing.tool.ReindexCmd.index(ReindexCmd.java:244)
at com.adobe.granite.indexing.tool.ReindexCmd.run(ReindexCmd.java:141)
at com.adobe.granite.indexing.tool.Main.execute(Main.java:174)
at com.adobe.granite.indexing.tool.Main.main(Main.java:77)
This happens because the stored index definition has no jcr:data property under stopwords/jcr:content, whereas for tika/config.xml it is present:
{
"analyzers":{
"jcr:primaryType":"nam:nt:unstructured",
"default":{
"jcr:primaryType":"nam:nt:unstructured",
"luceneMatchVersion":"LUCENE_47",
"class":"org.apache.lucene.analysis.standard.StandardAnalyzer",
"stopwords":{
"jcr:primaryType":"nam:nt:file",
"jcr:content":{
"jcr:encoding":"utf-8",
"jcr:mimeType":"text/plain",
"jcr:primaryType":"nam:nt:resource"
}
}
}
}
}
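To make it easier to spot which file nodes are affected, here is a minimal Python sketch that walks an index definition in the JSON shape shown above and lists every nt:file node whose jcr:content has no jcr:data. The function name find_files_missing_data is hypothetical, and the script only assumes that a missing jcr:data key is what leads to the null blob in the stack trace; it is a diagnostic aid, not a fix.

```python
import json


def find_files_missing_data(node, path=""):
    """Return relative paths of nt:file nodes whose jcr:content lacks jcr:data.

    Hypothetical helper: it only inspects key names in the JSON and does not
    talk to a repository.
    """
    missing = []
    if not isinstance(node, dict):
        return missing

    # Primary types appear as "nam:nt:file" in the JSON above, so match on the suffix.
    primary_type = str(node.get("jcr:primaryType", ""))
    if primary_type.endswith("nt:file"):
        content = node.get("jcr:content")
        if not isinstance(content, dict) or "jcr:data" not in content:
            missing.append(path or "/")

    # Recurse into child nodes (child nodes are JSON objects, properties are scalars).
    for name, child in node.items():
        if isinstance(child, dict):
            child_path = f"{path}/{name}" if path else name
            missing.extend(find_files_missing_data(child, child_path))
    return missing


# The analyzers part of the index definition, copied from the post.
definition = json.loads("""
{
  "analyzers": {
    "jcr:primaryType": "nam:nt:unstructured",
    "default": {
      "jcr:primaryType": "nam:nt:unstructured",
      "luceneMatchVersion": "LUCENE_47",
      "class": "org.apache.lucene.analysis.standard.StandardAnalyzer",
      "stopwords": {
        "jcr:primaryType": "nam:nt:file",
        "jcr:content": {
          "jcr:encoding": "utf-8",
          "jcr:mimeType": "text/plain",
          "jcr:primaryType": "nam:nt:resource"
        }
      }
    }
  }
}
""")

print(find_files_missing_data(definition))
# prints ['analyzers/default/stopwords']
```

Running the same function over the full definition, including the tika node, should report only the stopwords node if tika/config.xml really carries its jcr:data.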
Could you please help me fix the FileVault representation of the index nodes so that it is compatible with AEMaaCS pipelines? Locally it works fine.
