I searched for the term "contrary" in the reference implementation using the search bar and got only a PDF. I expected both the PDF and the Word document to appear in the results, since both contain the same text. I am not sure whether ASC supports searching on text inside a document. Can someone confirm whether full-text content search within documents is supported in ASC?
Hi @digarg17,
If the query works on the author instance but not on the publish instance, there may be an indexing issue.
You can try the following steps:
1. Open the Query Performance console on author at <host>:<port>/libs/granite/operations/content/diagnosistools/queryPerformance.html and test the query in the Explain Query tab; check which index, if any, it uses.
2. Repeat step 1 on the publish instance and compare the result with author.
3. Also run the query in the query debugger tool on publish and check the results.
4. Open the index used on both author and publish and compare the properties of the index nodes.
5. If you don't see any differences, manually trigger a reindex on publish.
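To make the author/publish comparison in the steps above repeatable, here is a minimal sketch in Python. It assumes the standard QueryBuilder JSON servlet at /bin/querybuilder.json is reachable on both instances (on publish it is often blocked by the dispatcher, so this may only work directly against the instance). The host URLs and the helper names build_fulltext_url and missing_on_publish are hypothetical, chosen for illustration only.

```python
from urllib.parse import urlencode


def build_fulltext_url(base_url, term, root="/content/dam"):
    """Build a QueryBuilder JSON URL for a full-text search over assets."""
    params = {
        "path": root,          # search root
        "type": "dam:Asset",   # restrict to assets
        "fulltext": term,      # full-text predicate
        "p.limit": "-1",       # return all hits
    }
    return f"{base_url.rstrip('/')}/bin/querybuilder.json?{urlencode(params)}"


def missing_on_publish(author_paths, publish_paths):
    """Return asset paths found on author but absent on publish, sorted."""
    return sorted(set(author_paths) - set(publish_paths))


# Hypothetical local hosts; replace with your own author and publish URLs.
author_url = build_fulltext_url("http://localhost:4502", "contrary")
publish_url = build_fulltext_url("http://localhost:4503", "contrary")
```

Fetch both URLs with your usual credentials, collect the path of each hit, and pass the two lists to missing_on_publish. Any path it returns is a candidate for the index comparison in step 4.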
-Tarun
@h_kataria @TarunKumar @MukeshYadav_ Could you kindly take a look at this question and provide your thoughts? Your insights would be greatly appreciated.
AEM’s Oak indexing system, using Apache Tika, can extract the text content from Word documents. You need to ensure that this is correctly configured.
1. Open CRXDE Lite (/crx/de).
2. Go to /oak:index/damAssetLucene (or your custom asset index if you have one).
3. Check that application/vnd.openxmlformats-officedocument.wordprocessingml.document is being indexed.
4. Ensure the indexing rule includes the content of the Word documents by ensuring properties like jcr:content/metadata are indexed.
Once DOCX files are being indexed, configure Asset Share Commons to allow searching within DOCX content.
Modify Search Facets in Asset Share Commons:
Example full-text predicate configuration (relPath is where the DOCX text content is expected to be indexed):
{
  "predicates": [
    {
      "type": "fulltext",
      "path": "/content/dam",
      "relPath": "jcr:content/metadata",
      "property": "fulltext",
      "operation": "CONTAINS"
    }
  ]
}
Update Oak Index for DOCX Files: If you’re using a custom Oak index (e.g., damAssetLucene), ensure that the index is configured to include DOCX files. You can modify the indexing rules to ensure it includes full-text fields for DOCX files.
In CRX/DE:
1. Go to /oak:index/damAssetLucene.
2. Check that indexRules/dam:Asset/properties include the relevant fields for DOCX files, like jcr:content/metadata.
Modify Search Bar to Use Full-Text Predicate: Ensure that the search bar or component on your Asset Share Commons page is configured to use the fulltext predicate. This will allow users to enter search terms that will match content inside DOCX files.
A JCR SQL2 query that searches for text within DOCX documents might look like this:
SELECT * FROM [dam:Asset] AS asset
WHERE CONTAINS(asset.[jcr:content/metadata], 'searchTerm')
AND ISDESCENDANTNODE(asset, '/content/dam')
AND asset.[jcr:content/metadata/dc:format] = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
This query searches for searchTerm within the metadata of DOCX files (dc:format for DOCX is 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') under the /content/dam folder.
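If you generate this query from user input, the search term must be quoted safely. The sketch below, in Python, assembles the same JCR-SQL2 statement and doubles any single quote inside a string literal, which is how JCR-SQL2 escapes them. The helper names sql2_string and build_docx_query are hypothetical. The sketch escapes only the string literal; it does not sanitize the full-text expression syntax itself (for example a leading minus sign or OR).

```python
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def sql2_string(value):
    """Quote a value as a JCR-SQL2 string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_docx_query(term, root="/content/dam"):
    """Assemble the DOCX full-text query shown above for a given search term."""
    return (
        "SELECT * FROM [dam:Asset] AS asset\n"
        f"WHERE CONTAINS(asset.[jcr:content/metadata], {sql2_string(term)})\n"
        f"AND ISDESCENDANTNODE(asset, {sql2_string(root)})\n"
        f"AND asset.[jcr:content/metadata/dc:format] = {sql2_string(DOCX_MIME)}"
    )


print(build_docx_query("contrary"))
```

The generated statement can be pasted into the Explain Query tab of the Query Performance console to see which index it uses.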
Once the full-text indexing and Asset Share Commons configuration is in place:
Please let me know how it goes.
On author, the search returns both results (DOC and PDF) without any changes on the new cloud prod instance. The issue occurs only on publish with ASC.
@digarg17 Did you find the suggestions helpful? Please let us know if you require more information. Otherwise, please mark the answer as correct for posterity. If you've discovered a solution yourself, we would appreciate it if you could share it with the community. Thank you!
Hi @digarg17,
Short answer: yes, it is supported in ASC. I did a POC for a client some months back.
However, you need to search for the complete word exactly as it appears in the file. For instance, if "Abhishek" is in the file, the document won't be returned unless I search for "Abhishek"; not a single character can be missing or misplaced.
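The whole-word behavior described above is typical of token-based full-text indexes: text is split into terms at index time, and a query term matches only if it equals a stored term. The Python sketch below is a toy model of that idea, not Oak's or Lucene's actual analyzer. The functions tokens and matches are hypothetical, and whether the ASC search bar accepts a trailing * wildcard is an assumption you would need to verify on your own instance.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    """Split text into lowercase word tokens (a crude stand-in for an analyzer)."""
    return set(re.findall(r"\w+", text.lower()))


def matches(text, term):
    """Match a term against whole tokens; a '*' in the term enables wildcards."""
    term = term.lower()
    if "*" in term:
        return any(fnmatchcase(tok, term) for tok in tokens(text))
    return term in tokens(text)


doc = "Reviewed by Abhishek on Monday"
print(matches(doc, "Abhishek"))  # exact token
print(matches(doc, "Abhi"))      # partial word, no wildcard
print(matches(doc, "Abhi*"))     # partial word with wildcard
```

In this model the partial word "Abhi" finds nothing because no stored token equals it, which mirrors the behavior reported in the POC.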