SOLVED

How does Indexing happen in CQ 5.6.1

Level 9

Hi All,

Details are below:

1] Suppose I upload a DAM asset named xyz.jpg to CQ.

2] The very next moment I can use that asset in my page: I go to the content finder, search for xyz.jpg by name, and drag the asset onto the relevant component.

3] How does CQ index a DAM asset so quickly and make it available for searching? What exactly is the process flow that happens in the background?

4] Can someone please give a brief description of this and point to a few good references?

1 Accepted Solution

Correct answer by
Employee Advisor

Hi,

The indexing happens as part of the write action to the repository. When you upload large assets, it is likely done asynchronously; for smaller writes it happens synchronously. This is part of the repository implementation, and I don't know of any good documentation on it.
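
To illustrate the idea only, here is a minimal sketch of a store that updates its search index as part of every write, synchronously for small writes and through a background queue for large ones. This is not CRX or Jackrabbit code: the `Repository` class, the `ASYNC_THRESHOLD_BYTES` cutoff, and the name-based index are all hypothetical, and how the real repository chooses between the two paths is not documented in this thread.

```python
import queue
import threading


class Repository:
    """Toy repository: every write also updates a search index."""

    # Hypothetical cutoff; the real repository decides this internally.
    ASYNC_THRESHOLD_BYTES = 1024

    def __init__(self):
        self.items = {}          # path -> content
        self.search_index = {}   # lower-cased node name -> set of paths
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._index_worker, daemon=True)
        worker.start()

    def _index(self, path):
        # Index the node name so that a search for "xyz.jpg" finds the asset.
        name = path.rsplit("/", 1)[-1].lower()
        self.search_index.setdefault(name, set()).add(path)

    def _index_worker(self):
        # Background thread that drains the queue of paths awaiting indexing.
        while True:
            path = self._pending.get()
            self._index(path)
            self._pending.task_done()

    def write(self, path, content):
        self.items[path] = content
        if len(content) < self.ASYNC_THRESHOLD_BYTES:
            self._index(path)        # synchronous: searchable once write() returns
        else:
            self._pending.put(path)  # asynchronous: indexed shortly afterwards

    def wait_for_indexing(self):
        # Block until every queued path has been indexed.
        self._pending.join()

    def search(self, name):
        return sorted(self.search_index.get(name.lower(), set()))
```

In this sketch a small asset is findable as soon as `write()` returns, which matches the behaviour described in the question; a large asset becomes findable once the background worker has caught up.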

kind regards,
Jörg

6 Replies

Level 9

Hi Jorg,

Thank you for your reply.

I have heard a couple of terms related to indexing:

- Workspace index

- Repository index

- Version history reindex

I don't understand what exactly these mean. A brief description of each would be helpful.

Employee

Just as an addendum to what Joerg explained, please refer to http://jackrabbit.apache.org/how-jackrabbit-works.html and look for the links to the Query Manager on that page.

Employee Advisor

Hi,

OK, some more details (assuming that we are talking about TarPM here):

  • Each workspace consists of a bunch of tar files to which changes are only appended. So to find the latest entry for any given item, you need to maintain a kind of "HEAD" pointer. These pointers are maintained in the workspace index files, which are the files named "index_0.tar", "index_1.tar", etc. right next to the "data*.tar" files. This index is maintained as part of the transaction.
  • To support JCR queries, a separate Lucene index is maintained (this is the index I referred to in my first response). Depending on the change, this index is updated either within the transaction (synchronously) or outside the transaction (asynchronously).
  • The term "repository index" isn't clearly defined :-) It can refer to either of the two indexes I just mentioned.
  • "Version history reindex": There is a dedicated workspace that keeps the versions, and, as just described, this workspace has its own index.
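
The "HEAD" pointer idea from the first bullet can be sketched as follows. This is a simplified, in-memory illustration and not the actual TarPM format: the class name `AppendOnlyStore` and its methods are hypothetical, a Python list stands in for the "data*.tar" files, and a dictionary stands in for the "index_*.tar" files. The `rebuild_index` method is an assumption about what a reindex does conceptually, namely rescanning the appended records to recover the pointers.

```python
class AppendOnlyStore:
    """Toy append-only store with a 'HEAD' pointer per item."""

    def __init__(self):
        self.log = []    # stands in for data*.tar: records are only ever appended
        self.head = {}   # stands in for index_*.tar: item id -> position of latest record

    def write(self, item_id, value):
        # Append the new record and move the item's HEAD pointer in the same step,
        # mirroring an index that is maintained as part of the transaction.
        self.log.append((item_id, value))
        self.head[item_id] = len(self.log) - 1

    def read(self, item_id):
        # Follow the HEAD pointer instead of scanning the whole log.
        return self.log[self.head[item_id]][1]

    def rebuild_index(self):
        # Rescan the log from the start; later records overwrite earlier pointers.
        head = {}
        for position, (item_id, _value) in enumerate(self.log):
            head[item_id] = position
        self.head = head
```

Older records stay in the log after an item is rewritten; only the pointer moves, which is why a lookup needs the index to find the current entry quickly.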

Is that sufficient?

Please note that this has completely changed with AEM 6.0 and Oak as the repository.

kind regards,
Jörg

Level 9

Hi Kalyanar,

Thanks a lot for the reference link you provided.

Level 9

Hi Kalyanar/Jorg,

Also, can you please let me know the difference between the index folders in the two locations below?

crx-quickstart/repository/workspaces/crx.default/index/

crx-quickstart/repository/repository/index/