Hi guys,
We are migrating from CQ 5.5 to AEM 6.1 and would like to migrate the content version history from CRX to Oak, so we used the CRX2OAK migration tool. The versions are migrated successfully, but once the AEM instance is up, it keeps recreating indexes and runs out of disk space. We have tried a couple of times and hit the same issue every time.
We tried offline and online compaction, but the repository size stays the same. Could you please advise what else we can try?
Thanks, Sai
Based on my previous experience, here are a few things you could try at your end to see how much they help:
a) Ensure that the repository and datastore are on local disk instead of a NAS/SAN. This should improve performance considerably. Also use the following crx2oak startup options:
--mmap --early-shutdown
where
1) mmap - use memory-mapped I/O by default
2) early-shutdown - shut down the CRX repository after copying and before indexing, so that critical memory resources are freed up
b) The CRX repository.xml was tweaked:
1) TarPersistenceManager changed in all 3 places
2) Disabled autoOptimizeAt by setting the value to empty. This prevents the auto optimizer from triggering during the long run
3) Set indexInMemory to true
4) Changed the bundleCacheSize to 100 MB
c) An effort was made to reduce the time needed for full-text indexing to complete. This involved pre-extracting text from binaries, based on a new feature:
https://issues.apache.org/jira/browse/OAK-2892 For more details refer to http://jackrabbit.apache.org/oak/docs/query/lucene.html#Pre-Extracting_Text_from_Binaries
This can help greatly. The steps are described in the documentation.
d) There are also configurations available to speed up this process. First, I would like to understand whether you have a requirement to enable full-text indexing for PDFs and Word documents. If not, we can completely exclude them from indexing.
If your answer is YES and your concern is to reduce the upgrade time, the config change should be made at the very start of the upgrade process, as described below:
1) Disable the indexes with disable-indexes as follows:
java -Xmx4G -jar ~/Downloads/crx2oak-1.3.4-standalone.jar --mmap --copy-orphaned-versions=false --copy-versions=false --disable-indexes=lucene crx-quickstart/repository newrepo/
2) Once the content migration is done, unpack aem-quickstart-6.1.0.jar as follows.
java -jar ../aem-quickstart-6.1.0.jar -unpack
Then create an install folder (crx-quickstart/install) and deploy lucene-index-config.zip to it.
The config file is here - https://files.acrobat.com/a/preview/e47bccf6-56c4-41b9-8b79-1d911b5ddb15
It contains the definition of the Lucene index with the right Tika config.
Please note that if you disable an index during the upgrade, it will no longer appear in AEM. Therefore the package contains all the OOTB indexes together with the Tika configuration.
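For reference, the "create an install folder and deploy the package" part of step 2 could be scripted roughly as below. This is only a sketch: the function name stage_index_package and the example paths are made up for illustration, and it assumes the package has already been downloaded as lucene-index-config.zip and the quickstart has already been unpacked.

```shell
# Hypothetical helper: copy a content package into the install folder
# of an unpacked quickstart, creating the folder if needed.
# $1 = path to the package zip, $2 = path to the crx-quickstart folder
stage_index_package() {
  pkg="$1"
  quickstart="$2"
  # Refuse to continue if the package file is missing.
  [ -f "$pkg" ] || { echo "package not found: $pkg" >&2; return 1; }
  mkdir -p "$quickstart/install" || return 1
  cp "$pkg" "$quickstart/install/"
}

# Example call (placeholder paths, adjust to your setup):
# stage_index_package ./lucene-index-config.zip ./crx-quickstart
```

Run it before the first start of the new instance, so that the package is already in place when AEM comes up.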
Hi,
Varun has given a very exhaustive answer to your problem, but I would like to share a few ways (you might have already tried them) to deal with a growing repository:
1. Manually optimizing tar files using the JMX Console
o Open the CQ Web Console and click the JMX item in the Main menu.
o Click the Repository MBean for the com.adobe.granite domain.
o Click startTarOptimization()
o To stop the optimization process, click stopTarOptimization()
2. Access the diskUsage interface to see what is consuming unnecessary space. This interface is available under Reports (link on the CRX home page).
Find the resources that are consuming space unnecessarily and can be removed.
3. Run online/offline tar compaction in AEM. Link: http://www.aemcq5tutorials.com/tutorials/online-offline-tar-compaction-in-aem/
4. If the AEM instance runs out of disk space, it will cause outages in production. It is highly recommended that you follow the monitoring best practices described in Maintenance and Monitoring.
I hope this helps.
~kautuk
Thanks vmehrotr and kautuksahni. Makes sense. We will try this and update you.
Hi,
Almost all of the points are covered by kautuk and Varun. I just want to add one more point: it is not recommended to run online tar compaction, as it affects the performance of the running instance and takes a very long time (up to 24 hours) to complete.
Sair,
I think Opkar requires the offline tar scripts that you are using to run compaction. Also, could you please share the available disk space and heap memory on the server on which you are running compaction?
Note: Don't forget to change the tar log level from DEBUG to ERROR, otherwise your log size will increase very rapidly.
As mentioned by @AnkurAhlawat, please share the script, the Java heap size in use, and the system disk space.
To find out the default Java heap size:
Windows :- http://stackoverflow.com/questions/19028297/how-to-identify-default-java-heapsize-in-windows
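On Linux/Unix, something like the sketch below should give the same numbers. The function names are my own. df -Pk and the JVM flag -XX:+PrintFlagsFinal are standard, but the exact output of the latter varies a little between JVM versions, so treat the parsing as an assumption and check it against your JVM.

```shell
# Free space (in KB) on the filesystem that holds the given directory.
# With -P, df prints one line per filesystem; column 4 is "Available".
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Default maximum heap size (in bytes) the JVM would pick without -Xmx.
# Assumes a HotSpot-style JVM is on the PATH.
default_max_heap_bytes() {
  java -XX:+PrintFlagsFinal -version 2>/dev/null |
    awk '$2 == "MaxHeapSize" { print $4 }'
}

# Example (placeholder path, adjust to your setup):
# available_kb crx-quickstart/repository
# default_max_heap_bytes
```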
~Kautuk
Hi,
While it is possible to run online compaction, it is not recommended. If you do wish to use it, you should contact Daycare, who will work with you on using and monitoring online compaction. It should not be used on a production instance without Daycare's approval.
If you are using offline compaction with the "rm-all" parameter, this will clear out your indexes, which means a re-index after every restart.
Are you sure that you are giving the re-index sufficient time to complete? If you restart your server before re-indexing has completed, the system will just attempt to re-index again when the instance is restarted.
Regards,
Opkar
Do you get any exceptions in error.log? Look for the index it's trying to recreate and count the total number of occurrences. I've run into a similar issue with an index owned by root. It could just be that.
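A quick way to get those counts, as a rough sketch: it assumes the relevant log lines contain the word "reindex" and mention the index path in the form /oak:index/NAME, which may not hold for every Oak version, so adjust the patterns to what your error.log actually shows. The function name is just for illustration.

```shell
# Count how often each index path appears on log lines mentioning "reindex".
# $1 = path to error.log
count_reindex_mentions() {
  grep -i 'reindex' "$1" |
    grep -o '/oak:index/[A-Za-z0-9_.-]*' |
    sort | uniq -c | sort -rn
}

# Example (placeholder path, adjust to your setup):
# count_reindex_mentions crx-quickstart/logs/error.log
```

An index that shows up far more often than the others is the one to look at first.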
Hello guys,
Thanks all for your valuable inputs. We tried some of your suggestions. However, after taking a close look at the logs, Adobe (Daycare ticket) suggested that we disable all the workflows in the AEM instance before running the crx2oak tool. That worked: after the instance came up, re-indexing ran for only 20 minutes. We re-enabled the workflows after the whole indexing process had completed.
Response from Adobe after looking at our logs:
From the error.log, I think you used VLT to ingest assets into AEM, which triggered mass execution of workflow instances and further indexing. I see 206626 instances that ran once the VLT was initiated, and the system stopped while the workflow instances and their indexing were still running.
Can you retry and refer to [1] to disable the launchers, so that they do not trigger the workflows that flood the system and cause it to fail?
[1] : http://cq-ops.tumblr.com/post/43179911102/how-to-efficiently-copy-large-amounts-of-content
Thanks,
Sai