Expand my Community achievements bar.

SOLVED

AEM sites 6.5 SP3 After upgrade to Windows build 19042.1110 bundles are automatically uninstalled

Avatar

Level 3

Hi,

 

Env: AEM sites 6.5 SP3

OS: Windows build 19042.1110

 

After upgrade to Windows build 19042.1110 bundles are automatically uninstalled. (Our application bundles)

 

Has anyone seen this?

 

Thanks..Joel

1 Accepted Solution

Avatar

Correct answer by
Employee Advisor

Hi @JoelLucuik!

That sounds a bit strange.

If I understand things correctly, you upgraded your operating system and after that you are missing your application bundles in AEM.

Is my understanding correct?

 

With this level of information, there is not much that I can recommend.

I have never seen a similar behavior and to be honest I doubt that there is a direct connection between the two events.

Before the OS upgrade (which often comes with a reboot of the machine) you have probably shut down your AEM instance. In this state, there is no "regular" way to remove OSGI bundles. Bundles could theoretically be removed from the persistence level (= deleted from the hard disk) but this would be reflected in the logs on the next startup of AEM as it would still try to find the bundles (because they have not been properly uninstalled). I don't see a lot of other options what could have happened here as a direct result of the OS upgrade with regards to the AEM instance.

 

As you say the bundles seem to be automatically uninstalled:

  • Are the bundles missing/have disappeared or have they actively been uninstalled/undeployed from the OSGI context?
  • Did you find any traces to this in the according log files? Are there exceptions? Is the OSGI context still trying to load the bundles?
  • Are you sure that everything was correctly working right before the OS upgrade?
  • Did you shut down the instance properly (and waited for the shutdown to complete) before you started the OS upgrade?
  • Have there been any deployments to AEM shortly before or after the OS upgrade?

 

 

Update:

Thanks for providing additional details and sharing some extracts from your log files, @JoelLucuik.

Looking at the log entries, there are a couple of infos on regular scheduled maintenance tasks:

  • RevisionCleanup (daily window)
  • LuceneBinariesCleanup (daily window)
  • DataStoreGarbageCollection (weekly window)
  • WorkflowPurge (weekly window)
  • AuditLogMaintenance (weekly window)

These are regular maintenance jobs as indicated in the documentation. You can check their schedule and status if you navigate to "Tools" -> "Operations" -> "Maintenance":

Screenshot 2021-08-05 at 19.43.53.png

Screenshot 2021-08-05 at 19.44.38.png  Screenshot 2021-08-05 at 19.44.46.png  Screenshot 2021-08-05 at 19.44.57.png

Looking at the sequence of these jobs being logged as status "unscheduled", some ServiceEvent unregistrations followed by an updated status for the jobs as "scheduled" this looks like there has been a change to the scheduler configurations (could also just be a redeployment of the same, already existing configurations under certain circumstances). Looks like there actually is happening something on this instance!

Nothing to worry about so far, but this indicates that there has probably been some kind of adjustment to the instance. This could be a manual action through the UI (would be reflected in the access and request logs, probably also audit log), a package installation or similar.

 

Going further through the logs, there are a couple of health checks running. These are taking certain probes to measure the system health. This is a pure monitoring operation and has no effects whatsoever on you application bundles. Every result that is unequal to 0 shows that the according health check found an error with the aspect that it's monitoring. You can run the health checks manually and see the results from the OSGI Web Console. Please navigate to "Sling" -> "Sling Health Checks".

Screenshot 2021-08-05 at 20.04.07.png

The two occasions with a result of "2" indicate that some maintenance tasks have not yet run or could not finish properly. If you haven't actively configured the maintenance tasks on your instance, this is not unusual as some of the jobs need configuration adjustments to run properly that are not present out-of-the-box (because these are very individual and can not be the same for all deployments). Running the health checks manually as outlined above will give you additional details. But I'm sure that this has nothing to do with your issue of uninstalled bundles.

 

After these health checks, the interesting part of your log file starts:

 

 

03.08.2021 12:25:09.988 *INFO* [OsgiInstallerImpl] org.apache.sling.scripting.javascript.internal.RhinoJavaScriptEngineFactory Activated with optimization level 9
03.08.2021 12:25:09.989 *INFO* [OsgiInstallerImpl] org.apache.sling.scripting.javascript Service [org.apache.sling.scripting.javascript.internal.RhinoJavaScriptEngineFactory,34332, [javax.script.ScriptEngineFactory]] ServiceEvent REGISTERED
03.08.2021 12:25:09.992 *INFO* [OsgiInstallerImpl] ourapp.core BundleEvent UNINSTALLED
03.08.2021 12:25:09.997 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Uninstalled bundle ourapp.core [597] from resource TaskResource(url=jcrinstall:/apps/ourapp/install/ourapp.core-0.0.46-SNAPSHOT.jar, entity=bundle:ourapp.core, state=UNINSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:30:22:, Bundle-SymbolicName=ourapp.core, Bundle-Version=0.0.46.SNAPSHOT], digest=1628007435817)
03.08.2021 12:25:09.997 *INFO* [FelixDispatchQueue] org.apache.felix.framework FrameworkEvent PACKAGES REFRESHED
03.08.2021 12:25:10.001 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.AAAAAA,33363, [xxxxxx.AAAAAA]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.005 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.BBBBBB,33364, [xxxxxx.BBBBBB]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.011 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.CCCCCC,33372, [xxxxxx.CCCCCC]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.015 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.DDDDDD,33373, [xxxxxx.DDDDDD]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.019 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.EEEEEE,33397, [xxxxxx.EEEEEE]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.021 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.FFFFFF,33362, [xxxxxx.FFFFFF]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.024 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent STOPPING
03.08.2021 12:25:10.026 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [33361, [org.osgi.service.cm.ManagedService]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.028 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent STOPPED
03.08.2021 12:25:10.029 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent UNRESOLVED
03.08.2021 12:25:10.030 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent UNINSTALLED
03.08.2021 12:25:10.061 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Uninstalled bundle ourapp.msclient [598] from resource TaskResource(url=jcrinstall:/apps/ourapp/install/ourapp.msclient-0.0.46-SNAPSHOT.jar, entity=bundle:ourapp.msclient, state=UNINSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:30:22:, Bundle-SymbolicName=ourapp.msclient, Bundle-Version=0.0.46.SNAPSHOT], digest=1628007435998)

 

 

As you can see, the OsgiInstaller is starting his work here.

Looking at the logs it activates and registers the RhinoJavaScriptEngineFactory and afterwards starts to uninstall your core bundle.

So there is some event that triggers a proper de-installation of the bundle from the OSGI context. Everything that follows is just the execution and finalization of this action.

 

So the big question here is: what is causing the bundle de-installation?

There are a couple of things that could trigger this under normal circumstances:

  • Deinstallation has been triggered manually through the Web Console UI.
  • If the bundle has been deployed as part of a content package, the removal of the according node (/apps/yourapp/install/your-core.jar) from the repository or the deinstallation of the deployment package could trigger this.
  • If the bundle/package has been deployed by dropping it to the /crx-quickstart/install folder on the filesystem, the removal of that file from the filesystem could trigger the deinstallation under certain circumstances.
  • Some (external/custom) deployment automation has triggered the deinstallation.

I assume that nothing of these happened in your case.

But also the log extract is a bit limited. There could be a trigger event even some time before this log extract starts. The re-configuration of the maintenance jobs (where the log extract starts) also indicates that something triggered changes even before the first entry provided. What could that be?

 

Summarizing, I unfortunately still can't explain what happened to your instance.

As already mentioned before: From my experience, I very much doubt that there is a direct connection between the OS update and the bundle de-installations. The two events might have taken place around a certain time frame that may indicate a correlation but I have never seen something similar. Also the delay of 20 minutes sounds quite suspect to me to assume a correlation here. I double checked internally if any other customer has reported something similar and could not find other reports like yours.

On the other hand, there are two main components involved here: the operating system and AEM. (In reality there are probably even much more components, e. g. the JRE, your build system, other components running on your machine, etc.) And I'm not an expert on the Windows side of things. Maybe there are upgrade and event logs of the OS that could be checked.

But again: Personally, I deem the link between the OS upgrade and the bundle uninstallation as extremely unlikely. I'm 95% sure that the root cause is something different.

So while I don't have an explanation or a lot of recommendations, I would try to step back a bit and reevaluate the situation without the assumption that the OS upgrade is a cause. Have there been other changes to the system? Have you changed something to your package definitions and/or build and deployment process?

You mention that this also happened on the machine of one of your AEM architects. So he hopefully will have enough knowledge to drive the analysis of the issue on his instance. That will probably be much more efficient as he has direct access to the instance.

 

With regards to your latest comment:

  • Updating to a later/the latest SP is always a good idea but I would be very surprised if this solved the issue.
  • Switching to a different OS sounds a bit radical based on this single event. Isn't re-deploying your application fixing the issue? Or did I miss something and you are unable to (re-) deploy your application after the upgrade?
    On the other hand: with Unix, you'll definitely have no issues with Windows updates...

 

I'm sorry, that I could not provide a proper explanation.

Wish you good luck by finding the cause of it. If you do, please let us know. I would be very curious to learn about it.

View solution in original post

12 Replies

Avatar

Correct answer by
Employee Advisor

Hi @JoelLucuik!

That sounds a bit strange.

If I understand things correctly, you upgraded your operating system and after that you are missing your application bundles in AEM.

Is my understanding correct?

 

With this level of information, there is not much that I can recommend.

I have never seen a similar behavior and to be honest I doubt that there is a direct connection between the two events.

Before the OS upgrade (which often comes with a reboot of the machine) you have probably shut down your AEM instance. In this state, there is no "regular" way to remove OSGI bundles. Bundles could theoretically be removed from the persistence level (= deleted from the hard disk) but this would be reflected in the logs on the next startup of AEM as it would still try to find the bundles (because they have not been properly uninstalled). I don't see a lot of other options what could have happened here as a direct result of the OS upgrade with regards to the AEM instance.

 

As you say the bundles seem to be automatically uninstalled:

  • Are the bundles missing/have disappeared or have they actively been uninstalled/undeployed from the OSGI context?
  • Did you find any traces to this in the according log files? Are there exceptions? Is the OSGI context still trying to load the bundles?
  • Are you sure that everything was correctly working right before the OS upgrade?
  • Did you shut down the instance properly (and waited for the shutdown to complete) before you started the OS upgrade?
  • Have there been any deployments to AEM shortly before or after the OS upgrade?

 

 

Update:

Thanks for providing additional details and sharing some extracts from your log files, @JoelLucuik.

Looking at the log entries, there are a couple of infos on regular scheduled maintenance tasks:

  • RevisionCleanup (daily window)
  • LuceneBinariesCleanup (daily window)
  • DataStoreGarbageCollection (weekly window)
  • WorkflowPurge (weekly window)
  • AuditLogMaintenance (weekly window)

These are regular maintenance jobs as indicated in the documentation. You can check their schedule and status if you navigate to "Tools" -> "Operations" -> "Maintenance":

Screenshot 2021-08-05 at 19.43.53.png

Screenshot 2021-08-05 at 19.44.38.png  Screenshot 2021-08-05 at 19.44.46.png  Screenshot 2021-08-05 at 19.44.57.png

Looking at the sequence of these jobs being logged as status "unscheduled", some ServiceEvent unregistrations followed by an updated status for the jobs as "scheduled" this looks like there has been a change to the scheduler configurations (could also just be a redeployment of the same, already existing configurations under certain circumstances). Looks like there actually is happening something on this instance!

Nothing to worry about so far, but this indicates that there has probably been some kind of adjustment to the instance. This could be a manual action through the UI (would be reflected in the access and request logs, probably also audit log), a package installation or similar.

 

Going further through the logs, there are a couple of health checks running. These are taking certain probes to measure the system health. This is a pure monitoring operation and has no effects whatsoever on you application bundles. Every result that is unequal to 0 shows that the according health check found an error with the aspect that it's monitoring. You can run the health checks manually and see the results from the OSGI Web Console. Please navigate to "Sling" -> "Sling Health Checks".

Screenshot 2021-08-05 at 20.04.07.png

The two occasions with a result of "2" indicate that some maintenance tasks have not yet run or could not finish properly. If you haven't actively configured the maintenance tasks on your instance, this is not unusual as some of the jobs need configuration adjustments to run properly that are not present out-of-the-box (because these are very individual and can not be the same for all deployments). Running the health checks manually as outlined above will give you additional details. But I'm sure that this has nothing to do with your issue of uninstalled bundles.

 

After these health checks, the interesting part of your log file starts:

 

 

03.08.2021 12:25:09.988 *INFO* [OsgiInstallerImpl] org.apache.sling.scripting.javascript.internal.RhinoJavaScriptEngineFactory Activated with optimization level 9
03.08.2021 12:25:09.989 *INFO* [OsgiInstallerImpl] org.apache.sling.scripting.javascript Service [org.apache.sling.scripting.javascript.internal.RhinoJavaScriptEngineFactory,34332, [javax.script.ScriptEngineFactory]] ServiceEvent REGISTERED
03.08.2021 12:25:09.992 *INFO* [OsgiInstallerImpl] ourapp.core BundleEvent UNINSTALLED
03.08.2021 12:25:09.997 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Uninstalled bundle ourapp.core [597] from resource TaskResource(url=jcrinstall:/apps/ourapp/install/ourapp.core-0.0.46-SNAPSHOT.jar, entity=bundle:ourapp.core, state=UNINSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:30:22:, Bundle-SymbolicName=ourapp.core, Bundle-Version=0.0.46.SNAPSHOT], digest=1628007435817)
03.08.2021 12:25:09.997 *INFO* [FelixDispatchQueue] org.apache.felix.framework FrameworkEvent PACKAGES REFRESHED
03.08.2021 12:25:10.001 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.AAAAAA,33363, [xxxxxx.AAAAAA]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.005 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.BBBBBB,33364, [xxxxxx.BBBBBB]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.011 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.CCCCCC,33372, [xxxxxx.CCCCCC]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.015 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.DDDDDD,33373, [xxxxxx.DDDDDD]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.019 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.EEEEEE,33397, [xxxxxx.EEEEEE]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.021 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [xxxxxx.FFFFFF,33362, [xxxxxx.FFFFFF]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.024 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent STOPPING
03.08.2021 12:25:10.026 *INFO* [OsgiInstallerImpl] ourapp.msclient Service [33361, [org.osgi.service.cm.ManagedService]] ServiceEvent UNREGISTERING
03.08.2021 12:25:10.028 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent STOPPED
03.08.2021 12:25:10.029 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent UNRESOLVED
03.08.2021 12:25:10.030 *INFO* [OsgiInstallerImpl] ourapp.msclient BundleEvent UNINSTALLED
03.08.2021 12:25:10.061 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Uninstalled bundle ourapp.msclient [598] from resource TaskResource(url=jcrinstall:/apps/ourapp/install/ourapp.msclient-0.0.46-SNAPSHOT.jar, entity=bundle:ourapp.msclient, state=UNINSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:30:22:, Bundle-SymbolicName=ourapp.msclient, Bundle-Version=0.0.46.SNAPSHOT], digest=1628007435998)

 

 

As you can see, the OsgiInstaller is starting his work here.

Looking at the logs it activates and registers the RhinoJavaScriptEngineFactory and afterwards starts to uninstall your core bundle.

So there is some event that triggers a proper de-installation of the bundle from the OSGI context. Everything that follows is just the execution and finalization of this action.

 

So the big question here is: what is causing the bundle de-installation?

There are a couple of things that could trigger this under normal circumstances:

  • Deinstallation has been triggered manually through the Web Console UI.
  • If the bundle has been deployed as part of a content package, the removal of the according node (/apps/yourapp/install/your-core.jar) from the repository or the deinstallation of the deployment package could trigger this.
  • If the bundle/package has been deployed by dropping it to the /crx-quickstart/install folder on the filesystem, the removal of that file from the filesystem could trigger the deinstallation under certain circumstances.
  • Some (external/custom) deployment automation has triggered the deinstallation.

I assume that nothing of these happened in your case.

But also the log extract is a bit limited. There could be a trigger event even some time before this log extract starts. The re-configuration of the maintenance jobs (where the log extract starts) also indicates that something triggered changes even before the first entry provided. What could that be?

 

Summarizing, I unfortunately still can't explain what happened to your instance.

As already mentioned before: From my experience, I very much doubt that there is a direct connection between the OS update and the bundle de-installations. The two events might have taken place around a certain time frame that may indicate a correlation but I have never seen something similar. Also the delay of 20 minutes sounds quite suspect to me to assume a correlation here. I double checked internally if any other customer has reported something similar and could not find other reports like yours.

On the other hand, there are two main components involved here: the operating system and AEM. (In reality there are probably even much more components, e. g. the JRE, your build system, other components running on your machine, etc.) And I'm not an expert on the Windows side of things. Maybe there are upgrade and event logs of the OS that could be checked.

But again: Personally, I deem the link between the OS upgrade and the bundle uninstallation as extremely unlikely. I'm 95% sure that the root cause is something different.

So while I don't have an explanation or a lot of recommendations, I would try to step back a bit and reevaluate the situation without the assumption that the OS upgrade is a cause. Have there been other changes to the system? Have you changed something to your package definitions and/or build and deployment process?

You mention that this also happened on the machine of one of your AEM architects. So he hopefully will have enough knowledge to drive the analysis of the issue on his instance. That will probably be much more efficient as he has direct access to the instance.

 

With regards to your latest comment:

  • Updating to a later/the latest SP is always a good idea but I would be very surprised if this solved the issue.
  • Switching to a different OS sounds a bit radical based on this single event. Isn't re-deploying your application fixing the issue? Or did I miss something and you are unable to (re-) deploy your application after the upgrade?
    On the other hand: with Unix, you'll definitely have no issues with Windows updates...

 

I'm sorry, that I could not provide a proper explanation.

Wish you good luck by finding the cause of it. If you do, please let us know. I would be very curious to learn about it.

Avatar

Level 3

I have not upgraded my Windows OS. I don't have the issue. Two of my coworkers have upgraded, and they are seeing the issue. The bundles are removed automatically after about 20 mins. Yeah it's strange..I've never seen this before either. I'll try and find logs. 

Avatar

Employee Advisor
Then please have them check on the above mentioned points.

Avatar

Level 3
Logs are here: https://drive.google.com/file/d/1vYQy7uh9S94dcUf5-xNg20GJl_AiWcGa/view?usp=sharing. Note one of the users is an AEM architect. I will ask about the instance being shut down properly. There are no exceptions. Everything was fine before.

Avatar

Level 3
From one of our users: "We did not shut down our servers improperly. There is no action on our part that makes this happen. "

Avatar

Level 3
Wondering what could be related to Windows 19042.1110? In our logs....line 5 has a WorkflowPurgeTask, Line 8 has HealthCheck. Just grasping at straws here.

Avatar

Employee Advisor
These are just regular maintenance tasks that are run periodically on any instance. Nothing suspicious with these two from my point of view.

Avatar

Level 3

Do you know what status 2 is?

 

03.08.2021 12:25:09.741 *INFO* [Adobe Granite Maintenance Scheduler Update Task] com.adobe.granite.queries.impl.hc.QueryHealthCheckMetrics First execution of gauge [health-com.adobe.granite.maintenance.healthcheck.impl.MaintenanceTaskHealthCheck] returned status [2]

 

My local never returns status 2 (That I can see in my logs)

Avatar

Employee Advisor

You should be able to see the result of that health check on the operations panel in the AEM UI.

But this is only a health check. It’s taking certain probes on your instance and won’t trigger any activity, especially it’s unable to interfere with bundles - not even talking about uninstalling them.

Avatar

Level 3
Operations panel? Can you be more specific? We work mostly in publish. I think we'll need to update to a later SP. Without a fix for this we'll have to move to Unix.

Avatar

Employee Advisor

Hi @JoelLucuik!

I have update my initial posting to go through the additional information that you provided.

Hope that helps!

Avatar

Level 3

Hi Markus.

 

This appears related to our configuration of NPM and webpack. We are using this to deploy changed artifacts to pub. Seems there is a threshold causing it to delete. Will post more when we know more. --Joel