We have an issue with our on-premise multi-instance solution (multiple servers: one for the MTA, one for Message Center, etc.) and one large mounted partition that stores the logs from those servers. In this design, the processes share the same log files (web.log, webmdl.log, watchdog.log, trackinglogd.log) on that partition.
Recently we ran into a problem logging in via the console. Restarting the processes did not help because syslogd and web@default get stuck (if we force-kill them, they crash when we try to start them again). We therefore changed the log path (back to the default) and the UDP port in customer.sh as follows:
export XTK_VAR_DIR=/usr/local/neolane/nl6/var/   # previously a soft link: /mounted_partition/instance/instance/
export TRACE_ADDR=localhost:6667                 # the default is 6666
After this change we are able to log in properly. Changing only the path or only the UDP port does not work; both settings have to be changed.
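Since moving off port 6666 was part of the fix, one thing worth checking is whether a socket is still bound to the old UDP port (for example, one held by a stale process). The following is a minimal sketch, not an Adobe Campaign tool: the function name `udp_port_in_use` is hypothetical, and it assumes a Linux host where the kernel exposes its UDP socket table in /proc/net/udp and /proc/net/udp6.

```shell
#!/bin/sh
# Sketch: report whether a UDP port appears as a local port in the
# kernel's socket table. The function name is hypothetical.
# Usage: udp_port_in_use PORT [TABLE_FILE...]
#   e.g. udp_port_in_use 6666 && echo "UDP 6666 is still bound"
udp_port_in_use() {
    port=$1
    shift
    # Default to the live kernel tables when no files are given.
    if [ "$#" -eq 0 ]; then
        set -- /proc/net/udp /proc/net/udp6
    fi
    # Ports are stored as four uppercase hex digits.
    hex=$(printf '%04X' "$port")
    for table in "$@"; do
        [ -r "$table" ] || continue
        # Column 2 is local_address in the form HEXIP:HEXPORT.
        # Line 1 is a header and is skipped.
        if awk -v p="$hex" '
            NR > 1 { n = split($2, a, ":"); if (toupper(a[n]) == p) found = 1 }
            END    { exit !found }
        ' "$table"; then
            return 0
        fi
    done
    return 1
}
```

The function returns 0 when the port is present in the table and 1 otherwise. It only shows that the port is bound, not which process owns it.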
The instance worked fine on the mounted partition (accessed via the soft link) for two years; now the problem has appeared and we are forced to keep the logs locally. We also tried accessing the mounted partition directly (without the soft link), and the problem persists. Our Linux team has rejected several tickets about the servers (they say nothing has changed on them).
We are on the deprecated build 8981. The problem occurs only on our two multi-instance environments; the one mono-instance environment works properly.
Reinstalling this build did not help, and neither Adobe Support nor the consultants had seen this kind of error before.
Does anyone have an idea why AC cannot write logs to a mounted Linux partition (even though it worked that way for two years)?
Are there any logs/dumps available for the crash?
Were there any changes to the filesystem or permissions that may have triggered the change in behavior?
Can the separate instances write to separate files, periodically merged by an offline process? What's the requirement for having them merged?
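The "separate files, merged offline" idea could look roughly like the sketch below. It rests on assumptions the thread does not confirm: each server writes its own file, and every log line starts with a sortable "YYYY-MM-DD HH:MM:SS" timestamp. The function name `merge_logs` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: merge per-server log files into one file ordered by timestamp.
# Assumes each input is already in chronological order and each line
# begins with "YYYY-MM-DD HH:MM:SS" (fields 1 and 2).
# Usage: merge_logs OUTPUT INPUT...
#   e.g. merge_logs web-merged.log web-mta.log web-mc.log
merge_logs() {
    out=$1
    shift
    # sort -m merges inputs that are already sorted instead of re-sorting
    # them; LC_ALL=C gives plain byte-order comparison of the timestamps.
    LC_ALL=C sort -m -k1,2 "$@" > "$out"
}
```

Run from cron or by hand, this keeps each process writing only to a local file, so no two servers ever hold the same log file open.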
It seems there was an issue with that partition. We tried deleting everything on it and granting all permissions, but after some time we ended up in the same situation.
We have created a new partition and are using port 6667 instead of 6666. It is still in a test period, but it looks good so far.
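During the test period, a small health check on the log directory may help catch a recurrence early. This is a minimal sketch under assumptions: the function name `check_log_dir` is hypothetical, the path in the usage comment is the placeholder from the post above, and `df -i` (inode usage) is assumed to be available, as it is with GNU coreutils.

```shell
#!/bin/sh
# Sketch: verify that a log directory exists and is writable, then print
# free space and free inodes for its filesystem.
# Usage: check_log_dir DIR
#   e.g. check_log_dir /mounted_partition/instance/instance/
check_log_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "FAIL: $dir is not a directory (broken link or missing mount?)"
        return 1
    fi
    # Write probe: create and remove a small file, as a logging process would.
    probe="$dir/.write_probe.$$"
    if ! ( echo probe > "$probe" ) 2>/dev/null; then
        echo "FAIL: cannot create a file in $dir"
        return 1
    fi
    rm -f "$probe"
    # A full disk or exhausted inodes can block writes even when
    # permissions are correct, so show both.
    df -P "$dir"
    df -i "$dir"
    echo "OK: $dir is writable"
}
```

If the partition is a network mount (the thread does not say), the write probe itself may hang on a stuck mount. Wrapping the call in the coreutils `timeout` command is one way to guard against that.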