Fri Apr 22, 2016 5:46 am
We had a batch upgrade scheduled for tonight, and it failed.
Here are the logs:
22/4/2016 11:11:21 Version 1.0.0rc5 starting up
22/4/2016 11:11:22 Starting scheduled upgrade
22/4/2016 11:11:22 Not upgrading 172.17.0.17, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.100, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.34, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.60, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.20, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.80, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.32, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.120, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.150, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.160, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.10, device is not available
22/4/2016 11:11:22 Not upgrading 172.17.0.220, device is not available
22/4/2016 11:11:22 Not upgrading 172.18.190.254, device is not available
22/4/2016 11:11:22 Scheduled upgrade complete, 0 of 13 devices successfully upgraded
I'm in France and the network is in California, so the local time there was actually 02:11:22 AM :)
First, the app had crashed (the node app.js process was dead) by the time the upgrade was originally scheduled to run (01:35 AM), and the upgrade started as soon as I restarted the app. That could be a problem: if the process is restarted in the middle of the day, a missed upgrade would start right away and cause downtime for the users.
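To illustrate the kind of guard I have in mind, here is a minimal sketch. I have not looked at how the app actually schedules upgrades, so the function name and the five-minute grace window are purely hypothetical; the idea is just to skip a scheduled upgrade at startup when it was missed by more than a tolerated delay.

```javascript
'use strict';

// Hypothetical helper, not taken from the app: decide whether a scheduled
// upgrade should still run when the process starts up.
// scheduledAt and now are Date objects; graceMs is the tolerated delay in ms.
function shouldRunScheduledUpgrade(scheduledAt, now, graceMs) {
  var delay = now.getTime() - scheduledAt.getTime();
  // Run only if the job is due and was not missed by more than the grace window.
  return delay >= 0 && delay <= graceMs;
}

// The times from this report, in UTC (assuming California was on PDT, UTC-7).
var scheduled = new Date('2016-04-22T08:35:00Z'); // 01:35:00 AM PDT
var restarted = new Date('2016-04-22T09:11:22Z'); // 02:11:22 AM PDT
var fiveMinutes = 5 * 60 * 1000;                  // assumed grace window

console.log(shouldRunScheduledUpgrade(scheduled, restarted, fiveMinutes)); // false
```

With a guard like this, last night's upgrade would have been skipped (or could have been reported as missed) instead of starting 36 minutes late.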
As for the cause of the crash, it may be because I ended up installing it on a CentOS server. I did not manage to turn it into a proper systemd service (I did not have much time to spend on that), so I just edited the init script you provided to run the start.sh and stop.sh scripts. The app.js process has died a few times since (the bootstrap.js process has not, though).
The server runs node 4.4.3.
Do you have any idea why it would crash? Turning it into a systemd service would make sure both processes are restarted after a crash, but it would not stop the crashes from happening.
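If it helps to track this down, I could add something like the following near the top of app.js. This is only a sketch under assumptions: the log path and the helper name are made up, and it would only catch crashes caused by an uncaught JavaScript exception, not the process being killed from outside.

```javascript
'use strict';
var fs = require('fs');

// Hypothetical log location; adjust to wherever the app is installed.
var CRASH_LOG = '/tmp/app-crash.log';

// Build one log entry from the error (or whatever value was thrown).
function formatCrashEntry(err, when) {
  var detail = (err && err.stack) ? err.stack : String(err);
  return when.toISOString() + ' uncaught exception: ' + detail + '\n';
}

// Record the stack trace, then exit with a non-zero code so that an init
// script or service manager can restart the process.
process.on('uncaughtException', function (err) {
  fs.appendFileSync(CRASH_LOG, formatCrashEntry(err, new Date()));
  process.exit(1);
});
```

The write is synchronous on purpose, so the entry is on disk before the process exits.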
We also had another minor issue. After the crash and recovery described above, one of the switches was always marked as down, even though all the others were OK and the switch itself was working fine. I removed it from the list and added it back, and it finally showed up. This one is a WS-6-MINI.
One minor improvement would be to display the system time on the main screen, to help with scheduling upgrades.