v1.5.23 Bug Reports and Comments

RTGLW
Member
 

Re: v1.5.23 Bug Reports and Comments

Sun Dec 15, 2024 11:06 pm

Hi team,

Upgraded 2 lab switches (a WS-8-150-DC and a WS-26-500-DC) from .22 to .23 - can confirm our RSA ssh-keys are working again now that they are no longer truncated.

I know the SNMP data collection hang-up wasn't specifically fixed in this version, but for posterity: SNMP collection for fans, voltages, temps, power consumption, and interface networking details all reported 0 (zero) directly after the FW upgrade.

But I've run into a strange issue: 5 minutes after disabling and re-enabling the SNMP server from the webGUI (which restored SNMP data collection), I lost connectivity to the switch. According to the logs it decided to "revert to a last known good configuration", as if it had failed to confirm connectivity after my changes, but I have no idea why. I was in the webGUI after the SNMP server was re-enabled, saw that it had processed the request successfully, and navigated around the GUI for a short while afterwards. Below are the switch logs from the 1.5.23 upgrade to the connectivity loss:

Code:
Dec 16 08:03:57 sw-rofl-access1 UI[818]: Firmware upgrade by admin (redacted)
Dec 16 08:07:36 sw-rofl-access1 init: starting pid 2897, tty '': '/etc/init.d/rcS K stop'
Dec 16 08:07:36 sw-rofl-access1 sysinit: no process in pidfile '/var/run/switch-g' found; none killed
Dec 16 08:07:36 sw-rofl-access1 sysinit: rm: can't remove '/var/run/switch-g': No such file or directory
Dec 16 08:07:36 sw-rofl-access1 root: stopped ntp daemon
Dec 16 08:07:36 sw-rofl-access1 sysinit: stopped process in pidfile '/var/run/ntp' (pid 913)
Dec 16 08:07:38 sw-rofl-access1 root: removing lan (eth0.100) from firewall zone lan
Dec 16 08:08:44 sw-rofl-access1 kernel: eth0.100: no IPv6 routers present
Dec 16 08:09:33 sw-rofl-access1 root: upgrading certificate
Dec 16 08:09:34 sw-rofl-access1 sysinit: Generating a RSA private key
Dec 16 08:10:03 sw-rofl-access1 sysinit: .................................................................................................................+++++
Dec 16 08:10:42 sw-rofl-access1 sysinit: ..................................................................................................................................................................................................................................
Dec 16 08:10:42 sw-rofl-access1 sysinit: writing new private key to '/etc/config/lighttpd.pem'
Dec 16 08:10:42 sw-rofl-access1 sysinit: -----
Dec 16 08:10:43 sw-rofl-access1 switch[1543]: Detected warm boot
Dec 16 08:10:46 sw-rofl-access1 switch[1541]: temp sensor version 1
Dec 16 08:10:46 sw-rofl-access1 root: stopped ntp daemon
Dec 16 08:10:46 sw-rofl-access1 root: started ntp daemon
Dec 16 08:10:47 sw-rofl-access1 root: sync time via ntp
Dec 16 14:42:57 sw-rofl-access1 UI[1514]: Configuration changed by admin (redacted)
Dec 16 14:42:57 sw-rofl-access1 UI[1514]: Config_Version: 31 => 32
Dec 16 14:42:57 sw-rofl-access1 UI[1514]: SNMP_Server_Enable: true => false
Dec 16 14:43:02 sw-rofl-access1 UI[1514]: Configuration changed by admin (redacted)
Dec 16 14:43:02 sw-rofl-access1 UI[1514]: Config_Version: 32 => 33
Dec 16 14:43:02 sw-rofl-access1 UI[1514]: SNMP_Server_Enable: false => true
Dec 16 14:48:06 sw-rofl-access1 root: !Reverting to last known good configuration
Dec 16 14:48:07 sw-rofl-access1 passwd: Password for admin changed by admin


Edit: I originally said I couldn't reach it on its configured or default IP, but I was doing a dumb. Anyway, I have found the switch has indeed defaulted itself. Any theories as to why it's done this?

I'll be headed to our lab tomorrow to investigate further, but was hoping to get feedback on this in case you need any further information from me before I try to recover the switch.
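
For anyone comparing their own logs against this one, here is a minimal Python sketch that measures the gap between the last configuration change and the revert entry. It assumes the syslog line format shown in the log above. The year is not present in the log and is passed in as an assumption, and the function names are illustrative only. It says nothing about why the firmware reverted; it only makes the timing easy to compare across reports.

```python
from datetime import datetime

# Marker strings copied from the log excerpt in this post.
REVERT_MARKER = "Reverting to last known good configuration"
CHANGE_MARKER = "Configuration changed by"


def parse_timestamp(line, year=2024):
    """Parse the leading 'Mon DD HH:MM:SS' stamp (syslog omits the year, so it is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_from_change_to_revert(lines, year=2024):
    """Seconds between the last config change and the first revert that follows it.

    Returns None when no revert follows a config change.
    """
    last_change = None
    for line in lines:
        if CHANGE_MARKER in line:
            last_change = parse_timestamp(line, year)
        elif REVERT_MARKER in line and last_change is not None:
            return (parse_timestamp(line, year) - last_change).total_seconds()
    return None


log = [
    "Dec 16 14:42:57 sw-rofl-access1 UI[1514]: Configuration changed by admin (redacted)",
    "Dec 16 14:43:02 sw-rofl-access1 UI[1514]: Configuration changed by admin (redacted)",
    "Dec 16 14:48:06 sw-rofl-access1 root: !Reverting to last known good configuration",
]
print(seconds_from_change_to_revert(log))  # 304.0
```

For the excerpt above the gap is 304 seconds, which matches the "5 minutes after" description.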

oeyre
Member
 

Mon Dec 16, 2024 3:07 am

sirhc wrote:Seems strange as some have no issue with PPPoE and a few do.

I'm told there was a recurrence of the PPPoE issue after going to 1.5.23; however, I was not able to try the fixes that previously worked for me. Confoundingly, we also have a WS-26-400-AC sitting right next to our affected WS-12-250-AC, using the exact same DAC model, that was not affected (although there is no OSPF running through that one).

sirhc wrote:Just told stephen we will have to arrange a time when we all 4 have access to said switch.

Appreciate your dedication, however it will have to wait. We are now in our end of year change freeze and I am not willing to roll the dice on any more stuff like this until after it ends. We have elected to roll back the affected unit to 1.5.16 and just manage the memory leak until we revisit in January. I'm not worried about the security vulnerability as our management is in a VRF that does not have access to the Internet.

It sounds like Stephen is quite deserving of a fat Christmas bonus and some time off to enjoy with his family though? :smile:
Last edited by oeyre on Mon Dec 16, 2024 3:41 am, edited 1 time in total.

sirhc
Employee

Mon Dec 16, 2024 3:23 am

Stephen is getting ready to release v1.5.24.

FIXES: SMTP and something else. Sad part is we were just talking about it a few hours ago and I can't remember what it was.


Down to:
SNMP (a few people, maybe 2) - works for others?
PPPoE (a few people, I think 1, maybe 2) - works for others?

VERY CONFUSING THAT THE ABOVE TWO ITEMS HAVE MIXED OUTCOMES????

yahel
Member
 

Mon Dec 16, 2024 1:14 pm

Thanks Netonix team - and sorry it took us a while to report back (only seeing the issue on a production site).

I'm sorry to report that the OSPF problem is still happening - no change.
This is probably the Multicast issue (but I'm just assuming - didn't debug further).
In order to help understand things, we have simplified the config a bit:
For the SFP port that exhibits the issue (P14) we removed it from being a LAG member (already did that last week) - no change.
We stopped using this port as a trunk, so no more T in any of the VLANs for this port (only a single U).
Apart from the OSPF peering session, it seems like this port is working - we don't have real traffic on it (because OSPF peering isn't up), but things like ping and ssh are working.
I suspect it's only the Multicast traffic (but I have not debugged further).

I'm not sure if we tried to reboot another time after the upgrade (probably not - need to confirm with Vivek - we do such work between 2am and 5am).

In the same switch, the other SFP port (P13) is working as expected (Trunk and LAG member - OSPF works fine over it).
We did have, at some point with 1.5.22, the same issue with OSPF over P13 as well (last week), but some configuration changes fixed it (not sure what, probably something to do with changes to VLANs, their order or something).
FWIW - the SFP module in P14 is different from the SFP module in P13. If the Netonix team is interested, I could try to swap them, but not before tomorrow, and see if the problem shifts with the module. Let us know (it's quite a bit of work, and may result in short downtime, so I'd prefer to avoid this if it's not very helpful).

Thanks!

Yahel.
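
One way to test the multicast suspicion directly is to check whether OSPF hellos actually arrive on a host attached behind the port. The following is a minimal Python sketch, not anything from Netonix: it assumes a Linux test host with root access on the affected VLAN, and `local_ip` is a placeholder for that host's address. It joins the OSPFv2 AllSPFRouters group (224.0.0.5, IP protocol 89) and decodes the first 12 bytes of the OSPFv2 header of anything it receives.

```python
import socket
import struct

OSPF_PROTO = 89                # IP protocol number assigned to OSPF
ALL_SPF_ROUTERS = "224.0.0.5"  # OSPFv2 AllSPFRouters multicast group
OSPF_TYPES = {1: "Hello", 2: "DB Description", 3: "LS Request",
              4: "LS Update", 5: "LS Ack"}


def parse_ospf_header(payload):
    """Decode version, type, length, router ID and area ID from an OSPFv2 header."""
    version, ptype, length, router_id, area_id = struct.unpack(
        "!BBH4s4s", payload[:12])
    return {
        "version": version,
        "type": OSPF_TYPES.get(ptype, f"unknown({ptype})"),
        "length": length,
        "router_id": socket.inet_ntoa(router_id),
        "area_id": socket.inet_ntoa(area_id),
    }


def listen_for_ospf(local_ip, timeout=30.0):
    """Join AllSPFRouters on the interface owning local_ip and print OSPF packets.

    Requires root because it opens a raw socket. local_ip is hypothetical:
    substitute the address of the test host's interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, OSPF_PROTO)
    mreq = socket.inet_aton(ALL_SPF_ROUTERS) + socket.inet_aton(local_ip)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(timeout)
    try:
        while True:
            packet, (src, _port) = sock.recvfrom(65535)
            ihl = (packet[0] & 0x0F) * 4  # raw IPv4 sockets include the IP header
            print(src, parse_ospf_header(packet[ihl:]))
    except socket.timeout:
        print("no OSPF packets seen within the timeout")
    finally:
        sock.close()

# Example (as root): listen_for_ospf("192.0.2.10")
```

If unicast ping and ssh work across the port but this listener stays silent while the peer is sending hellos, that would support the multicast theory. If hellos do show up, the problem is more likely elsewhere.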

Stephen
Employee

Mon Dec 16, 2024 2:11 pm

yahel wrote:Thanks Netonix team - and sorry it took us a while to report back (only seeing the issue on a production site).

I'm sorry to report that the OSPF problem is still happening - no change.
This is probably the Multicast issue (but I'm just assuming - didn't debug further).
In order to help understand things, we have simplified the config a bit:
For the SFP port that exhibits the issue (P14) we removed it from being a LAG member (already did that last week) - no change.
We stopped using this port as a trunk, so no more T in any of the VLANs for this port (only a single U).
Apart from the OSPF peering session, it seems like this port is working - we don't have real traffic on it (because OSPF peering isn't up), but things like ping and ssh are working.
I suspect it's only the Multicast traffic (but I have not debugged further).

I'm not sure if we tried to reboot another time after the upgrade (probably not - need to confirm with Vivek - we do such work between 2am and 5am).

In the same switch, the other SFP port (P13) is working as expected (Trunk and LAG member - OSPF works fine over it).
We did have, at some point with 1.5.22, the same issue with OSPF over P13 as well (last week), but some configuration changes fixed it (not sure what, probably something to do with changes to VLANs, their order or something).
FWIW - the SFP module in P14 is different from the SFP module in P13. If the Netonix team is interested, I could try to swap them, but not before tomorrow, and see if the problem shifts with the module. Let us know (it's quite a bit of work, and may result in short downtime, so I'd prefer to avoid this if it's not very helpful).

Thanks!

Yahel.


Hi yahel, there is one more test that might be worth trying. I had asked someone else to do it but they never got back to me.
With the switch that exhibits these problems live and settled (as in, no config changes in progress and running for at least an hour), please run the following command from the Linux shell:
Code:
/etc/init.d/vtss_appl restart


Please be aware that this will reconfigure the switchcore, meaning that services like STP and LACP will reset. This will potentially cause dropped frames for a couple of minutes while everything settles again. I wouldn't do this until later hours when things are slower and a few minutes of downtime is acceptable.

However, this might kick the SFP into working correctly. If it does, it tells us something about the bug that I'm looking for. If it doesn't, it also tells us something, mostly that I need to look elsewhere. Regardless, it could be enlightening.

clearwaterflys
Member
 

Mon Dec 16, 2024 11:51 pm

I can confirm that once upgrading to 1.5.23, SMTP emails stopped working from WS-8-150-DC. Hoping that can get fixed soon. Thank you!

Stephen
Employee

Tue Dec 17, 2024 12:03 am

clearwaterflys wrote:I can confirm that once upgrading to 1.5.23, SMTP emails stopped working from WS-8-150-DC. Hoping that can get fixed soon. Thank you!


Hi clearwaterflys, please try putting just the domain in the FromAddress. This should work for now but it is scheduled to get fixed in the upcoming release.

sakita
Experienced Member
 

Tue Dec 17, 2024 11:04 am

Disabling NTP and saving, then enabling NTP and saving, makes the "applying configuration" prompt stay up for the full timeout. This results in a "Reverting to last known good configuration" log entry. It does set the time, but it also leaves NTP disabled.

I'm going to do some more experimenting with it to see if I can provide more useful information.

yahel
Member
 

Tue Dec 17, 2024 11:22 am

Stephen,

/etc/init.d/vtss_appl restart

Sadly, didn't change a thing.
Reboot didn't change things either - still no OSPF via P13 (Multicast problem).

Yahel.

sakita
Experienced Member
 

Tue Dec 17, 2024 12:55 pm

Some more NTP testing on a WS-8-150-AC with 1.5.23...

Enabling NTP and hitting save/apply: if the NTP server time is an hour or more ahead of the Netonix switch clock, the revert timer will start. After hitting refresh and logging back in, the configuration is reverted: the clock is set, but the NTP setting is back to disabled. At that point, re-enabling NTP and hitting save/apply again will update the configuration to have NTP enabled.

If the NTP server time is behind the Netonix switch clock it works fine (have tried time offsets from minutes to days off). The switch's clock doesn't seem to mind going backwards but going forward can get interesting.

If the NTP server time is close to the Netonix switch clock I can disable/apply enable/apply repeatedly without issue.

If NTP is enabled in the configuration, it works like it should on cold start and warm start if the NTP server is online and available to the Netonix switch.
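
The observations above can be written down as a small lookup, which may be handy as a checklist when retesting on a later firmware. This is only a Python restatement of what was reported for 1.5.23, not firmware logic. The `CLOSE` threshold is an assumption, since "close" was not quantified, and forward offsets of less than an hour were not reported, so they are marked as such.

```python
from datetime import timedelta

ONE_HOUR = timedelta(hours=1)
CLOSE = timedelta(minutes=1)  # assumption: the report does not say how close "close" is


def reported_outcome(ntp_minus_switch):
    """Map (NTP server time - switch clock) to the behaviour reported on 1.5.23."""
    if abs(ntp_minus_switch) <= CLOSE:
        return "applies cleanly"
    if ntp_minus_switch < timedelta(0):
        return "applies cleanly"  # switch clock steps backwards: reported fine
    if ntp_minus_switch >= ONE_HOUR:
        return "revert timer fires; clock set, NTP left disabled"
    return "not reported"  # NTP ahead by less than an hour: untested


for offset in (timedelta(seconds=5), timedelta(days=-2),
               timedelta(hours=3), timedelta(minutes=20)):
    print(offset, "->", reported_outcome(offset))
```

Filling in the "not reported" range with real test results would show where the forward-jump threshold actually sits.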
