Well it's been close to four months since I deployed my first WS3-14-600-DC. Having received no response to my previous post, I am wondering if I'm posting in the correct forum in order to receive support...
In addition to the two minor issues I posted about previously, I am requesting support with regards to an additional (and more serious) issue with the same switch.
To reiterate, I'm running 2.0.8 (I'll try upgrading to the RC shortly). My switch has Board Rev D, Psup firmware 58, and Psup board rev A. Traffic graph still shows intermittent peaks to 20+Gbps with associated CPU spikes to 95+% as per my previous post. The GUI freezes temporarily in tandem with the traffic/CPU spikes. Total power is around 125W (although the bar graph at the top of the main page still shows 32W as per my previous post). VLANs are in use but no STP or LAG. Total traffic around 1.5-2Gbps tx and rx.
The issue is OCP on port 1. I did review the forum posts about this issue and I don't believe that I have the affected board rev, but I could be wrong. I have a mini-6 switch with around 30-35W load that was plugged in to port 1 and worked fine for over 3 months. When the site got rebooted the WS3 switch suddenly started reporting OCP on port 1. I tried everything I could think of including reset OCP... We eventually had to move the mini to a different port to resolve the OCP issue. Working fine on port 3 now. Does this mean there's a physical problem with port 1?
Any support would be greatly appreciated.
Thanks in advance,
Ben
v2.0.8 Bug Reports and Comments - WS3 Firmware
-
Stephen - Employee
- Posts: 1034
- Joined: Sun Dec 24, 2017 8:56 pm
- Has thanked: 86 times
- Been thanked: 182 times
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
Based on a series of tests done a few months ago on this subject, it is most likely an issue with that port. However, there have been some major firmware fixes in 2.0.9rc2, so it should be worth checking that version to see if results improve.
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
Ok thanks Stephen I will try it and report back.
Ben
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
Hi Stephen, I upgraded to 2.0.9rc2 and there is no change in behavior.
-Still seeing OCP on port 1, even though there's nothing plugged in to port 1
-Total power still reporting ~32W when it should be 150-200W
-Still observing occasional spikes to ~20Gbps in the total traffic graph
-CPU still spiking to >85%
-GUI still not updating correctly
Based on your previous response I presume it's time to RMA this unit?
Thanks,
Ben
-
Stephen - Employee
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
For the OCP issue on port 1, yes probably the best bet is to RMA the unit so our technicians can take a look at it.
It's likely that the total power displayed is related to the OCP issue due to how the algorithm works (sometimes malfunctioning ports create noise and feed negative numbers into the sum).
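To illustrate the point about negative numbers in the sum, here is a minimal sketch. It is not the switch firmware's actual algorithm; the function name `total_power_watts`, the `clamp_negative` option, and the per-port readings are all hypothetical, chosen only to show how one noisy port could pull a displayed total well below the real load.

```python
def total_power_watts(port_readings, clamp_negative=False):
    """Sum per-port power readings in watts.

    Hypothetical helper for illustration only; the real firmware
    may compute the total differently.
    """
    if clamp_negative:
        # Treat physically impossible negative readings as 0 W.
        port_readings = [max(0.0, watts) for watts in port_readings]
    return sum(port_readings)


# Made-up readings: the first port is faulty and reports a negative value.
readings = [-95.0, 32.5, 30.0, 28.0, 36.5]

naive_total = total_power_watts(readings)
clamped_total = total_power_watts(readings, clamp_negative=True)

print(f"naive total:   {naive_total:.1f} W")
print(f"clamped total: {clamped_total:.1f} W")
```

With these made-up values the naive sum comes out far lower than the sum of the healthy ports, which is the kind of under-reporting described above.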
However, we may want to take a closer look at some of the other issues you're mentioning. They may be affected by this, but we should take a closer look just to be sure.
Specifically,
-Still observing occasional spikes to ~20Gbps in the total traffic graph
This might be due to the network/config settings - need a bit more info about your setup to determine what is causing this.
-CPU still spiking to >85%
This can be caused by runaway console sessions that result from either manually logging into a switch remotely (via ssh) or from the Netonix Manager. These runaway instances are cleaned up by the system, but if they are excessive it's possible for them to accumulate briefly before they are cleaned.
Another possibility is that some of the networking algorithms require CPU intervention, such as LAGs and RSTP, as well as certain broadcasts like ARPs. So a large flat network will occasionally see spikes as different devices broadcast at regular intervals.
Another big one in this same category is discovery. If you have discovery enabled, it is almost entirely run on the CPU, so it can definitely result in this behavior if there are many devices in the broadcast domain.
-GUI still not updating correctly
As you've already noted, the occasional CPU spikes will block the server temporarily while handling a higher priority task.
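As a rough back-of-envelope for the broadcast point above, the sketch below estimates how average broadcast load scales with the size of a flat segment. The function `broadcast_pps` and the per-host rate of 6 broadcasts per minute are assumptions made purely for illustration, not measured values, and a steady average does not by itself model the bursts that show up as spikes.

```python
def broadcast_pps(hosts, broadcasts_per_host_per_min):
    """Average broadcast packets per second seen by every device
    in one broadcast domain (hypothetical estimate)."""
    return hosts * broadcasts_per_host_per_min / 60.0


# Assumed per-host rate; real rates depend on the devices and protocols in use.
ASSUMED_RATE_PER_MIN = 6

for hosts in (700, 350, 100):
    pps = broadcast_pps(hosts, ASSUMED_RATE_PER_MIN)
    print(f"{hosts:4d} hosts -> about {pps:.1f} broadcasts/s reaching the CPU")
```

Under this assumption the load is linear in the number of hosts, so halving the broadcast domain roughly halves the broadcast traffic the CPU has to look at.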
With all this noted, it might help if you could provide the logs, screenshots of the following tabs, and a bit more information about how your network is set up:
Status, Ports, Device->Configuration.
You can also just private message me the switch configuration and logs if you prefer.
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
Stephen,
Thanks so much for taking the time to give me such a detailed reply. I will add some responses here, then I'll send you my config and logs via PM.
Regarding the OCP, I will swap out this switch and RMA it. Thanks for confirming.
Regarding the high traffic spikes, I agree; this is probably configurational or network related. I welcome any assistance you can give me in this regard. I will provide detailed config info etc....
Regarding CPU spikes, thanks for the info. I can confirm that there are no SSH console sessions but I am using the Netonix Manager. This particular switch does not have LAGs configured, and STP is turned off. As for Discovery, I did have "Broadcast on protocols" turned on, but I turned it off and it doesn't seem to have changed anything.
I can tell you that due to network growth this switch is in the one corner of my system that is still a fairly large flat segment (I see ~700 entries in the arp table on the switch) so this is definitely a possible factor. I do plan to break this part of the network up into smaller pieces, as no good can come of large broadcast domains.
I'm very happy to share screenshots, config, logs, etc. via PM.
Thanks again for such a great product!
Ben
-
Stephen - Employee
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
Hello bpeach,
I reviewed the logs and screenshots you sent me.
I didn't see anything in the logs that would suggest the symptoms your device exhibits.
However, I did see that several ports are seeing over 2TB of cumulative data and there appear to be regular spikes of 15GB on the total throughput. The switchcore can handle this without issue, but the CPU is just a MIPS32 that is somewhat more limited. So breaking up the broadcast domain is definitely the best bet in this case.
However, one option you can try that potentially could help is to disable pause frames. I believe that is controlled by the CPU so if it is pausing multiple frames due to bandwidth limitations that may reflect in the CPU spikes. Just be careful using this option as it may result in dropped frames - please use your own discretion for what is correct for your network to determine if this is acceptable or not. But it may be a potentially simple fix if it works.
Just in case, the pause frame option is located in the Device->Configuration Tab under the category of "Storm Control".
Thank you for the kind words - I hope you continue to find value in our product.
Stephen
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
OK thanks Stephen! I really appreciate the advice; I'll give it a try right away.
Sincerely,
Ben
- CrackerRiley
- Member
- Posts: 7
- Joined: Fri Apr 08, 2016 4:55 pm
- Has thanked: 0 time
- Been thanked: 1 time
Re: v2.0.8 Bug Reports and Comments - WS3 Firmware
I have a few switches running 2.0.8. I haven't tested the issues extensively so forgive me if any of this has been addressed.
Just wanted to report I also see the traffic spikes/GUI freezes.
Two things concerning wattage:
- Also seeing total wattage reported as low. ~15W total on both the AC switches. ~30W total on the DC switch. Per port the top bar shows 0.0.
- When turning a port off with a cable still plugged into it, the port will perpetually show the error "POE is disabled, power should be 0". I don't know if this continues after a reboot. It doesn't seem to happen every time, but most of the time, I'd say.
Screenshot showing both my points on this particular switch (WS3-14-600-DC):
Lastly, I've had an issue with what seems to be DHCP snooping blocking on ALL ports when enabled on a few desired ports. Disabling it on all ports seems to resolve it immediately. I've seen this happen on two separate switches on our network. Again, I haven't tested extensively.