Frequent wireless disconnections occurring since upgrade to 10.21.01.0065

I apologize for the redundant question. It seems like "clients disconnecting" is a common topic. But I am starting a new one, because it seems that every situation is different.

I have a C5210 and a bunch of 3825i APs all over my hospital. Just before Christmas I upgraded to the latest firmware, 10.21.01.0065. Since then I have been getting complaints from users (across the board) that they are frequently getting dropped from WiFi. After it happens they are easily able to reconnect and all is well. But it's driving them mad: they will be in mid-sentence, documenting in their various applications, and then it sort of freezes up and KAPOW! Connection lost!

There is nothing newer to upgrade to. I suppose I could downgrade to see if the problem goes away, but I would rather find the source of the problem and fix it, especially if it's just something wacky in my configuration.

I plan on putting in a call and opening a ticket on Monday morning with support. But in the meantime - does anyone have any ideas for me? I use Netsight and Purview, and I have all of these logs and diagnostic data at my disposal, but I am not really sure where to start. I suppose the first question is - does anyone else have this issue with this firmware?

Steve Ballantyne

Evan Kuckelheim

Check ATPC and DCS settings. Disable ATPC and change DCS to monitor mode to test. 
Post screenshot of radio settings and advanced radio settings. 

Steve Ballantyne

Hello Evan, thanks for the help. It appears that my ATPC is already disabled, and DCS is already set to Monitor. 

Here are screenshots of my radio settings, and advanced radio settings.




JP

Just curious: what firmware were you on before 10.21.01.0065?

Steve Ballantyne

Hello JP, I was on 10.11.02.0032.

Ronald Dvorak, Embassador

Hey Steve,

Here's what I currently use as my standard settings:

- Basic rate set to 24 Mbps if the AP coverage allows it (dense deployment)
- Radio 2 set to g/n, with protection mode for 11g set to None (I don't support any 802.11b clients)
- On the WLAN service: 802.11k, FT, and MFP disabled

Could you tell whether the issue is on 2.4 GHz or 5 GHz?




Steve Ballantyne

Hey Ron, I can try these settings in a secluded area. We have a building that is off on its own and is having a lot of these problems. I'm sure they won't mind if I tell them I am trying some different settings with them. I do have a very dense deployment of APs, so I should be able to use the same settings as you.

I am not sure if the clients are on 2.4 GHz or 5 GHz. They have all long gone home now and I am not sure how to view the history. I am betting that they are using these Fujitsu Lifebook tablets, which are VERY old and would not support 5 GHz.

Steve Ballantyne

From what I am hearing, the laptops are only about a year old (so they did get replaced at some point). They should at least support 802.11n. I have applied all of Ron's suggested settings to the APs in this building and I am soliciting feedback from the users.

It sucks that all of these Advanced settings cannot be applied to multiple APs at once using the admin pages. That took me a while!

*EDIT* Ron - do you think any of these settings were changed as a result of the upgrade?

JP

Unless it's different in v10, you can make the changes on multiple APs through:
AP - Bulk Config - AP multi-edit. Select multiple APs and then change the settings on the right. If they are greyed out, you have to start at the top and work your way down until they become active.


***Edit*** - It looks like Ron posted the same thing about making changes to multiple APs, and it does look like it's different in v10.

Ronald Dvorak, Embassador

You'd use multi-edit...

Set the checkmark on the APs (only APs from the same series, i.e. 38xx in my example), click the Action button, and choose multi-edit.


Steve Ballantyne

I see it now. I feel stupid. I was looking under the Radio 1 and Radio 2 actions drop-downs. Well, nice to know where to find that because I might be applying those settings to another 70 access points!

Ronald Dvorak, Embassador

No need to feel that way. The first time, I was like: what the heck, they removed multi-edit, should I change the config of 1000 APs one by one? :-)

It took me a while to find the option.

FES

Other improvements... (if anybody disagrees with me, let me know ;D)

Set DTIM from 5 to 3.
Enable aggregate MSDUs and MPDUs.
Enable LDPC.
Set power on radio 1 to 10 dBm or less if you have a lot of APs.
Set power on radio 2 to 6 dBm or less if you have a lot of APs.
Set the channel plan to 3 channels. If you set it to auto, the channel plan uses 4 channels (more co-channel interference).
Set the protection method to CTS and RTS.
Enable all DFS channels on radio a (but don't use channel 144).
Use 20 MHz on radio a, not 40, to avoid co-channel interference (especially in high-density scenarios).
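FES's three-versus-four-channel point comes down to channel spacing in the 2.4 GHz band. The sketch below assumes the standard mapping for channels 1 to 13 (center frequency = 2407 + 5n MHz) and treats two channels as overlapping when their centers are less than 22 MHz apart, the classic 802.11b channel width. `chan_freq` and `check_plan` are hypothetical helpers, and 1/5/9/13 is only an assumed example of a four-channel plan; the thread doesn't say which four channels auto picks.

```shell
#!/bin/sh
# Center frequency in MHz of 2.4 GHz channel n.
# Valid for channels 1-13 only; channel 14 does not follow this formula.
chan_freq() {
  echo $((2407 + 5 * $1))
}

# Compare each channel in a plan with the next one and report whether
# the pair overlaps. Assumes channels are listed in ascending order and
# uses 22 MHz (the 802.11b DSSS width) as the overlap threshold.
check_plan() {
  width=22
  prev=""
  for ch in "$@"; do
    if [ -n "$prev" ]; then
      gap=$(( $(chan_freq "$ch") - $(chan_freq "$prev") ))
      if [ "$gap" -lt "$width" ]; then
        echo "$prev/$ch: $gap MHz apart -> overlap"
      else
        echo "$prev/$ch: $gap MHz apart -> clear"
      fi
    fi
    prev=$ch
  done
}
```

`check_plan 1 6 11` reports both pairs as clear (25 MHz apart), while `check_plan 1 5 9 13` flags all three pairs (20 MHz apart).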

You can check the AP report to find the APs with high airtime use. Above 50% means a lot of collisions, which is not so good.
To settle the APs' channels, enable DCS active mode on the APs for 2 hours and then change back to monitor mode (do this during the night, for example).

More ideas???
Sorry for my English.

Steve Ballantyne

First off - thanks for all of this advice!

I have no idea what any of these settings are, so I trust you know a lot more than me when it comes to this stuff! Since I had to look them up, I figured I would paste it here from the documentation for the sake of discussion.

DTIM
Type the desired DTIM (Delivery Traffic Indication Message) period — the number of beacon intervals between two DTIM beacons. To ensure the best client power savings, use a large number. Use a small number to minimize broadcast and multicast delay. The default value is 5.
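To put FES's DTIM change from 5 to 3 in time terms, here is a worked example. It assumes the common default beacon interval of 100 TU (1 TU = 1024 µs); the thread doesn't state the beacon interval configured on the controller.

```latex
\begin{aligned}
T_{\text{beacon}} &= 100\,\text{TU} \times 1.024\,\text{ms/TU} = 102.4\,\text{ms} \\
T_{\text{DTIM}} &= N_{\text{DTIM}} \times T_{\text{beacon}} \\
T_{\text{DTIM}}\big|_{N_{\text{DTIM}}=5} &= 5 \times 102.4\,\text{ms} = 512\,\text{ms} \\
T_{\text{DTIM}}\big|_{N_{\text{DTIM}}=3} &= 3 \times 102.4\,\text{ms} = 307.2\,\text{ms}
\end{aligned}
```

So a power-saving client that wakes only for DTIM beacons would pick up buffered broadcast and multicast frames about every 0.31 s instead of every 0.51 s: lower broadcast/multicast delay at some cost in battery, as the documentation above describes.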

Aggregate MSDUs
Determines MAC Service Data Unit (MSDU) aggregation. Enable to increase the maximum frame transmission size.

Aggregate MPDUs
Determines MAC Protocol Data Unit (MPDU) aggregation. Enable to increase the maximum frame transmission size.

LDPC
Increases the reliability of the transmission resulting in a 2dB increased performance compared to traditional 11n coding.

Monitor Mode
An alarm is triggered and an information log is generated.
Active Mode
An alarm is triggered, an information log is generated, the AP stops operating on the current channel, and ACS automatically selects an alternate channel for the AP to operate on.

Protection Types
CTS
CTS (Clear to Send) Only.
RTS (Request to Send) and CTS
Recommended when a 40 MHz or 80 MHz channel is used. This protects high throughput transmissions on extension channels from interference from non-11n APs and clients.

Question: when you said to check for high airtime use, which report was that? When I run the Wireless Statistics by Wireless APs report, I am seeing a lot of failures. Here is a screenshot...



Joshua Puusep

FES is referring to the Chnl Utilization lines. You can also see this under the "AP Performance by Radio" report. Of all the settings mentioned, the only one that has previously caused client disconnects for us is having DCS set to active, which was on a 9.x release, and you have already stated that is not the case. The other settings mentioned could improve performance, but I don't believe they should cause client disconnects unless there is a related bug.

Since you are already planning on opening a ticket with GTAC, I would ask them for recommended radio settings before making a slew of changes at once. Their recommendations do change from time to time.

Have you replicated the issue with various types of devices and/or confirmed the user devices have the latest available WLAN drivers?

Jeremy, Embassador

I am also having some issues with this. Please post any updates you find here, as this will be very useful to many people! My radio settings are almost the same as Ronald D's. I use 12 and 24 for the MBR: 12 in less dense areas and 24 in the densest areas.

One thing I have noticed: sometimes band steering tries to force users who don't have a 5 GHz radio onto 5 GHz by constantly rejecting their association requests. I have seen this in the new version of code, especially with Xbox 360 consoles. I won't even see the MAC in NetSight until I turn off band steering... If you run tail -f /tmp/log/ap.log | grep -i "mac address of client" you can see the band-steering log entries for a 2.4 GHz-only client being pushed toward 5 GHz.
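Jeremy's filter only matches if the MAC is typed the same way the AP writes it. A minimal sketch, assuming the AP logs MACs as colon-separated hex pairs (the thread doesn't show the ap.log format) and that the AP's shell provides `tr` and `grep -F`: the hypothetical helpers `norm_mac` and `client_lines` turn a MAC copied from NetSight or Windows (dashes, any case) into colon form and search a saved log for it.

```shell
#!/bin/sh
# Hypothetical helper: normalize a MAC such as 00-11-22-AA-BB-CC to
# lower-case, colon-separated form (00:11:22:aa:bb:cc).
# Assumption: ap.log writes MACs with colons; adjust if it does not.
norm_mac() {
  printf '%s\n' "$1" | tr 'A-F-' 'a-f:'
}

# Hypothetical helper: print every line of a log that mentions the MAC.
# -i ignores case, -F treats the MAC as a fixed string rather than a regex.
client_lines() {
  mac=$(norm_mac "$1")
  log=${2:-/tmp/log/ap.log}
  grep -i -F -- "$mac" "$log"
}
```

For live output on the AP, the same normalization can feed Jeremy's pipeline, e.g. `tail -f /tmp/log/ap.log | grep -i -F "$(norm_mac 00-11-22-AA-BB-CC)"`, where the MAC is only a placeholder for the client's.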

Steve Ballantyne

Wow, okay. Thanks for driving me to those options. I do not have anything configured there yet.

Jeremy, Embassador

Since many clients won't connect to 5 GHz on their own, band steering tries to push 5 GHz-capable clients onto the 5 GHz radio, freeing up resources for those that only have a 2.4 GHz radio. It works well in areas with good 5 GHz coverage. If 5 GHz coverage is sparse, I would hold off.

Laura

How do I view the ap.log file? I tried to SSH into the AP with admin/new2day, but couldn't get it to work.

Ronald Dvorak, Embassador

SSH into the AP:

#change directory
cd /tmp/log

# to show the complete content of the file - increase the window size as you'll get a
# lot of messages, or log it to a file with your ssh app 
cat ap.log

or

#to get the last messages "live" as they get written into the file by the AP
tail -f ap.log


The SSH timeout is very short by default, so it's a good idea to increase it so you don't get logged out all the time:

cset sshtimeout 99999
capply
csave

Steve Ballantyne

Laura, if the new2day password didn't work, it's because the AP is successfully pulling a config from the controller, and your controller is dictating the password.

You can set/change that password in the web GUI under AP tab > Global Settings (on the left) > Registration. Check the "SSH Access" password. Mine appears to be blank --- which is odd, because I know I have set a password and use it all the time.

Steve Ballantyne

Also - for those following this issue, I finally got around to opening a case. That case number is: 01278662. I will let you all know what we figure out.

Ronald Dvorak, Embassador

Notepad row #7174: what is the reason code? It's not in the screenshot.

Ronald Dvorak, Embassador

LOL - fail, I'm not used to my new 34" widescreen monitor, just had to scroll to the right to see the reason code

Joshua Puusep

Can we trade monitors ? :-P

Steve Ballantyne

Hey Ron, sadly "Station event collection on controller is disabled". I just turned it on and set it to "Major" events. Do you think that is good enough for troubleshooting or should I go toward "Informational". I don't want to bog the sucker down.

Ronald Dvorak, Embassador

The setting for event level has nothing to do with the station events.

If you go to...

> Logs > EWC: Events = the event level in the log config changes which message levels you see/store on the EWC. That is for system/AP events; if you set it to Major, only Major and Critical events are shown.

> Logs > EWC: Station Events = if station event collection is enabled, you'll see the client/MU events here.

I set my controller to info/info, BUT be aware that in a big deployment the log will overwrite older events very fast; you might only see the last day or hours.
I always enable station events on the controller. I haven't read about any negative impact on the system from doing that, but that log also gets overwritten very fast if you have many clients.

The other two log settings (send to NetSight / send as trap) should be used carefully, as they could result in a flood of messages to NetSight or a third-party NMS.

Joshua Puusep

According to https://supportforums.cisco.com/document/141136/80211-association-status-80211-deauth-reason-codes

deauth reason=2 is "Previous authentication no longer valid"

Could you tell us more about your authentication setup?  Are you utilizing Extreme access control?
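To decode reason codes like the one Joshua looked up while reading logs, a small lookup table helps. This is a minimal sketch: `deauth_reason` is a hypothetical helper, and it covers only a handful of common codes from the IEEE 802.11 reason code table. Code 2 is the one in Steve's log; 15 and 16 are the WPA key handshake timeouts, which a PSK network like Steve's can also produce.

```shell
#!/bin/sh
# Hypothetical helper: map common IEEE 802.11 deauthentication /
# disassociation reason codes to their standard descriptions.
# Not exhaustive; unknown codes fall through to the default branch.
deauth_reason() {
  case "$1" in
    1)  echo "Unspecified reason" ;;
    2)  echo "Previous authentication no longer valid" ;;
    3)  echo "Sending station is leaving (or has left) IBSS or ESS" ;;
    4)  echo "Disassociated due to inactivity" ;;
    5)  echo "Disassociated because AP is unable to handle all currently associated stations" ;;
    6)  echo "Class 2 frame received from nonauthenticated station" ;;
    7)  echo "Class 3 frame received from nonassociated station" ;;
    8)  echo "Disassociated because sending station is leaving (or has left) BSS" ;;
    15) echo "4-way handshake timeout" ;;
    16) echo "Group key handshake timeout" ;;
    *)  echo "Reason code $1 not in this table" ;;
  esac
}
```

For example, `deauth_reason 2` prints "Previous authentication no longer valid".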

Steve Ballantyne

Nope. Nothing that fancy - yet. So the only authentication I can think of would be the PSK-TKIP passphrase. And why would that no longer be valid?

Perhaps there is some sort of session expiration value that I need to address?

Steve Ballantyne

Oh boy. I might be eating my hat on this one as it may have nothing to do with Extreme and everything to do with Windows 10, a horrible update, and outdated drivers from the manufacturer.

It appears that these laptops (all of the ones that were reported to me) were taken out of the box and put into production. Windows Update was on, and was left on, so it has been taking the updates and installing them on its own (including that dreaded Anniversary Update). However, nobody ever updated the drivers.

The laptops use Intel Proset wireless, and the drivers were dated February of 2015. Ouch. Also the "allow this device to sleep to save power" appears to be enabled. 

So perhaps I have been fighting this from the wrong end!  I started doing a little more digging when I went into the event logs of one of the laptops and found these events ... the most telling of which say "WLAN AutoConfig detected limit connectivity, performing Reset/Recover.adapter".


Level Date and Time Source Event ID Task Category
Error 01/27/2017 10:06:04 AM Microsoft-Windows-WLAN-AutoConfig 4003 None "WLAN AutoConfig detected limit connectivity, performing Reset/Recover.adapter.

 Code: 8 0x0 0x0
"
Information 01/27/2017 10:06:03 AM NETwNb64 7036 None The \Device\NDMP71 service entered the Intel(R) Dual Band Wireless-N 7260 state.
Information 01/27/2017 10:06:01 AM NETwNb64 8000 None "The description for Event ID 8000 from source NETwNb64 cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

\Device\NDMP65
Intel(R) Dual Band Wireless-N 7260

The specified resource type cannot be found in the image file
"
Error 01/27/2017 10:05:59 AM Microsoft-Windows-WLAN-AutoConfig 4003 None "WLAN AutoConfig detected limit connectivity, performing Reset/Recover.adapter.

 Code: 2 0xDEADDEED 0xEEEC
"
Error 01/27/2017 10:05:59 AM Microsoft-Windows-WLAN-AutoConfig 4003 None "WLAN AutoConfig detected limit connectivity, performing Reset/Recover.adapter.

 Code: 1 0xC 0x4
"

Ronald Dvorak, Embassador

Could you share in which section of the event viewer I'd find such information - I'm too lazy to search for it :-)

Steve Ballantyne

Hello Ron, I found it rolling through the Application log (Event Viewer > Windows Logs > Application) filtering on types: Error, Critical, and Warning.

Jeremy, Embassador

I don't think this would be the cause of your problem.  Just looks like the driver is trying to recover from a lack of connectivity. 

Steve Ballantyne

Yeah, but I had another device doing this under excellent coverage, and I think it was a roaming issue. When I used time-lapse in NetSight, I could see that the laptop was being rolled (on a cart) past two other access points before disconnecting and then trying to reconnect. When I updated the drivers (another set from around 2015), it started roaming properly.

Before I updated the drivers on that laptop which I think was a Samsung, I took screenshots of ALL the wireless settings. I wanted to go back and compare them after updating the drivers to see if the manufacturer tweaked anything. But I have yet to get my hands back on it.

Drew C., Community Manager

Sounds like I can mark this one "Solved," right Steve?  :)

Steve Ballantyne

Not just yet! Let me see what happens after I update the client drivers later today.

Steve Ballantyne

I will be updating the wireless drivers today on the affected devices. But in the meantime, here is some more information.

I had a user who got dropped this morning at approximately 9:15AM EST. I think she may have been disconnected several minutes earlier, and didn't realize there was an issue until 9:15AM EST when the device was rejoining.

Here is a snapshot of the client events ....



And here are some log snippets.









What do you all think?

Steve Ballantyne

I am awaiting feedback from the users - but so far, I am hearing nothing. And no news is usually good news. I have been checking the event logs on these laptops and I am not seeing any more of those strange dropping messages. I checked with a few users yesterday and none of them were having any problems. So I am holding my case open for a few days - but with any luck, this was just a case of a very old and broken wireless driver being used on 20 laptops.

Steve Ballantyne

Another update on this issue: the users continue to get disconnected. My support guy at Extreme has been feeding me ideas on adjustments and some ways to get better logging. Still, I can't ignore the fact that there are a lot of laptop users with Intel Dual-Band wireless NICs who have this exact same issue ... and some of these reports date back to 2012.

To try some Intel community advice, this morning I disabled the a/n/ac radio usage on all cards, disabled the "allow this device to sleep" on the drivers power options tab, and also disabled all power saving of the PCI-e slot in the OS's Power Options. Seems kind of lousy to castrate my users to lower speeds. But honestly, they don't need the bandwidth. All they run is a single web-based app, and an in-house instant messenger.

So far - I haven't seen a drop all morning. I am cautiously optimistic as I await user feedback and hawk the event logs!

Jeremy, Embassador

I did last night. I see an improvement roaming between 2.4 and 5 GHz radios.

Steve Ballantyne

Hello Ron, I updated the day it was available for download. It did not help my situation. However - disabling all but b/g on these Intel NIC's has proven successful for the past 9 hours!

Steve Ballantyne

I probably spoke too soon. There was a single instance of a dropped user yesterday. This morning and this afternoon there have been several more. On with the troubleshooting!

Frank

We are a school that has been running 10.21.01.0065 since its release, with the same 3825i APs: 1300 clients with a mixture of iPads and HP laptops. We have disabled the 2.4 GHz band and only run 5 GHz with 40 MHz channels.
No issues or dropouts at all.
If a detailed view of our settings would help you, let me know.

Steve Ballantyne

Frank, can you tell me what NICs are installed in your HP laptops? I don't think there are any issues with Broadcom/Realtek NICs, only Intel. And all of ours are Intel, sadly.

Steve Ballantyne

Extreme Networks is off the hook for this issue.

Yesterday I took one of these laptops and installed Windows 10 from the product disc. I let it install Windows updates. Then I took it home. I installed Chrome. I started a continuous ping. And then I started watching a YouTube video. And BOOM, connection dropped. Same messages in the logs and everything. And this is in my *home*, on my residential grade TP-Link access point!
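A continuous ping like the one Steve describes is a simple way to timestamp a drop. A minimal sketch, assuming Linux or macOS `ping` output (reply lines containing `icmp_seq=N`; Windows `ping -t` prints no sequence numbers, so this won't parse it): the hypothetical `find_ping_gaps` function reads saved ping output and reports each run of missing sequence numbers, i.e. lost replies.

```shell
#!/bin/sh
# Hypothetical helper: read Linux/macOS ping output on stdin and report
# gaps in icmp_seq, which correspond to replies that never arrived.
# Does not handle the 16-bit sequence counter wrapping after 65535.
find_ping_gaps() {
  awk '
    match($0, /icmp_seq=[0-9]+/) {
      # "icmp_seq=" is 9 characters; the number follows it.
      n = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (seen && n > last + 1) {
        printf "missing icmp_seq %d-%d (%d replies lost)\n", last + 1, n - 1, n - last - 1
      }
      last = n
      seen = 1
    }
  '
}
```

For example, run `ping 192.168.1.1 > ping.log` during the test (the address is only an example) and afterwards `find_ping_gaps < ping.log`; analyzing the saved file avoids any pipe-buffering surprises while the test is running.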

I have emailed my support guy at EN and let him know that this is officially "not your problem" but invited him to keep the ticket open if he is curious as to what we figure out. But I am not going to waste any more Extreme Networks resources on this one.

Later last night I installed all of the HP drivers (and none of the newer Intel drivers). Then I opened up YouTube and started up music videos and went to bed. Six hours later, it hadn't dropped at all.

So thinking back, I don't know that I ever tried installing the driver from the HP site. What was installed on the laptops when I started was a very old driver that was pre-installed. What I installed later was the very latest from Intel. What I am running with now is a "new-ish" driver, numbered 18.33.3.2 and dated 5/3/2016.

More to come once this has run a few days ...