10.2r4 firmware causing AP650s to lose connectivity?


Userlevel 2

Okay, I don’t know what’s happening here, but this isn’t looking pretty. All of our AP650s are losing connectivity multiple times throughout the day.

 

Look at this! Some APs will only do this once or twice and others are doing it often, like this 🔼. There’s nothing indicating any type of error or issue. We didn’t have this problem on firmware 10.0r9b.

I have a case open with GTAC, but are we the only ones seeing this issue?



Userlevel 2

All 145 APs are showing data like this. I’m waiting to see what GTAC says today, but I may be rolling back to 10.0r9b tonight.

 

 

Userlevel 2

@javabomberman are you using 10.2r4? I know you were using 10.2r3 previously; have you had this issue?

Userlevel 2

I’m seeing the same thing on all our devices; I went to 10.2r4 on our entire fleet after having it at 2 locations without any reported issues.

 

This is a 630, but I am seeing this on any randomly selected 650s and 630s alike. I haven’t heard any issues with anyone’s connectivity the way I did with 10.0r10 though… Is this an actual issue with connectivity or is there something wrong with reporting?

@kevin.piazza can you let me know what GTAC says?

Userlevel 2

You most certainly are having the same issue. :frowning2: In the moments when this happens, reporting goes down and all of the clients are disconnected. Sometimes they are able to reconnect immediately and sometimes it can take a couple of minutes. For us, this disconnects teachers and students from their video conferencing sessions. I’m rolling back all of our APs to 10.0r9b right now.

 

@javabomberman Yes, I’ll most certainly share with you what the GTAC engineer says. However, even with high priority support, this is taking way too long. I sent over tech data from two APs at 9 AM and never heard a thing all day.

Userlevel 2

I think I’m experiencing the same issue with my test AP410C on 10.2r4.
It dropped out about 3 times in 2 hours without moving the MacBook Pro or reaching the macOS roam threshold of -75 dBm...

Userlevel 2

I think I’m experiencing the same issue with my test AP410C on 10.2r4.
It dropped out about 3 times in 2 hours without moving the MacBook Pro or reaching the macOS roam threshold of -75 dBm...

Okay, I’m glad I’m not alone. We have Apple, Windows, and Chromebook devices; they were all dropping every couple of hours. I reverted all of the APs back to 10.0r9b last night. I’ll report back with any new info.

Userlevel 2

@kevin.piazza I have a few dozen schools with separate ECIQ instances and multiple admins working on the wifi platform. I have no way of knowing who has updated the firmware. By the time the customers complain, the reputation damage is done. That means I’m now checking all instances :confounded: I’m really looking forward to getting an API where I can automate the deployment of firmware and central monitoring. There are a few solutions out there which offer better help in situations like these. The amount of unpaid break-fix hours has been way too high with Extreme in the past 12 months. Let’s hope it gets better soon ;)

Userlevel 2

@a.huerzeler Oh wow, separate instances... that must be SOOOO much fun! 😬 I 100% agree, the issues with CloudIQ and the firmware updates have taken their toll. There are so many features that don’t even work or that we’ve been advised to turn off because they cause more issues. The AP650s are powerful units, and I’ve been instructed by GTAC to disable 75-80% of the advanced features over the past 6 months. I honestly just want our services and products to work as advertised. Truth be told, I wish Extreme would start a buyback program 🤣

Userlevel 2

Do any EN techs look at this forum? :face_palm:

Userlevel 2

Do any EN techs look at this forum? :face_palm:

@javabomberman 🤣 

I seriously think it’s only Sam Pirok helping everyone on the forums. I never see any other engineers/support staff replying or helping her.

I’m calling in Gandalf @Sam Pirok

Userlevel 7

Hey guys, we do have a bunch of our engineers on here helping out, I’m just the most obsessed =) I also have to go bug other engineers for a lot of my answers, so there’s a lot of team contribution behind the scenes.

I have been following this thread, and I’m looking into what we can do for you all. I didn’t want to jump in before I had something useful to contribute here, but I’m definitely working on getting some help for you all!

I really appreciate you guys bringing attention to these kinds of issues. Please don’t hesitate to keep letting us know about these pain points.

 

Userlevel 2

Thank you @Sam Pirok :blush:

You’re the best, which is why you’re our only beacon of hope. There are too many issues, Sam: CloudIQ monitoring going nuts/dropping reporting, GTAC not responding quickly, firmware updates that keep causing too many issues. We just want stable, reliable WiFi, and I know that you know this. When K-12 educators lose their connection multiple times a day because of buggy AP firmware, along with CloudIQ having its own issues, it’s not acceptable.

Like what @a.huerzeler said, “The amount of unpaid break-fix hours has been way too high with Extreme in the past 12 months.”

We’re all worn out by this cycle of things not working as advertised. Okay, lunchtime. ✌

Userlevel 2

@Sam Pirok 
@kevin.piazza 

@a.huerzeler 

 

I rolled back one of our 630s from 10.2r4 to 10.0r9b and it seems the reporting shows the same type of connectivity as before. (8 hour time range; it’s been on 10.0r9b since last reconnect ~12 hours previously)

 

Is there some way I can actively monitor the connected clients so I can see exactly what’s happening at each of those points when connectivity/memory/CPU all show 0? Is there an active log monitor via the CLI that I could keep running?

Userlevel 7

I’d recommend auth debugs for CLI monitoring. That should show any disconnection messages to help narrow down what exactly is causing the loss of signal. This guide reviews how to enable auth debugs: https://extremeportal.force.com/ExtrArticleDetail?an=000065975&q=Auth%20debug

If you can record a couple MAC addresses of clients having issues during the down times, that will help us sort through the auth debug logs. 
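Sorting through a saved debug capture by client MAC can be done with a small script. Here is a minimal Python sketch, assuming you have already saved the debug output to a plain text file yourself. The function names are made up for illustration, and the script does not assume any particular log layout: it only looks for MAC-shaped strings on each line, whatever separators they use.

```python
import re

# Matches 12 hex digits with optional ':', '-' or '.' separators,
# e.g. aa:bb:cc:dd:ee:ff, aabb:ccdd:eeff, aa-bb-cc-dd-ee-ff.
MAC_PATTERN = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:\-.]?){5}[0-9A-Fa-f]{2}\b")


def normalize_mac(mac):
    """Strip separators and lowercase so different MAC notations compare equal."""
    return re.sub(r"[^0-9A-Fa-f]", "", mac).lower()


def lines_for_clients(log_lines, client_macs):
    """Return the log lines that mention any of the given client MACs.

    Hypothetical helper for illustration; it is not part of any Extreme tool.
    """
    wanted = {normalize_mac(m) for m in client_macs}
    matches = []
    for line in log_lines:
        found = {normalize_mac(m) for m in MAC_PATTERN.findall(line)}
        if found & wanted:
            matches.append(line)
    return matches
```

To use it, read your saved capture with `open(path).read().splitlines()` and pass that list in along with the MAC addresses you recorded during the down times.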

You can also try setting up a client monitor in the XIQ GUI to see if that gives us any insights. This guide reviews how to set up a client monitor (apologies for the outdated screenshots, I’ll update that guide soon): https://extremeportal.force.com/ExtrArticleDetail?an=000056843&q=Client%20monitor

Userlevel 2

@Sam Pirok I’m trying to get auth debugs from two APs for you, along with some client monitoring info. It hasn’t been a pretty morning with WiFi 😫

Userlevel 7

Thanks Kevin, sorry to hear it’s been a rough morning :disappointed: It would be good to attach all that to your case so our engineers can all see it, but please also let me know when that’s available and I’ll take a look too to see if anything jumps out at me. Good luck, and please let me know if I can help with anything. I’m pretty open today if you need help and want to jump on a call. 

Userlevel 2

@Sam Pirok Thank you very much for your assistance. I’ve uploaded auth debug tech data from two APs, along with a GUI diag for one of the devices. I hope we can have a better idea of what’s going on ASAP. 🍻 to you Sam!

Userlevel 7

@kevin.piazza Thank you for getting that data together for us! I took a look at the client monitor, but it’s all normal connection messages and some generic disconnection messages. I’m going through the tech data now and talking to the tech on your case, who is reviewing the same data. So far we haven’t found anything helpful, but we are still looking.

@javabomberman, I heard from the engineering team looking into this for us and they’d like to compare data from before you rolled back the firmware. Do you remember approximately what time you rolled the firmware back to 10.0r9 on your APs?

Userlevel 2

Unfortunately, the rollback to 10.0r9b didn’t correct the lag and random disconnections. After I rolled out 10.2r4 on 1/10/2021, things were quiet overall until maybe the 15th. Then more issues crept up with random disconnects last week. This week it’s worse, and reintroducing 10.0r9b just added an extra layer of divine icing on top. I’m just trying to provide a timeline based on incidents and teacher feedback.

It’s so bad we’re having to hardwire teachers’ laptops :face_palm:

Userlevel 2

@javabomberman, I heard from the engineering team looking into this for us and they’d like to compare data from before you rolled back the firmware. Do you remember approximately what time you rolled the firmware back to 10.0r9 on your APs?

@Sam Pirok 

I only rolled back one device so far: GW-MDF-2

This was at about 10am CST yesterday, 1/27.

Userlevel 2

@Sam Pirok This is from AP NCPS-AP650-RM305, one of the APs whose tech data I sent. Clients were being disconnected and having performance issues during the times of the dips.

 

Userlevel 7

Thanks very much for the extra details guys! I hear the engineering team has found some “interesting things”, still waiting on details on that but progress is being made!

Userlevel 3

Of note, these graphs change drastically depending on which “Time Range” is selected. I’m seeing this across all of my models and firmware versions.
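One possible reason for that, offered only as a guess since I don't know how the graphs are actually built: if a wider time range groups samples into larger buckets and averages them, a short dip gets diluted. A small Python sketch with made-up data (one sample per minute, 1 = reachable, 0 = unreachable; `bucket_average` is a hypothetical helper):

```python
def bucket_average(samples, bucket_size):
    """Average consecutive samples in groups of bucket_size (last group may be shorter)."""
    averaged = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        averaged.append(sum(bucket) / len(bucket))
    return averaged


# Hypothetical hour of per-minute samples with a single one-minute dip.
samples = [1] * 60
samples[30] = 0

fine = bucket_average(samples, 1)     # per-minute view: the dip reaches 0
coarse = bucket_average(samples, 30)  # 30-minute buckets: the dip is averaged away

print(min(fine), min(coarse))
```

With the per-minute view the lowest point is 0, while with 30-minute buckets the lowest point is about 0.97, so the same data can look like an outage in one view and a barely visible dent in another.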

I’ve not been able to correlate issues with these dips, but we are getting many reports of disconnects across multiple sites, models, and firmware versions, all with the same symptom: connected, “no internet”. Most of these started toward the beginning of January.

I’ve been delaying working with support until I had more info on my end, as I wanted to ensure we weren’t having some other internal issue. Also, it’s extremely difficult to catch such an intermittent issue in the act.

 

Looking just now, I’ve confirmed this type of graph behavior on the following models and firmware versions (I pulled up all my stragglers that aren’t on the latest FW to give a broader picture):

AP550 - 8.0.1.0, 10.0.8.1, 10.0.9.2

AP250 - 10.0.9.2

AP230 - 6.5.11.0, 10.0.8.1, 10.0.9.2

AP330 - 6.5.12.0

AP121 - 6.5.12.0

Userlevel 2

Good morning, @Sam Pirok I hope you’re doing well this Friday. How are things looking?

Userlevel 2

@Sam Pirok How about on this Monday instead? :eyes:
