Troubleshooting a failed / disconnecting AP

  • Problem
  • Updated 2 years ago
  • Solved
I have a site with an AP that has disconnected several times now, seemingly at random (it has happened after hours, with zero clients connected). Since it's offline, I am not able to pull a trace log using the EWC console. I have had the folks at the site go and reboot the access point to get it back online. They report that when it's in this state, there is a single amber light (much like when power is first applied).

Is there anything I can do to troubleshoot an AP that failed and was rebooted? Is there any sort of memory dump that survives a reboot? I noticed that there is a /tmp/log/ap.logLastReboot.gz file, but when I gzip -d the file and cat the resulting ap.logLastReboot file, all it says is "---WARNING: ap-log is not valid (size 2298819153, tail 2261266449)".
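If the file can be copied off the AP (the thread does not say how), the check can be scripted on another machine. This is a minimal sketch, assuming Python is available wherever the copy ends up: it decompresses the bytes the way gzip -d does and pulls the two numbers out of the warning quoted above. The function names are hypothetical, and the meaning of "size" and "tail" is not documented here, so the sketch only extracts them without interpreting them.

```python
import gzip
import re
from typing import Optional, Tuple

# Matches the warning seen in ap.logLastReboot. What "size" and "tail" mean
# is not documented in this thread, so the values are extracted, not interpreted.
WARNING_RE = re.compile(r"ap-log is not valid \(size (\d+), tail (\d+)\)")


def decompress_log(raw: bytes) -> str:
    """Equivalent of `gzip -d` followed by `cat`: gunzip the bytes and decode as text."""
    return gzip.decompress(raw).decode("utf-8", errors="replace")


def parse_invalid_warning(text: str) -> Optional[Tuple[int, int]]:
    """Return (size, tail) if the log holds the 'not valid' warning, else None."""
    match = WARNING_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

To use it on a copied file, open the .gz in binary mode, pass fh.read() to decompress_log, and hand the result to parse_invalid_warning.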

This may be indicative of a completely different problem or bug.

For the time being I have ssh'd into the AP and run:
cset sshtimeout 0
capply
csave
tail -f /tmp/log/ap.log
Now I am just waiting for it to fail again, hoping that it dumps something to the SSH session before it bombs out. That said, I suspect this is all down to a flaky cable-modem connection that occasionally drops packets.
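To test the cable-modem theory, it helps to know when the AP actually became unreachable and whether it came back on its own. A short drop that recovers by itself points at the link; an outage that lasts until someone reboots the AP points at the AP. The sketch below assumes you already collect time-ordered (timestamp, reachable) samples, for example by pinging the AP every few seconds from a host on the site LAN (probing across the WAN would make a modem drop look like an AP outage). Outage and find_outages are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: float   # timestamp of the first failed probe
    end: float     # first successful probe afterwards, or the last sample if none
    ongoing: bool  # True if the AP had not come back by the last sample

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[float, bool]]) -> List[Outage]:
    """Group consecutive failed probes into outage intervals.

    `samples` must be time-ordered (timestamp_seconds, reachable) pairs.
    """
    outages: List[Outage] = []
    start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, reachable in samples:
        if not reachable and start is None:
            start = ts  # outage begins at the first failed probe
        elif reachable and start is not None:
            outages.append(Outage(start, ts, ongoing=False))
            start = None  # AP answered again
        last_ts = ts
    if start is not None and last_ts is not None:
        # Still down at the end of the data.
        outages.append(Outage(start, last_ts, ongoing=True))
    return outages
```

Many short outages that heal on their own would support the cable-modem theory; long outages that end only after someone reboots the AP would point at the AP itself.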

Anyone have other ideas?  :-)


Relevant details:
My AP is a 3825i, and I am running version 10.11.02.0032.

Steve Ballantyne

Posted 2 years ago

Ronald Dvorak, Embassador

Looks like you've done all the right things...

I've never once tried to transfer a file off the AP via the CLI :-), so I have no idea about that part.

Collecting the ap.log is the right approach in my opinion, but you haven't mentioned whether you can still reach the AP via IP in that state (if not, the SSH session will disconnect).
In case you lose the link in that state, make sure the onsite folks run cat ap.log via the console before they restart the AP, so you have more data.

Is the controller address set statically on the AP, or learned via a DHCP option?

Craig Guilmette, Employee

Hello Steve

99% of the time, when the ap.logLastReboot file is empty or corrupt, the cause of the reboot is loss of power. I would try a new PoE injector or, if you are using a PoE switch, make sure the switch is not having issues. If the AP had rebooted due to a software bug or a reboot from the controller, that log would be valid. Hope this helps!
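Craig's rule of thumb can be written down as a small triage check. This only restates the heuristic in the reply above and is not documented vendor behavior: an empty last-reboot log, or one containing the "ap-log is not valid" warning, is flagged as a likely power loss. The function name and the returned labels are hypothetical.

```python
# Substring taken from the warning quoted earlier in the thread.
INVALID_MARKER = "ap-log is not valid"


def triage_last_reboot(text: str) -> str:
    """Classify the contents of a decompressed ap.logLastReboot file.

    Heuristic only: an empty or invalid log most often means the AP lost
    power; a valid log may show a software bug or a controller-initiated reboot.
    """
    body = text.strip()
    if not body:
        return "likely power loss (log empty)"
    if INVALID_MARKER in body:
        return "likely power loss (log invalid)"
    return "log present: inspect it for a software or controller reboot"
```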
 
Steve Ballantyne

Hrm ... I hadn't even thought about the PoE. I am powering the AP off one of the two PoE ports on a Cisco ASA 5505. The other PoE port is powering a phone. I don't know of any good power-utilization reporting on that device, but perhaps the AP is simply not getting enough power.
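One way to sanity-check the PoE theory is to compare what the AP needs against what each port, and the device as a whole, can supply. The per-port figures below are the IEEE 802.3af and 802.3at limits. The AP draw, the phone draw, and the ASA 5505's total PoE budget are not given in this thread, so the numbers used with this sketch are placeholders; take the real values from the 3825i and ASA 5505 datasheets before drawing any conclusion. The function names are hypothetical.

```python
from typing import Iterable

# Per-port limits from IEEE 802.3af (Type 1) and 802.3at (Type 2), in watts:
# "pse" is what the port sources, "pd" is what the device is guaranteed to receive.
POE_LIMITS_W = {
    "802.3af": {"pse": 15.4, "pd": 12.95},
    "802.3at": {"pse": 30.0, "pd": 25.5},
}


def pd_headroom(pd_draw_w: float, standard: str) -> float:
    """Watts to spare at the powered device; a negative result means the port falls short."""
    return POE_LIMITS_W[standard]["pd"] - pd_draw_w


def shared_budget_ok(port_draws_w: Iterable[float], total_budget_w: float) -> bool:
    """True if the combined draw of all PoE ports fits within the device's total budget."""
    return sum(port_draws_w) <= total_budget_w
```

For example, with a hypothetical 12 W AP and a hypothetical 6 W phone, pd_headroom(12.0, "802.3af") shows how close the AP runs to the 802.3af limit, and shared_budget_ok([12.0, 6.0], budget) checks both ports against whatever total the ASA's datasheet gives.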

I will see about getting an inline PoE injector in place to see if it improves connectivity.

Thanks for the feedback!

Umut Aydin, Escalation Support Engineer

Hi Steve,

Now that the AP is back, you could pull the AP trace logs and open a case with GTAC.
Maybe we can see more about the AP reboots there.

If the AP had time to write the trace dump during the freeze, we may get some idea of what happened; if not, we will need to wait until the next time.

Regards
Umut

Steve Ballantyne

Hello Umut,

I have the AP back online. Where would I find the trace logs?

For the time being, when it drops offline, I remotely connect to the switch and shut down and re-enable the port so that the AP reboots.

Ronald Dvorak, Embassador

Hey Steve,

It's under Logs > AP: Traces > select the AP > retrieve traces.