
How Do YOU Troubleshoot?


cpdecker
New Contributor

Hey all,

 

I was inclined to post this in Support, but this seems to be a much more active forum, so I'm posting it here.

 

In my quest to become a better troubleshooter, I want to know how everyone else troubleshoots. Ideally I would like to get better at remote troubleshooting and resolution of potential issues. What do you do when a user calls you up or submits a ticket to complain about a wireless issue? You know what kind of complaints we usually get: "Our wifi is slow!" What do you look for now? What questions do you ask now?

 

Here's what my current bag of tricks looks like, in NO PARTICULAR ORDER. For context, I work in a K-12 environment with an access point in each classroom, more or less. Just over 800 access points total across 20 sites.

 

1) Use the show acsp neighbor CLI command to look for other, potentially interfering SSIDs. Then, in the MAPS section of HiveManager, use right click > ACSP Neighbor Information plus the colored channel map option (check the Channels box) to verify that there is no significant channel overlap on each band (2.4 GHz and 5 GHz). Between two radios on the same channel, I usually want them to hear each other at -88 dBm or weaker (a more negative number).
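
A rough sketch of the co-channel check in step 1, in case it helps to script it. It assumes the neighbor list has already been copied into simple records by hand or by some other tool; the `Neighbor` fields and the sample AP names are made up for illustration and are not the actual output format of show acsp neighbor.

```python
from dataclasses import dataclass

# Threshold from step 1: a neighbor on the same channel should be heard
# at -88 dBm or weaker. Anything stronger is flagged for review.
CO_CHANNEL_LIMIT_DBM = -88


@dataclass
class Neighbor:
    """One neighbor entry. Field names are hypothetical, not HiveOS output."""
    name: str
    channel: int
    rssi_dbm: int


def loud_co_channel_neighbors(own_channel, neighbors, limit_dbm=CO_CHANNEL_LIMIT_DBM):
    """Return neighbors on our channel that are heard more strongly than the limit."""
    return [
        n for n in neighbors
        if n.channel == own_channel and n.rssi_dbm > limit_dbm
    ]


sample = [
    Neighbor("Room-101", 36, -91),  # same channel, weak enough
    Neighbor("Room-102", 36, -74),  # same channel, too loud
    Neighbor("Room-103", 44, -60),  # loud, but on a different channel
]
flagged = loud_co_channel_neighbors(36, sample)
print([n.name for n in flagged])
```

Only the second sample entry is on the same channel and louder than the limit, so it is the only one reported.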

 

2) Pull up the client list for the relevant AP (HiveManager > Monitor > Click on Client count) to verify that there isn't an excessive number of clients associated to the access point. Ours seem to start disconnecting clients, seemingly at random, at around 80+ clients, and I generally like to have <30 clients on an AP. I also make sure the AP is seeing a decent RSSI from the clients: I look for about -70 dBm or better (closer to zero), which almost all clients tend to have in our environment. Conversely, if I can, I try to see that the clients have a good RSSI on their end too, coming FROM the AP. That is easy enough to do on a Mac by holding down Option and left-clicking the wifi icon, or by using my AirCheck device. I also check on this UI screen whether any of the clients have downloaded a large amount of data in the past two hours. Sometimes I will see several gigabytes per client, which indicates that they have kicked off a large app install or something else that is sucking up airtime and bandwidth.
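
The rules of thumb in step 2 could be applied to an exported client list with something like the sketch below. The dictionary keys (`mac`, `rssi_dbm`, `bytes_2h`) are hypothetical and not a HiveManager export format, and the 1 GB cutoff for a "heavy" client is an assumed value standing in for "several gigabytes".

```python
MAX_COMFORTABLE_CLIENTS = 30     # preferred ceiling from step 2
MIN_CLIENT_RSSI_DBM = -70        # "about -70 dBm or better"
HEAVY_USAGE_BYTES = 1 * 1000**3  # assumed cutoff: 1 GB in the two-hour window


def review_ap_clients(clients):
    """Summarize one AP's client list against the step 2 rules of thumb.

    `clients` is a list of dicts with the hypothetical keys
    'mac', 'rssi_dbm', and 'bytes_2h'.
    """
    return {
        "over_capacity": len(clients) > MAX_COMFORTABLE_CLIENTS,
        "weak_signal": [c["mac"] for c in clients
                        if c["rssi_dbm"] < MIN_CLIENT_RSSI_DBM],
        "heavy_users": [c["mac"] for c in clients
                        if c["bytes_2h"] >= HEAVY_USAGE_BYTES],
    }


sample_clients = [
    {"mac": "aa:aa:aa:00:00:01", "rssi_dbm": -58, "bytes_2h": 120_000_000},
    {"mac": "aa:aa:aa:00:00:02", "rssi_dbm": -79, "bytes_2h": 40_000_000},
    {"mac": "aa:aa:aa:00:00:03", "rssi_dbm": -63, "bytes_2h": 3_400_000_000},
]
report = review_ap_clients(sample_clients)
print(report)
```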

 

3) Go through the list of clients for the relevant AP--same UI location as above--by clicking on the MAC address of each client in the list and checking "Latest Association History" to see if they are reassociating to the same AP or different APs over and over again in a short span of time.
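
One possible way to put a number on "over and over again in a short span of time" from step 3 is a sliding-window count over a client's association timestamps. This is only a sketch: the 10-minute window and the threshold of 5 events are assumptions, and the timestamps would have to be copied out of the association history by hand or by some other tool.

```python
from datetime import datetime, timedelta


def max_associations_in_window(timestamps, window=timedelta(minutes=10)):
    """Return the largest number of association events inside any one window."""
    events = sorted(timestamps)
    best = 0
    start = 0
    for end, current in enumerate(events):
        # Slide the left edge forward until the span fits inside the window.
        while current - events[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def is_flapping(timestamps, window=timedelta(minutes=10), threshold=5):
    """True if the client associated at least `threshold` times within `window`."""
    return max_associations_in_window(timestamps, window) >= threshold


base = datetime(2019, 10, 31, 9, 0)
bouncing = [base + timedelta(minutes=m) for m in range(5)]       # 5 events in 4 minutes
steady = [base + timedelta(minutes=30 * m) for m in range(5)]    # one event every 30 minutes
print(is_flapping(bouncing), is_flapping(steady))
```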

 

4) Push a complete configuration update and/or reboot the access point, which forces channel selection to run again as well. 

 

5) Ask the user what they are trying to do. Are they trying to get to a particular website? Do other websites load okay? Are they trying to download GarageBand (a 1.7 GB app) on 25 iPads at once? Are all the clients streaming YouTube videos in 1080p 60fps at the same time? Are there devices downloading huge OS updates? Does the problem happen only on a single device, or on some devices and not others? What happens if they connect the problem device to a different access point? This list could go on and on...
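
To put the GarageBand example in step 5 into numbers, here is a back-of-the-envelope calculation. It assumes decimal gigabytes (1 GB = 10^9 bytes) and a sustained 100 Mbit/s of usable throughput shared by the whole classroom; that throughput figure is an assumption for illustration, not a measurement from this network.

```python
def bulk_download_seconds(size_gb, device_count, throughput_mbps):
    """Seconds needed for every device to pull the same download over a shared link.

    Assumes decimal units (1 GB = 1e9 bytes, 1 Mbit/s = 1e6 bit/s) and
    ignores protocol overhead, retries, and other traffic.
    """
    total_bits = size_gb * 1e9 * 8 * device_count
    return total_bits / (throughput_mbps * 1e6)


seconds = bulk_download_seconds(size_gb=1.7, device_count=25, throughput_mbps=100)
print(f"{seconds / 60:.0f} minutes")
```

With those numbers, 25 copies of a 1.7 GB app add up to 42.5 GB, which works out to about 3,400 seconds (roughly 57 minutes) of a fully saturated 100 Mbit/s link, which is plenty of time for everyone else in the room to notice.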

 

6) Use the Remote Sniffer feature (HiveManager > Monitor > Select an AP checkbox > Utilities > Diagnostics > Remote Sniffer) to get a packet capture of the live traffic to monitor for TCP issues / dropped packets / poor round-trip-time. I'm not sure this is an option that is going to work out for me at the moment since I seem to have pretty bad packet loss between the access point and my capture device, which I believe has nothing to do with the wired network conditions between the two. With the packet loss I'm not able to really diagnose any potential problems that might actually exist.

 

7) Use an RF spectrum analyzer to look for interference near the AP on the relevant channel. I'm pretty much out of my element at this phase, as I don't know how to do this properly. I have an AirCheck device that is supposed to show 802.11 vs non-802.11 airtime utilization. I'm not entirely sure I trust that particular information. I know the access points can do spectrum analysis, but it screws up their performance, so I never really use that feature.

 

8) Check the connecting wired interface and backend network/WAN for errors, throughput, correct interface speeds. Check the patch cable and cabling run from the AP back to the switch. Run iPerf tests from a client connected to the AP and from a client directly wired in back to a server at our Central Office, which is the last hop for traffic before it heads to our ISP. Run a speedtest on fast.com. Verify interface configurations on the switch.
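
The wired-versus-wireless iPerf comparison described above could be scripted roughly as follows. This sketch assumes iperf3 was run with its JSON output option and that the TCP result exposes `end.sum_received.bits_per_second`; verify that against the output of your own iperf3 version. The 0.5 wireless-to-wired ratio and the expected wired rate are assumed thresholds, and `likely_bottleneck` is a hypothetical helper.

```python
import json


def received_mbps(iperf3_json_text):
    """Extract the receiver-side throughput in Mbit/s from iperf3 JSON output."""
    report = json.loads(iperf3_json_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def likely_bottleneck(wired_mbps, wireless_mbps, expected_wired_mbps, wireless_ratio=0.5):
    """Very rough triage of where to look next.

    If the wired test already falls short, look at the wired side first;
    if only the wireless test is far below the wired one, look at the RF side.
    """
    if wired_mbps < expected_wired_mbps:
        return "wired/backend"
    if wireless_mbps < wired_mbps * wireless_ratio:
        return "wireless"
    return "neither"


# Hand-written stand-ins for two iperf3 JSON reports.
wired_report = '{"end": {"sum_received": {"bits_per_second": 940000000.0}}}'
wifi_report = '{"end": {"sum_received": {"bits_per_second": 120000000.0}}}'

verdict = likely_bottleneck(
    wired_mbps=received_mbps(wired_report),
    wireless_mbps=received_mbps(wifi_report),
    expected_wired_mbps=900,
)
print(verdict)
```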

 

9) Make sure the access point has the correct network policy and radio profiles, and the correct power settings.

 

10) Replace the access point--if all else fails, or as a quick sanity check in the early stages of troubleshooting.

 

What do YOU do? What CLI commands or UI pages or reports do you use to squeeze the most useful information out of the environment?

 

Thanks in advance for any and all responses and comments.

1 REPLY

bruce_stahlin
Contributor III

Nice, thorough post. Not to detract from the tech side, but the very first thing I do when troubleshooting is isolate the actual issue. That is to say, if a user calls to complain about "wifi issues," the complaint is usually tied to their overall experience, not necessarily an actual layer two or wifi issue. For example, I've had tier one techs call to complain about users disconnecting from wifi, and my first question is "Are they actually getting kicked off wifi and losing bars, or is their main complaint Internet latency?" When we drill down to find out exactly what the user is experiencing, we may find they got kicked out of a remote application as a result of 50-100 GB of iOS updates consuming Internet bandwidth, as you mention in point five.

 

With that out of the way, I work through the OSI model:

  1. Physical; Are the related APs up, connected to the upstream switch, etc.
  2. Datalink (wifi/ethernet); *Client Monitor* and VLAN Probe are essential tools. Is the issue affecting single or multiple locations, etc.
  3. Network (IP); CAPWAP, is the AP pingable, are the client(s) pingable, etc.
  4. Transport (TCP/UDP);
  5. Session; For our purposes, can be rolled with Presentation and Application
  6. Presentation; See above
  7. Application; If all other traffic is traversing smoothly, I start pointing fingers
  8. Political; How severe, e.g. Is it the Boss complaining, an entire site down, or someone can't access FB on company time?
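
The bottom-up walk through the layers above can be pictured as an ordered list of checks that stops at the first failure. This is only an illustrative sketch: the check functions are stand-in lambdas with hard-coded results, and real checks (switch port status, ping, and so on) would have to be filled in for your own environment.

```python
def first_failing_layer(checks):
    """Run (layer_name, check_fn) pairs in order and return the first layer that fails.

    Returns None when every check passes.
    """
    for layer, check in checks:
        if not check():
            return layer
    return None


# Stand-in checks, ordered from the bottom of the stack upward.
example_checks = [
    ("Physical", lambda: True),     # e.g. AP is up and linked to the switch
    ("Datalink", lambda: True),     # e.g. client associates, VLAN is reachable
    ("Network", lambda: False),     # e.g. client does not answer ping
    ("Transport", lambda: True),    # never reached in this example
]
print(first_failing_layer(example_checks))
```

Stopping at the lowest failing layer keeps attention on the first thing that is actually broken instead of on symptoms higher up the stack.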

 

Most of our troubleshooting is between layers two and three, as you have listed in your steps. A couple of things I would add:

  • Proper planning; Use maps to ensure proper coverage and capacity
  • Assessment of the client base; e.g. if you have a roaming client base, do not use load balancing on your APs

 

I like what you state in point one. Tuning 2.4 GHz radio power is essential to reduce co-channel interference. And last, the CWNX books are full of tips and tricks for troubleshooting:

https://www.amazon.com/Certified-Wireless-Network-Administrator-Study/dp/1119425786/ref=sr_1_1?keywords=cwna&qid=1572562180&s=books&sr=1-1

 

Best,

BJ

 
