08-12-2019 08:10 PM
After moving from Classic to NG in our on-prem environment (about a month ago), we started seeing issues: unable to push out configs, errors in data reporting (number of clients connected, etc.). A call to support brought up an obscure and rarely mentioned hardware requirement for NG on-prem: no SAN support; storage must be a directly connected drive, preferably SSD.
My colleagues are skeptical; in our experience this is a highly unusual requirement for a VM environment. How did any of you address this specifically?
Thanks,
02-06-2020 08:04 AM
On 19.5.1.7-NGVA on-prem (VM on SAN SSD) with 300 APs, we hit this problem for the first time in December.
The Elasticsearch service kept stopping; after a reboot it would stay up for two hours at most.
We got the same response from support: SAN is not supported (we were never given this information when we installed HiveManager NG).
After a long negotiation, the support tech agreed to connect to the console and run this command: curl -X DELETE "localhost:9200/hm-*?pretty".
He said the problem would come back, and he was right; it returned this morning, after 50 days.
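For anyone else who hits this before support does, it may be worth seeing how big those indexes have grown before jumping straight to the delete. This is only a sketch against the standard Elasticsearch REST API on the same localhost:9200 endpoint the tech used; whether these calls are reachable on your appliance, and the exact index names, are assumptions to verify:

    # Cluster health (a red status usually means Elasticsearch is in trouble)
    curl -X GET "localhost:9200/_cluster/health?pretty"

    # List indexes with their on-disk sizes, largest first
    curl -X GET "localhost:9200/_cat/indices?v&s=store.size:desc"

    # The purge support ran, once you accept losing the hm-* data
    curl -X DELETE "localhost:9200/hm-*?pretty"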
12-11-2019 02:27 AM
Hi Abe.
Depending on how far through this whole thread you've read, and in case your issue returns: we had somewhat similar issues, and the fix ended up being a combination of about four things. They all got buried in my earlier paragraphs...
And who knows, maybe this topic will help the next person who finds themselves stuck on 19.5.1.7.
After all of this, we have what seems to be a stable, happy HMVA again. I am planning to purge the indexes quarterly, or sooner if I start to notice problems.
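In case it helps anyone, here is the rough cron entry I have in mind for that quarterly purge. Treat it as a sketch only: the hm-* index pattern, the availability of cron and curl on the appliance, and the log path are all assumptions to verify on your own VA first:

    # Purge the hm-* Elasticsearch indexes at 03:00 on the 1st of Jan/Apr/Jul/Oct
    0 3 1 1,4,7,10 * curl -s -X DELETE "http://localhost:9200/hm-*?pretty" >> /var/log/hm-index-purge.log 2>&1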
Recently, I turned 10-minute client stats back on (down from the 60-minute interval in Kevin's post) because I didn't like the massive gaps and spikes in my data. Things seem to be running fine. I left application stats and KDDR logs disabled. I hope to bring application stats back eventually, since it was neat to know what was going on and it makes the dashboard prettier. But "pretty" is not operationally important, so I'm not in a rush to test fate again.
Alan
12-09-2019 06:45 PM
I totally agree with that last paragraph! We're also on 19.5.1.7 (which I believe is the first release with Client/Network 360 integrated into on-prem) with 282 devices.
I just got off the phone with support (one hour on hold waiting for a pickup and another hour troubleshooting... wait times seem longer since the Extreme purchase). We had an issue months ago where a bunch of APs got a config containing an email address my browser had accidentally autofilled. This caused "The CLI 'ssid [email address] qos-classifier [email address] execute failed, cause by: Unknown error".
Even though that config was nowhere in HiveManager, a CLI reset of the AP followed by a complete config update would bring the misconfiguration right back. The only thing that fixes it is an HM GUI "reset to default", which clears everything so you have to reassign the policy, locations, etc., and then push a complete config update.
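If anyone wants to confirm the stray lines are really sitting on the AP (as opposed to being re-pushed each time), something like the following from the AP's console should show it. This is a sketch only: the show filtering syntax may vary by HiveOS version, and the reset shown is the CLI reset that did not clear it for us:

    show running-config | include qos-classifier   (look for the bogus SSID / qos-classifier lines)
    reset config                                    (CLI factory reset; in our case the bad config came right back afterwards)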
To me, this very much seems like a bug in HiveManager where it's holding onto data somewhere and not setting configs properly.
To ATAC, this is magically caused by HiveManager running on a SAN. What kind of logic is that? We run our VM infrastructure from a SAN with SSD caching, and that thing is not a slouch. We have database applications with just as much I/O that run perfectly well. It seems completely backwards to slap a blanket "must be your SAN" explanation on anything they can't figure out.
I've checked HiveManager's IOPS, and the majority of the hits are SSD cache hits. It's also not our heaviest VM in terms of storage activity.
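If anyone wants a second data point from inside the appliance rather than from the array, something like this from the VA console would show whether the disk is actually struggling. It assumes the sysstat/iostat tools are present on the appliance, which I have not verified:

    # Extended per-device disk stats every 5 seconds, 3 samples; watch the await and %util columns
    iostat -dx 5 3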
This really does seem like an attempt to eliminate on-prem and push everyone to the cloud, as you mentioned. We too love the Aerohive product and have been running it since 2013/2014, but this kind of thing does make me question what to do at our next refresh cycle. If they as a company want to go that route, fine, but tell customers that that is the reason rather than quietly withdrawing support for a product they've shipped. I understand it's hard to develop an on-prem product that matches the cloud offering, but I would rather hear that instead of this cop-out on real issues that might be happening.
11-28-2019 12:56 AM
Thanks Kevin. That's a helpful comparison.
We're all AP230s, which were on 8.4r11. I reverted to 8.2r6 to try to get rid of another portion of the problems we're having (better, but not fixed, so far). We have the default VA configuration running on Cisco UCS and Nimble storage. That had been fine up until the "not supported" change, and was possibly exacerbated by the problems introduced during our last VA upgrade.
We also have 1 PPSK SSID, 1 802.1X SSID with RADIUS/NPS, and an open guest network with speed and port limits. It looks like we're a bit heavier on clients, but we're also a single-campus college with fewer spaces and APs.
We'll see how these latest changes play out, what's next for the "Cloud IQ" VA (since HM is apparently gone), and when some of these HiveOS problems get fixed.
11-19-2019 07:34 PM
Alan, per your query...
We're running mostly AP230s at 8.2r6, plus a mix of 350s (6.5r12), 120s (6.5r10), 121s (6.5r10), and 130s (8.2r6), for a total of 591 APs on on-prem 12.8.3.3-NGVAFEB19. We run this on a Nutanix virtual system with RAID SSD and spindle. The HM VM has 8 cores and 40 GB RAM. The HM Virtual Appliance Management System shows 60% memory utilization, and 4% when idle (i.e., not pushing updates or doing other admin-driven activity).
Our school district is 6K students across 11 schools. We're about at the point where mobile (Wi-Fi only) devices outnumber traditional ones, so perhaps 3K wireless devices, plus staff and HS student BYOD (2K high school students). Our municipal Wi-Fi usage is minor in comparison.
We use 1 PPSK SSID, 1 RADIUS SSID (NPS on 2 separate domains), and the odd open guest SSID in specific locations or at specific times.