multicast packetloss
‎06-20-2017 12:23 PM
I believe my issue is related to xos0053644, but information on that specific issue seems to be limited to https://extremeportal.force.com/ExtrArticleDetail?an=000075074, with no mention in the release notes, etc., unless I'm missing a way to search the bulk release notes/version history for bug numbers.
I'm also not 100% sure this is the issue. We do periodically have problems with OSPF at some sites, but I'm unsure which of my switches is the root cause.
Our layout is
Remote Site Switch (ospf x450/x460-g2 etc) -> FIBER -> MLAG-Aggregation (l2-transit x670) -> FIBER -> MLAG-Core switches (ospf x670-g2)
The one page I found says "temporary/short outages" on OSPF, but honestly we've seen OSPF outages of many hours or days at some sites, and it doesn't happen at all the sites.
Should I just disable to-cpu on all ports of our aggregation switches? Is there a drawback to doing that if those switches only carry QinQ and VLANs, with no L3 beyond an in-band management IP? Can I run the commands on a live network, or will it affect traffic flow on the transit switch?
I'm having trouble understanding the problem. There are 4 solutions listed, but no real explanation of how to figure out which switches are actually at fault, or what the drawback of each solution is.
None of the options seem to be meant for the actual OSPF switches (the ones with the IP interfaces), so is the issue only on switches that are layer 2 transit switches?
‎06-20-2017 11:55 PM
enable igmp snooping forward-mcrouter-only
configure forwarding ipmc local-network-range fast-path
I even followed your recommendation and did a full disable igmp snooping, but was still stuck, with sites dropping in and out of idle.
I guess the next option, unless Extreme or you have another recommendation, is upgrading from these releases to 21.x/22.x, as I'm really starting to get the feeling that 15.6 was just a buggy branch, and my aggregation and core switches are running 15.6.2.12 (no patch).
‎06-20-2017 08:14 PM
If it makes you feel better, you can do it during a maintenance window, but we have done it in an outage scenario with 89K MACs and 2K VLANs on those 8900s at peak without issues. You can also clear the tables I listed and see if that gives you some relief. If you have one that is broken and can do it one switch at a time until it starts working, you can maybe narrow down the culprit.
When all else fails and nothing we try restores the router adjacencies, we have had to delete the VLAN or VMAN and reprovision it to clean the hung table. We have never had to reboot to fix this.
Also, what cards are you running on the 8900s? XL cards with MSM 128 need to match up. If you put one of the c cards in a chassis, the whole chassis will drop down to the lesser card. Same thing for the MSM. You cut your processing power in half by only running one card.
My problem has been that we don't have any visibility into or access to the customers' routers, so when they report a problem I have to get them back up right away and have limited time to troubleshoot this kind of issue. We know there is an issue, but it is impossible to replicate on demand. We will go 2 or 3 months with no issues. With us it is always EIGRP, because they are a Cisco shop. It has been one of those things we are all aware of, including the NOC, and we know how to fix it quickly when it is reported. It also always seems to be the smaller, less used services (1 to 5 Mb/s), not any of the larger ones.
Disabling the L2-CPU has not seemed to make much difference with this problem. It may make your messages go away, but not passing mcast router announcements is something different, I believe.
‎06-20-2017 07:17 PM
‎06-20-2017 05:07 PM
Here is a fix/workaround to help you find which switch you may be having issues with (the theory is that an mcast entry gets corrupted in hardware and is then not forwarded). You need to make sure that your two routers can at least ping their outer IP interfaces. If not, then you have other issues.
One switch at a time, when you have OSPF or router adjacency issues:
clear igmp snooping
clear fdb
clear ipmc fdb
Check to see whether your routers regain their adjacencies after each switch you clear in the path, until you find the one that was at fault. GTAC will tell you which code you need to be running for the switch and setup you have. 15. had some issues for sure.
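The one-switch-at-a-time procedure above can be sketched as a small loop. This is only an illustration of the search logic, not an Extreme tool: `clear_tables` and `adjacency_up` are hypothetical callables you would supply yourself (for example, something that runs the three clear commands on a switch, and something that waits a bit and then checks the routers' neighbor state), and the switch names are made up.

```python
from typing import Callable, Iterable, Optional


def find_faulty_switch(
    path: Iterable[str],
    clear_tables: Callable[[str], None],
    adjacency_up: Callable[[], bool],
) -> Optional[str]:
    """Clear tables on one switch at a time along the L2 path.

    Returns the first switch whose clear is followed by the adjacency
    recovering, or None if the adjacency was already up or never recovers.
    Both callables are hypothetical helpers supplied by the caller;
    adjacency_up() is expected to do its own waiting before it answers.
    """
    if adjacency_up():
        # Nothing to find: the routers already see each other.
        return None
    for switch in path:
        clear_tables(switch)   # e.g. clear igmp snooping / fdb / ipmc fdb
        if adjacency_up():
            return switch      # adjacency came back right after this clear
    return None


if __name__ == "__main__":
    # Simulated network: the (made-up) switch "agg-2" holds the stale entry.
    stale = {"agg-2"}
    culprit = find_faulty_switch(
        ["agg-1", "agg-2", "core-1"],
        clear_tables=stale.discard,
        adjacency_up=lambda: not stale,
    )
    print(culprit)  # agg-2
```

Stopping at the first switch that restores the adjacency keeps the disruption to the rest of the path as small as possible, and it gives you one concrete box to report to GTAC.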
We have found that this seems to be very random and is usually triggered after a topology event where you have an EAPS failover. It seems to happen when a port in the rings flaps multiple times in a short period.
Good luck. We have never found this to be a lack of resources; there are always open buckets in memory for more entries. By default, unless you have an ACL in place to block 224.0.0.0/24, all modern layer 2 switches should always forward the mcast traffic from router mcast IPs, period. The CPU only moves it to hardware the first time... so good luck with your efforts. I will be tracking this one closely too for new info or ideas.
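For reference, the routing protocol groups discussed in this thread all fall inside 224.0.0.0/24, which is why an ACL on that range would affect them. A quick way to check an address, as a minimal sketch using only Python's standard `ipaddress` module (the function name is just for illustration):

```python
import ipaddress

# 224.0.0.0/24 is the link-local multicast range used by routing protocols.
LOCAL_NETWORK_CONTROL = ipaddress.ip_network("224.0.0.0/24")

# Well-known groups relevant to this thread.
ROUTING_GROUPS = {
    "OSPF AllSPFRouters": "224.0.0.5",
    "OSPF AllDRouters": "224.0.0.6",
    "EIGRP routers": "224.0.0.10",
}


def is_link_local_multicast(addr: str) -> bool:
    """Return True if addr is inside 224.0.0.0/24."""
    return ipaddress.ip_address(addr) in LOCAL_NETWORK_CONTROL


if __name__ == "__main__":
    for name, group in ROUTING_GROUPS.items():
        print(f"{name:20} {group:12} {is_link_local_multicast(group)}")
```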
‎06-20-2017 01:32 PM
In addition, I just noticed we have a few sites that have one neighbor in FULL and one in EX_START, and on the core side it's the same: one FULL and one EX_START. I'm not sure if it's related or the same issue. In the EX_START case I checked, and I see 224.0.0.5 on the VLAN from what appears to be all switches.
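As a rough way to separate the two symptoms: in standard OSPF, a neighbor stuck in Down or Init points at hellos (multicast to 224.0.0.5) not getting through in one or both directions, while a neighbor stuck in ExStart has already exchanged hellos and is failing on the Database Description packets, which is commonly an MTU mismatch or loss of the unicast exchange. The sketch below encodes that rule of thumb. It is general OSPF behavior, not anything confirmed for xos0053644, and the function name and hint wording are my own.

```python
def triage_ospf_state(state: str) -> str:
    """Map an OSPF neighbor state to a first troubleshooting hint.

    Accepts spellings such as "EX_START", "ExStart" or "2-Way".
    The hints are rules of thumb, not a diagnosis.
    """
    s = state.upper().replace("_", "").replace("-", "").replace(" ", "")
    if s == "FULL":
        return "adjacency established"
    if s in {"DOWN", "INIT"}:
        return ("hellos missing or one-way: check that 224.0.0.5 is "
                "forwarded across the L2 path in both directions")
    if s in {"EXSTART", "EXCHANGE"}:
        return ("hellos work both ways but the database exchange stalls: "
                "check for an MTU mismatch or unicast loss between neighbors")
    if s in {"2WAY", "TWOWAY"}:
        return "normal between non-DR routers on a broadcast segment"
    return "unrecognized or transient state: look again after a few seconds"


if __name__ == "__main__":
    for state in ("FULL", "EX_START", "Init"):
        print(f"{state:10} -> {triage_ospf_state(state)}")
```

If that rule of thumb holds here, the EX_START sites may be a separate problem from the dropped multicast, which would fit with 224.0.0.5 being visible on the VLAN.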
