Before I forget - from this location
https://msdn.microsoft.com/en-us/library/bb742455.aspx
check this clip out:
/snip/
Network Load Balancing's unicast mode has the side effect of disabling communication between cluster hosts using the cluster adapters. Since outgoing packets for another cluster host are sent to the same MAC address as the sender, these packets are looped back within the sender by the network stack and never reach the wire. This limitation can be avoided by adding a second network adapter card to each cluster host. In this configuration, Network Load Balancing is bound to the network adapter on the subnet that receives incoming client requests, and the other adapter is typically placed on a separate, local subnet for communication between cluster hosts and with back-end file and database servers. Network Load Balancing only uses the cluster adapter for its heartbeat and remote control traffic.
/snip/
The behavior described there sounds like it may explain some of the unicast results you've seen.
If it rings true and your verification tests pan out, maybe we can take down another symptom today.
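If it helps with the verification piece, here's the sort of quick check I have in mind - only a sketch, with made-up addresses you'd swap for your own, and it assumes each host has the second adapter described in the quote above plus Windows ping's -S switch to pin the source address:

```python
import subprocess

# Hypothetical addresses for illustration only - substitute your own.
CLUSTER_ADAPTER_SRC  = "192.0.2.11"     # this host's cluster-adapter IP
PEER_CLUSTER_ADAPTER = "192.0.2.12"     # peer host's cluster-adapter IP
SECOND_ADAPTER_SRC   = "198.51.100.11"  # this host's second (back-end) adapter
PEER_SECOND_ADAPTER  = "198.51.100.12"  # peer host's second adapter

def ping_from(source_ip, dest_ip, count=4):
    """Ping dest_ip while forcing the source address (Windows 'ping -S').

    Returns True when ping exits successfully."""
    result = subprocess.run(
        ["ping", "-n", str(count), "-S", source_ip, dest_ip],
        capture_output=True, text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    # In unicast NLB mode the cluster-adapter path between hosts should fail
    # (packets to the shared MAC loop back internally and never hit the wire),
    # while the second adapter on its own subnet should succeed.
    print("cluster adapter -> peer:", ping_from(CLUSTER_ADAPTER_SRC, PEER_CLUSTER_ADAPTER))
    print("second adapter  -> peer:", ping_from(SECOND_ADAPTER_SRC, PEER_SECOND_ADAPTER))
```

If both paths succeed over the cluster adapters, the cluster probably isn't in unicast mode after all.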
Regarding the traffic we're talking about as multicast:
I normally identify traffic associated with the old Class D address block as "IP multicast".
The destination MAC of IP multicast traffic is derived from the IP multicast group address in the packet. Familiar behaviors like the ones you've described - join, leave, query - are normally part of that environment.
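For reference, that mapping is mechanical - the low 23 bits of the group address are copied under the 01-00-5E prefix. A quick sketch of the standard derivation, nothing site-specific in it:

```python
import ipaddress

def ipv4_multicast_mac(group):
    """Map an IPv4 multicast group (Class D, 224.0.0.0/4) to its Ethernet
    destination MAC: 01-00-5E prefix plus the low 23 bits of the group
    address, which is why 32 different groups land on the same MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return "-".join(f"{o:02X}" for o in octets)

print(ipv4_multicast_mac("239.1.2.3"))  # 01-00-5E-01-02-03
```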
On the NLB side the IP address (the VIP?) is typically an ordinary Class A/B/C unicast address. Like Erik says, the tricky business when configuring the cluster MAC address in the switch is forcing the static VLAN/MAC/port relationships into the forwarding behavior so the switch treats the traffic as MAC-layer multicast. The traffic should then flood within the port scope you configure.
Another option would be to skip calling out ports in the static switch config. Traffic will flood to all ports with VLAN egress - which is probably what was happening anyway, so no real loss. The difference is hopefully optimized handling of the traffic by the switch after the new config.
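For what it's worth, the cluster MAC that goes into that static entry is derived from the VIP by a fixed rule - at least as I remember the Microsoft scheme, so verify it against the cluster properties before trusting it. The exact static MAC/VLAN/port syntax varies by switch vendor, so I won't guess at the CLI here; this sketch just computes the address you'd plug into it:

```python
import ipaddress

def nlb_cluster_mac(vip, mode="multicast"):
    """Derive the NLB cluster MAC from the virtual IP w.x.y.z.

    Convention as I recall it (confirm against your cluster config):
      unicast        -> 02-BF-w-x-y-z
      multicast      -> 03-BF-w-x-y-z
      igmp-multicast -> 01-00-5E-7F-y-z
    """
    w, x, y, z = ipaddress.IPv4Address(vip).packed
    if mode == "unicast":
        octets = [0x02, 0xBF, w, x, y, z]
    elif mode == "multicast":
        octets = [0x03, 0xBF, w, x, y, z]
    elif mode == "igmp-multicast":
        octets = [0x01, 0x00, 0x5E, 0x7F, y, z]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{o:02X}" for o in octets)

print(nlb_cluster_mac("10.1.2.50", "multicast"))       # 03-BF-0A-01-02-32
print(nlb_cluster_mac("10.1.2.50", "igmp-multicast"))  # 01-00-5E-7F-02-32
```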
About CPU:
You no doubt have benchmarks in your environment that serve as quick health checks for your network.
As far as a 'tell' for an improperly handled flooding condition, I would normally use the switch CPU as a guide. It's not definitive as a diagnostic, but significant soft-path traffic will normally leave tracks in the switch OS.
The process table in the 'show system utilization' output will include the switch's packet processing task. From the switch's perspective, that task should be your canary during NLB performance tests.
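If you want to keep an eye on that during a test run, something like the sketch below is all I'd script. The sample output is made up - the real column layout and task name depend on your switch OS - so adjust the parsing to match:

```python
# Hypothetical 'show system utilization' process-table output; the real
# column layout and task name depend on your switch OS.
SAMPLE_OUTPUT = """\
Task Name            PID   CPU(%)
routing_proto        101     2.1
packet_processing    102    78.4
mgmt_agent           103     0.5
"""

CPU_ALARM_PERCENT = 50.0  # arbitrary threshold, tune to your own baseline

def packet_task_cpu(cli_output, task_name="packet_processing"):
    """Pull the CPU% for the packet-processing task out of the process table."""
    for line in cli_output.splitlines():
        fields = line.split()
        if fields and fields[0] == task_name:
            return float(fields[-1])
    return None

cpu = packet_task_cpu(SAMPLE_OUTPUT)
if cpu is None:
    print("packet processing task not found - adjust task_name or parsing")
elif cpu > CPU_ALARM_PERCENT:
    print(f"packet processing at {cpu:.1f}% CPU - possible soft-path flooding")
else:
    print(f"packet processing at {cpu:.1f}% CPU - looks healthy")
```

In practice you'd feed it the live command output (SSH, SNMP, whatever you already collect with) rather than a pasted sample.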
So maybe we're on the right track. I hope we're eventually able to bring the various behaviors back in, identifying each as a known quantity.
Jeeze, I really need to find a way to have this discussion with 95% fewer words. The long story just adds confusion.
Maybe links to documentation will improve usability, and would also improve my technical accuracy.
PS: It's been a long time since I boned up on NLB behavior. Thing is, Microsoft used to include instructions for IP multicast/IGMP support, but I never heard of a network making that sort of config work.
Things may have changed for the better since then.
Yes, the next chapter will have fewer words, more links.
Regards,
Mike