Troubleshooting LACP incompatibility

The other day I was working remotely, directing an on-site technician through the install of two 24-port Ruckus ICX FastIron switches in a secondary building, where they would replace some aging Adtrans. The new stack would use a fiber LACP LAG to the primary building, which hosts the main data room and the core Cisco stack. No problem: we have multi-vendor environments all over the place, and Ruckus and Cisco both speak LACP and are normally perfectly interoperable. The switches were preconfigured and logically stacked before the final deployment; all prerequisite configurations were checked and everything was looking great. Unfortunately, things weren’t going to go as smoothly as we planned.

The time came to move the fiber patches from the old Adtran to the new Ruckus stack. The LAG came up without much hassle and I was able to log in via SSH over the management VLAN. Great, everything appeared to be connected! Except that computers were now having severe internet lag, and when the local technician ran a speed test it reported normal download speeds but 0 Mbps upload…? Shortly after, everything on our PC and Phone VLANs dropped offline. I was still logged in via our management VLAN, which was odd. Double-checking the LAG showed that LACP was healthy and the interfaces were trunked correctly on both sides; MAC addresses were populating across the connected LAGs as well.

Nothing made sense. We tested our management VLAN and it seemed to be connected just fine, to the point that the on-site tech was able to get a working connection on that VLAN (DHCP, DNS, and internet connectivity). A couple of other VLANs seemed to be working somewhat fine as well, with pings to public DNS servers returning echo replies. After significant digging and double-checking, there was zero indication of any problem on the switches themselves, but clearly we had a major one.

Swapping out the fiber modules (GBICs) for the original, older ones solved the problem for a mere few minutes, and we thought the modules had been the issue. Maybe the new ones were faulty? Nope, the issue persisted: no connectivity for PCs and phones. Strangely, I was still SSH’d into the switch just fine. Eventually, after I discussed the issue with a colleague, he suggested removing LACP as a variable by disabling one of the LAG interfaces. Immediately things came up and DHCP was flowing across to workstations at the secondary building… It was that simple.
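One possible reason SSH can keep working while other traffic dies on a LAG is that switches pin each flow to a single member link using a hash of its headers, so a fault affecting only one member would black-hole only the flows that land on it. This was not verified in this incident; the sketch below only illustrates the idea. The CRC32 hash, the port names, and the MAC addresses are all made-up stand-ins, since real hash inputs are vendor-specific.

```python
import zlib


def pick_member(src_mac, dst_mac, members):
    """Pick a LAG member link for a flow by hashing its MAC pair.

    CRC32 is only a stand-in: real switches use vendor-specific hashes
    that may also include IP addresses and L4 ports.
    """
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return members[zlib.crc32(key) % len(members)]


def flow_status(flows, members, bad_members=frozenset()):
    """Return {flow_name: True/False}; a flow fails if it hashes to a bad link."""
    status = {}
    for name, (src, dst) in flows.items():
        link = pick_member(src, dst, members)
        status[name] = link not in bad_members
    return status


if __name__ == "__main__":
    # Hypothetical flows and port names, for illustration only.
    flows = {
        "mgmt-ssh": ("00:11:22:33:44:01", "00:aa:bb:cc:dd:01"),
        "pc-dhcp": ("00:11:22:33:44:02", "00:aa:bb:cc:dd:02"),
        "phone-sip": ("00:11:22:33:44:03", "00:aa:bb:cc:dd:03"),
    }
    both = ["lag-port-1", "lag-port-2"]

    # Two members, one silently misbehaving: only some flows are affected.
    print(flow_status(flows, both, bad_members={"lag-port-2"}))

    # Suspect member disabled: every flow rides the remaining link.
    print(flow_status(flows, ["lag-port-1"], bad_members={"lag-port-2"}))
```

Which flows fail in the first case depends entirely on the hash, which is why the symptoms can look random from the outside.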

Long story short: when building fiber LAGs, specifically with LACP, between different switch models and firmware versions, it’s possible to hit incompatibilities that won’t show up in any switch log or show command. The cause may be differences in firmware or in how each side processes LACP. If you see phantom issues on a LAG between dissimilar switches, fall back to a single link and check whether they clear. Remember, software telling you that there are no issues does not mean there aren’t any. When you’re troubleshooting, carry the process of elimination through to completion: you can’t rule something out until you’ve verified it. Always verify the entire flow of traffic.
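To make the process of elimination concrete, here is a minimal sketch of a test matrix: every VLAN is checked under every link scenario, and the failures are grouped by scenario. Everything in it is an assumption for illustration. The scenario and VLAN names are hypothetical, and `probe` is a placeholder for whatever check you actually run (a ping, a DHCP request, a speed test) after manually putting the LAG into that state. The demo uses a simulated probe that mimics the symptoms described above.

```python
from itertools import product


def run_matrix(scenarios, vlans, probe):
    """Run probe(scenario, vlan) for every combination and record pass/fail."""
    return {(s, v): bool(probe(s, v)) for s, v in product(scenarios, vlans)}


def failing_scenarios(results):
    """Group failed VLANs by the link scenario in which they failed."""
    failed = {}
    for (scenario, vlan), ok in results.items():
        if not ok:
            failed.setdefault(scenario, []).append(vlan)
    return failed


if __name__ == "__main__":
    scenarios = ["both members up", "member 1 only", "member 2 only"]
    vlans = ["mgmt", "pc", "phone"]

    def simulated_probe(scenario, vlan):
        # Simulated results, not real measurements: PC and Phone VLANs
        # fail only while both LAG members are active.
        return not (scenario == "both members up" and vlan in ("pc", "phone"))

    results = run_matrix(scenarios, vlans, simulated_probe)
    for scenario, bad_vlans in failing_scenarios(results).items():
        print(f"{scenario}: failing VLANs -> {', '.join(bad_vlans)}")
```

The point of the full matrix is that a single passing check, like a working SSH session on the management VLAN, says nothing about the other combinations.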

Here’s Google’s summary.

LACP Firmware Switch Incompatibility

Based on the provided search results, it is possible for different switches to have incompatible LACP firmware. Here are some key points that support this conclusion:

  1. Proprietary protocols: Some vendors have developed proprietary protocols (e.g., MLAG) that allow for bonded Ethernet channels across different Ethernet switches. However, these protocols might not be compatible with LACP firmware from other vendors.
  2. Different implementations: The IEEE 802.3ad standard for LACP is not universally implemented or configured the same way across all switch vendors. This can lead to compatibility issues between switches from different manufacturers.
  3. Static LAGs vs. LACP: Some switches may support static link aggregation (LAG) configuration, while others require LACP. If a switch is configured for static LAGs, it may not be compatible with a switch that only supports LACP.
  4. Firmware versions: The search results mention firmware compatibility issues, such as the bug reported in the Meraki MS Switch Firmware regarding LACP. This highlights the importance of ensuring compatible firmware versions across multiple switches in a network.
  5. LACP configuration: Even if switches support LACP, their configuration options and settings might differ. This can lead to compatibility issues if the LACP configuration on one switch is not compatible with another switch’s LACP implementation.
