Link Aggregation on XT9 Creates a Loop

Mbrossart
Level 9

I’m trying to create a link aggregation between an XT9 acting as a router and a NETGEAR GS110EMX.

  • The switch reports that both ports in the aggregation are up.
  • When I do a cat /proc/net/bonding/bond0 on the router, it reports that both legs of the aggregation are up.
  • When both legs of the LAG are up, routing on my network deteriorates.  Nodes disconnect from the network altogether.  Nodes refuse to backhaul over Ethernet.  I can’t log into network switches, etc.
  • The switch link lights flutter for a bit, as you would expect with normal traffic, then they slowly rock back and forth.  There’s a link light on either side of the port.  First one light comes on for about a second, then the other, then back to the first.  After a couple of seconds of this, they start to flutter again, then rock back and forth again.  I don’t know exactly what this is, but I’m certain it’s an error state, and I’m confident it’s loop detection.
  • When I turn off loop protection in the switch, the rocking back and forth stops, but the routing issues persist.
  • I have a NETGEAR GS108T that I can connect to the GS110EMX in LAG as a test and it runs fine.
  • When I connect the GS108T to the XT9 in LAG with loop protection, the ports flutter a bit then shut down, which I believe is its response when detecting a loop.  If I turn off loop protection, the ports stay up.
  • If I unplug one leg of the LAG, it all runs fine, but of course, what’s the point?
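
A possible way to read more out of cat /proc/net/bonding/bond0 than "both legs are up": on a Linux bond in 802.3ad mode, a leg can show MII Status up while LACP has still not bundled it with the other leg. The sketch below parses a saved copy of that output on a PC. The SAMPLE text and the interface names eth2/eth3 are made up for illustration and are not taken from an XT9; the field names follow the usual Linux bonding driver layout, which is assumed to apply here.

```python
# Illustrative sample only: this is NOT output captured from an XT9.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
LACP rate: slow
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: up
Aggregator ID: 1

Slave Interface: eth3
MII Status: up
Aggregator ID: 2
"""


def parse_bond(text):
    """Split bonding status text into bond-level fields and per-slave fields."""
    bond, slaves, current = {}, [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and headings such as "802.3ad info"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is None:
            bond[key] = value      # fields before the first slave block
        else:
            current[key] = value   # fields belonging to the current slave
    return bond, slaves


def diagnose(text):
    """Return a list of human-readable problems found in the status text."""
    bond, slaves = parse_bond(text)
    problems = []
    if "802.3ad" not in bond.get("Bonding Mode", ""):
        problems.append("bond is not in 802.3ad (LACP) mode")
    for s in slaves:
        if s.get("MII Status") != "up":
            problems.append(f"{s['name']}: link is down")
    if len({s.get("Aggregator ID") for s in slaves}) > 1:
        problems.append("slaves sit in different aggregators, so LACP did not bundle them")
    if bond.get("Partner Mac Address") == "00:00:00:00:00:00":
        problems.append("no LACP partner seen (partner MAC is all zeros)")
    return problems


for problem in diagnose(SAMPLE):
    print(problem)
```

To try it on real data, paste the router's output into SAMPLE. If both legs report the same Aggregator ID and the partner MAC is the switch's address, the LACP negotiation itself is probably fine and the problem lies elsewhere.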

Anyone see anything or have suggestions of what to check?  I’m happy to share additional information, but I’m not sure what to share.

Thanks.

jzchen
Level 14

Google ASUS LAN aggregation and very few examples come up, BUT they all appear to use LAN 1 & 2.  I took a peek at my AXE16000 ASUSwrt and it says 1 & 2 also.  In your other thread you seemed to solve it moving to 1 & 2, but then switched back?  (I'm sure there's a reason for that but) you may be back to the wrong 2 & 3...

Mbrossart
Level 9

Thanks @jzchen.  It’s interesting in ASUS’ documentation, some units specify ports 1 & 2, others specify 2 & 3.  My XT9s say to use ports 2 & 3.  Yes, I thought ports 1 & 2 were the answer, but then when I ran cat /proc/net/bonding/bond0, it reports that one of the legs was down.  So, it looks like I was wrong when I assumed I should be using ports 1 & 2.  Interestingly, if I use port 2 or port 3, it works fine, but of course, this defeats the purpose.  It’s only when I use both ports 2 & 3 that the loop appears.  Also, when I use ports 2 & 3 and run cat /proc/net/bonding/bond0, it reports both legs of the LAG are up.

My current theories (I’m just guessing) are:

  • My last firmware upgrade was somehow botched, but I don’t know how to reload it.  I suppose I could do a manual load.  Perhaps I’ll try that next.  I’ll be away from my infrastructure for a few days, so I guess it will have to wait.
  • I need a static route.  Problem here is, I see no documentation indicating this would be the case, and I don’t know what route I’d put in.  Again, just guessing.

Thanks again for your interest in helping.

Reloading firmware isn’t too much trouble: first download an older version to your local machine and upload it manually via ASUSwrt.  Then load the version you want to go back to, either via upgrade or manually.

It’s my understanding that LACP is dynamic and required by ASUS, so static won’t work.

I really hate to be a party pooper but from what I’ve read through, those who got it to work saw performance degradation vs performance enhancement.  If that is the case don’t be surprised IF you see your nodes moving to wireless backhaul, the 4804 Mbps maximum speed can be pretty competitive to 2.5 Gbps wired backhaul.  (I have 2.5 Gbps backhaul between my AXE16000 and AXE11000, yet they uplinked via 6 GHz (also 4804 Mbps) for days on end.  I posted a screen shot of it on here somewhere.  They eventually went back to Ethernet)….

Mbrossart
Level 9

I heard back from NETGEAR support that the switch may in fact detect a loop when running link aggregation.  Their recommendation is to turn off loop detection.  Sounds like end of story.

@jzchen , thanks for the input.  Sounds like I’ll turn off loop detect, do a few tests and see if this venture has been worth it.

Mbrossart
Level 9

I thought I’d poke this thread to see if anyone has ever definitively gotten LAG to work on an XT9, or any ASUS router for that matter.  From what I’ve experienced and what I’ve read, it simply doesn’t work.  Unfortunately, tech support isn’t much help.  They just claim it works and won’t troubleshoot.  I should probably leave well enough alone.  My topology and use don’t really necessitate LAG.  But since ASUS says they support it, I geek out on it and want to get it running.

So, any success stories out there?

Why do I say it doesn’t work?  When it’s connected, I get:

  • The switch that my router is connected to detects a loop.
  • All Ethernet backhauls cease and run over WiFi.
  • Network performance goes through the floor.
  • Devices on the network become unreachable.
  • I have two switches that support LACP LAG.  When connected together, LAG works great.  When either is connected to my XT9 router, it creates a loop and everything deteriorates.

My Topology

  • I am connecting a Netgear GS110EMX switch to an ASUS XT9 router (not a node).
  • The GS110EMX is using LACP.  LACP priority on the LAG is 1.  Timeout is Long.
  • The XT9, from what I can tell is using LACP on ports 2 & 3.  Bonding/Link Aggregation is Enabled.  Not much to configure here.
  • The GS110EMX reports that LAG is up via the GUI, but it is also detecting loops.
  • Using cat /proc/net/bonding/bond0 on the XT9 reports that the LAG is up on both legs.

What I’ve tried

  • The GS110EMX is capable of LACP & Static LAG.  In Static LAG, the LAG also comes up on the switch side, but it also loops.
  • Connected to XT9 ports 1 & 2.  Interestingly enough, the switch does not detect a loop, but Ethernet backhaul drops off, devices become difficult to reach and performance tanks.  cat /proc/net/bonding/bond0 reports one leg of the LAG up and the other down.  I can’t reach the switch to see what it reports.

In our prior discussion I did not catch the LACP Priority nor Timeout settings.  Have you tried the default priority of 128?  Also timeout short?

At this point it seems to be a compatibility issue between ASUS and Netgear, because two Netgear switches “understand each other” and work together.  Playing a little with the priority and timeout settings may get the two manufacturers to work together.  It’s good there are very few settings to play with on the ASUS side; that means fewer combinations for you to try.  There is also a LACP System Priority setting with a default of 32768?  Do you understand what these are?  I haven’t gotten a chance to Google them…

I saw a successful example on SNBForums. I will try to link the thread here and also my takeaway:  all spare LAN ports on the ASUS XT9 should be empty, move any device to the switch.  Just the two LAN ports for the aggregation on the XT9 should be connected to.

https://www.snbforums.com/threads/ax86u-lan-link-aggregation-doesnt-seem-to-be-working.77664/

Mbrossart
Level 9

Thanks jzchen.

Yes, I’ve been all over the map on port and system LACP priorities.  I’ve had them at the default, the min, the max.  I’ve made sure they’re the same and not the same.  I’ve heard that LACP priority has little or no impact when you have only one LAG.
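
For anyone wondering why priority matters so little with a single two-port LAG, here is a rough sketch of how LACP uses it, as far as I understand the standard. Each end has a system ID made of its system priority followed by its MAC address, and the end with the numerically lower ID gets to decide which ports are active when there are more candidate links than the LAG can carry. The priorities and MAC addresses below are hypothetical, not read from the XT9 or the GS110EMX.

```python
def system_id(priority, mac):
    """LACP system ID as a sortable tuple: priority first, then the MAC bytes."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


# Hypothetical values for illustration only.
switch = system_id(32768, "00:11:22:33:44:55")
router = system_id(65535, "00:aa:bb:cc:dd:ee")

# The numerically lower system ID has the higher priority.
controller = "switch" if switch < router else "router"
print(controller, "decides port selection")
```

With exactly two links and both eligible, there is nothing to select, so changing the priorities should not change which legs carry traffic.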

It may well be an incompatibility issue or even a setting issue that I can’t tune on one side or the other.  Both devices are pretty powerful, but neither is a pro device.

The piece I’m chasing now is the MTU on the LAN switch.  I read an article arguing that it is especially important to align port speed, duplex, flow control and MTU in a LAG.  I’ve verified speed and duplex.  I’ve enabled and disabled flow control (to no effect).  I see that the Max MTU on my switch is 10240, but I have no way to adjust it.  I don’t have any visibility into MTU on my XT9.  I know how to get to the MTU on the WAN, but I need to see it on the LAN switch.  I’ve tried enabling jumbo frames on the XT9 because 10240 looks pretty big, but that had no impact.

If anyone has insight into the MTU on the LAN switch on the XT9 and if there’s any way to configure it, that’d at least help me verify or dismiss this as an issue.

Thanks,

Mike

The only thing I can think of that seems to have been presumed: have you tried another pair of same-spec cables?  (The SNB forum thread made it seem easy, and we didn’t discuss this, if my memory serves me)…

Mbrossart
Level 9

Yes.  I’ve tried multiple cables.  Most of them are specifically 5e, but I found a pair of 6.  None make a difference.