05-10-2023 04:55 PM
I’m trying to get Link Aggregation to work between an XT9 router and a NetGate GS110EMX switch. Both devices support 802.3ad.
After the router reboots, however, my XT9 nodes no longer backhaul over Ethernet. They backhauled over Ethernet just fine before I enabled LAG. Sometimes one node will backhaul over Ethernet, but never both.
The instructions I find for ASUS LAG always refer to connecting a NAS device. Does LAG not work with network switches on ASUS? Are there special routing considerations? Has anyone else done this?
05-10-2023 06:24 PM
I'm happily running a LACP LAG from a GT-AX6000 on the 9.0.0.6.102 beta to a Cisco CBS350 switch. I don't particularly stress the link, and I'm not running AiMesh, but it seems to work well.
I believe LACP is mandatory for ASUSWRT link aggregation, but it might be optional on your switch, so make sure it's enabled for your LAG. Some switches default to LACP disabled, for old-style static LAG, rather than 802.3ad LACP dynamic LAG. If ASUSWRT does not receive LACPDUs on a member port, I don't think it will activate that port.
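One way to confirm LACPDUs are actually arriving on a member port is to capture them directly. This is a sketch, assuming a `tcpdump` binary is available on the router (stock firmware may not ship one) and that `eth1` is one of your bond members:

```shell
# Watch for LACPDUs on a bond member interface (requires root).
# 0x8809 is the IEEE Slow Protocols ethertype that LACP uses; in fast/short
# timeout mode you should see roughly one PDU per second from the switch,
# in slow/long mode one every 30 seconds.
tcpdump -i eth1 -e 'ether proto 0x8809'
```

If nothing shows up on a member port, the switch side of the LAG is likely not sending LACP at all (e.g. configured as a static LAG).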
I'm not sure about the XT9, but there's not an easy way to just see the full status of LACP on the GT-AX6000 in the web GUI. You can check the overall status on the command line as follows:
root@GT-AX6000:/tmp/home/root# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): count
System priority: 65535
System MAC address: 04:42:1a:xx:xx:xx
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 1
Partner Key: 1007
Partner Mac Address: 34:b8:83:xx:xx:xx
Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 04:42:1a:xx:xx:xx
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 04:42:1a:xx:xx:xx
port key: 1
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: 34:b8:83:xx:xx:xx
oper key: 1007
port priority: 1
port number: 9
port state: 63
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 04:42:1a:xx:xx:xx
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 04:42:1a:xx:xx:xx
port key: 1
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: 34:b8:83:xx:xx:xx
oper key: 1007
port priority: 1
port number: 10
port state: 63
root@GT-AX6000:/tmp/home/root#
You can check more briefly if the ports are active in the bonding, and some other basic status on the command line as follows (substitute eth1 and eth2 for your interfaces):
root@GT-AX6000:/tmp/home/root# cat /sys/class/net/bond0/bonding/all_slaves_active
1
root@GT-AX6000:/tmp/home/root# cat /sys/class/net/bond0/bonding/mode
802.3ad 4
root@GT-AX6000:/tmp/home/root# cat /sys/class/net/bond0/bonding/slaves
eth1 eth2
root@GT-AX6000:/tmp/home/root# cat /sys/class/net/bond0/lower_eth1/bonding_slave/state
active
root@GT-AX6000:/tmp/home/root# cat /sys/class/net/bond0/lower_eth2/bonding_slave/state
active
root@GT-AX6000:/tmp/home/root#
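Those per-slave checks can be wrapped in a small loop so you get every member's state in one go. A minimal sketch; `bond_slave_states` is just an illustrative name, and the sysfs directory is a parameter (defaulting to bond0, matching the readouts above) so the same function works for any bond:

```shell
#!/bin/sh
# Print each bond member interface and its bonding slave state ("active"/"backup"),
# read from the same sysfs files shown in the commands above.
bond_slave_states() {
  bond_dir=${1:-/sys/class/net/bond0}
  for s in $(cat "$bond_dir/bonding/slaves"); do
    echo "$s: $(cat "$bond_dir/lower_$s/bonding_slave/state")"
  done
}

# Usage: bond_slave_states                       # inspect /sys/class/net/bond0
#        bond_slave_states /sys/class/net/bond1  # or any other bond
```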
05-13-2023 09:27 AM
Hey @Murph_9000, just checking your solution. You don’t have to bypass or tweak loop protection in any way, do you? Any static routes on your router? Do you have any cool analysis tricks for checking for loops?
05-10-2023 08:30 PM - edited 05-10-2023 08:35 PM
I don’t have these products in my household, I’m afraid…
There’s fairly recent firmware available for the XT9 as well as for your switch. Have you updated to the latest? 3.0.0.4.388.23012 and 1.0.2.7 respectively.
It isn’t clear how you’ve connected the XT9 nodes; I’m going to assume you’ve connected them to ports 9 and 10 of the switch to take advantage of the 2.5 G speed. If you haven’t yet: connect all the cables the way you want them, forget all the nodes, and hard reset them. Then re-add the nodes; hopefully they’re found with an Ethernet port symbol, meaning they connect through the switch. Don’t be surprised if they drop connection for a couple of days. (I’ve had this happen to my nodes; AiMesh is searching for the best backhaul option. I leave Ethernet backhaul mode OFF and all uplink priority settings on AUTO, per the defaults.) If a node does not come back online, try manually turning it OFF and ON again via its power switch. (My RPs don’t have a switch, so I unplug them and plug them back in. That also works for my RT-AC68U, which is perched up about 10–12 ft; I’d need a ladder to reach it, so I just unplug the extension cord down at the UPS.)
If they aren’t found, connect the node to LAN 1 on the XT9 router and try searching again. (This should work. Then reconnect it as preferred.)
I hope I didn’t miss any steps and that this all makes sense.
05-11-2023 07:53 PM
Thanks for your input. I’ll have to try some commands in search of more details. I have a few more data points. The plot thickens, but I’m still unsuccessful with my LAG.
I’m treating my GS110EMX as a core switch. Yes, my nodes are plugged into ports 9 & 10. I’ve aggregated ports 7 & 8 to link to the router. The router, and all of the nodes for that matter, are running the current firmware. I’ve updated the switch to the current version and the issue persists.
I talked to tech support again today. They’ve asked for some additional information and logs, and they’re escalating my case. I’ll see if I can get some more data via the command line. I’ve only used the GUI. How do you access the command line?
05-11-2023 08:17 PM
I know SSH is a popular way to do it, and I found some instructions:
https://www.tomshardware.com/how-to/use-ssh-connect-to-remote-computer
I haven’t tried…
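For what it’s worth, on ASUSWRT you typically have to enable SSH in the web GUI first (under Administration > System), then connect to the router’s LAN IP with the same username you use for the GUI. A hypothetical example; 192.168.50.1 is only the usual ASUS default LAN address, so substitute your own:

```shell
# Connect to the router over SSH. Replace the username and IP with your own;
# 192.168.50.1 is just the common ASUS default LAN address.
ssh admin@192.168.50.1

# Once logged in, the bonding status file quoted in this thread is at:
#   cat /proc/net/bonding/bond0
```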
05-11-2023 11:31 PM
Okay, I’m figuring a few more things out. Anyway, if it’s of interest, here are the results of my cat queries…
mbrossart@ffw:/tmp/home/root# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): count
System priority: 65535
System MAC address: a0:36:bc:62:be:e0
Active Aggregator Info:
Aggregator ID: 3
Number of ports: 2
Actor Key: 9
Partner Key: 1
Partner Mac Address: e0:46:ee:10:25:5f
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: a0:36:bc:62:be:e0
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: a0:36:bc:62:be:e0
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 32768
system mac address: e0:46:ee:10:25:5f
oper key: 1
port priority: 128
port number: 6
port state: 61
Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: a0:36:bc:62:be:e0
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 2
details actor lacp pdu:
system priority: 65535
system mac address: a0:36:bc:62:be:e0
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 32768
system mac address: e0:46:ee:10:25:5f
oper key: 1
port priority: 128
port number: 7
port state: 61
mbrossart@ffw:/tmp/home/root# cat /sys/class/net/bond0/bonding/all_slaves_active
1
mbrossart@ffw:/tmp/home/root# cat /sys/class/net/bond0/bonding/mode
802.3ad 4
mbrossart@ffw:/tmp/home/root# cat /sys/class/net/bond0/bonding/slaves
eth2 eth3
mbrossart@ffw:/tmp/home/root# cat /sys/class/net/bond0/lower_eth2/bonding_slave/state
active
mbrossart@ffw:/tmp/home/root# cat /sys/class/net/bond0/lower_eth3/bonding_slave/state
active
mbrossart@ffw:/tmp/home/root#
05-12-2023 05:09 AM - edited 05-12-2023 05:15 AM
Is LAG Type set to Static or LACP in the switch settings? The default is Static, and I think it should be LACP (see p. 146 of the GS108Tv3 manual).
That “63” is good, but that “61” is concerning. Per:
https://hareshkhandelwal.blog/2022/07/28/lets-understand-lacp-state-machine-using-linux-bond/
That’s all I’ve got for now…
05-12-2023 06:22 AM
State 61 for the member ports indicates LACP is active, but there's a mismatch in the config. That's 0x02 unset, which is "timeout". When that bit is set, the port will aggressively timeout on loss of LACPDUs, for the purpose of detecting link failure. On my Cisco switch, it's the difference between "lacp timeout short" (timeout bit set) and "lacp timeout long" (timeout bit unset) in the interface config for the member ports. In long timeout mode, it can take a couple of minutes for LACP to detect link failure, but just seconds in short mode.
Both ends of the LAG should be configured the same to avoid problems. ASUSWRT defaults to short timeout (0x02 set), and I don't think it has an official way to change that, so the switch should be set to fast/short timeout.
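For anyone following along, the "port state" byte from the readouts can be decoded into its individual flags with a small shell function. A sketch; the bit names follow the 802.3ad actor/partner state bits, and `lacp_state_flags` is just an illustrative name:

```shell
#!/bin/sh
# Decode an LACP "port state" byte from /proc/net/bonding/bond0 into flag names.
# Bits (802.3ad): 0x01 Activity, 0x02 Timeout, 0x04 Aggregation, 0x08 Sync,
# 0x10 Collecting, 0x20 Distributing, 0x40 Defaulted, 0x80 Expired.
lacp_state_flags() {
  state=$1
  flags=""
  [ $(( state & 1 ))   -ne 0 ] && flags="$flags Activity"
  [ $(( state & 2 ))   -ne 0 ] && flags="$flags Timeout"
  [ $(( state & 4 ))   -ne 0 ] && flags="$flags Aggregation"
  [ $(( state & 8 ))   -ne 0 ] && flags="$flags Sync"
  [ $(( state & 16 ))  -ne 0 ] && flags="$flags Collecting"
  [ $(( state & 32 ))  -ne 0 ] && flags="$flags Distributing"
  [ $(( state & 64 ))  -ne 0 ] && flags="$flags Defaulted"
  [ $(( state & 128 )) -ne 0 ] && flags="$flags Expired"
  echo "${flags# }"
}

# Example: lacp_state_flags 61 prints the same flags as 63 minus "Timeout",
# which is exactly the short/long timeout mismatch described above.
```

So 63 decodes to Activity Timeout Aggregation Sync Collecting Distributing (a fully working short-timeout port), while 61 is the same minus Timeout.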
05-12-2023 12:07 PM
Thank you both so much for your time, attention and input. I think I may have it solved, and I think the root cause was bad documentation. When I set up LAG, my XT9 tells me to plug into ports 2 & 3. I am now plugged into ports 1 & 2, and everything is running fine. Until this configuration, I always either got the link lights pacing back and forth or I had to turn off loop prevention, and I still got a whole lot of anomalous routing issues. In this configuration, I can re-enable loop prevention and the switch doesn’t complain. I think plugging into 2 & 3 is what was causing the loop in the first place.
I did make a few changes too, but testing these out on ports 2 & 3 didn’t solve the problem.
For comparison, here’s my new readout.
mbrossart@ffw:/tmp/home/root# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): count
System priority: 65535
System MAC address: a0:36:bc:62:be:e0
Active Aggregator Info:
Aggregator ID: 3
Number of ports: 1
Actor Key: 9
Partner Key: 1
Partner Mac Address: e0:46:ee:10:25:5f
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:bc:62:be:e0
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: a0:36:bc:62:be:e0
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: e0:46:ee:10:25:5f
oper key: 1
port priority: 1
port number: 8
port state: 63
Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: a0:36:bc:62:be:e0
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: a0:36:bc:62:be:e0
port key: 0
port priority: 255
port number: 2
port state: 71
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
mbrossart@ffw:/tmp/home/root#