Nov 29 / Greg

Mikrotik Layer3 Gateway Load Balancing

This is a topic I’m increasingly receiving questions on, so it seems time for a quick post about it 🙂 *EDIT – turns out it is not such a quick post :P*

I say Layer 3 in the heading because I’m talking about load balancing at the network layer, not at layer 2 (bonding).

The traditional method was to use ECMP (Equal-Cost Multi-Path) routing. With ECMP you add a route and then give that route multiple gateways. This can be done either statically or dynamically via routing protocols.

ECMP would hash on the source/destination IP and send traffic out one gateway or another. With ECMP you can add multiple gateways, and you can also list a single gateway multiple times to give it preference in the selection process. This generally worked really well until about ROS v3.11.
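
As a rough sketch of what that looked like, an ECMP default route is simply one route with several gateways. The gateway addresses below are borrowed from the complete example later in this post and are assumptions about your own network; listing 192.168.3.2 twice is what gives that gateway the extra weight. The syntax follows the RouterOS versions this post was written against.

```routeros
/ip route
# One default route, three gateways. 192.168.3.2 is listed twice so it is
# picked roughly twice as often as the other two (addresses are assumptions).
add dst-address=0.0.0.0/0 check-gateway=ping \
    gateway=192.168.1.2,192.168.2.2,192.168.3.2,192.168.3.2
```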

After ROS v3.10 the Linux kernel was updated. This update caused the routing cache to be flushed every 10 minutes. When this occurred, the hash that had earlier determined the traffic’s gateway might actually produce a different result! This means that after 10 minutes your traffic might just use a different gateway. What are the possible repercussions? If you are doing standard routing and are not doing any NAT on the Mikrotik, then you are fine. The routing should be able to shift with no ill effects. However, if you are natting to multiple DSL or cable modem connections, the results will be dramatic. If you have a TCP session established and all of a sudden the traffic appears to be sourced from a new public IP, the connection will be severed. Now imagine this happening every ten minutes! You can read more about it in this forum post.

After this change happened, you can imagine there were quite a few upset people. Some workarounds arose to bridge the gap until a real solution was created. Mikrotik doesn’t alter the Linux kernel, so adding an option to disable the feature wasn’t going to happen. So, they came up with Per Connection Classifier (PCC).

PCC is a mechanism they built in to do hashing for gateway balancing…or really balancing of any kind. PCC is used in conjunction with mangle rules to hash traffic into different groups, each of which can then be given its own connection mark. Once you have a connection mark on the traffic, you can do just about anything with it, including setting special route marks on it.

PCC Options

per-connection-classifier=
PerConnectionClassifier ::= [!]ValuesToHash:Denominator/Remainder
  Remainder ::= 0..4294967295    (integer number)
  Denominator ::= 1..4294967295    (integer number)
  ValuesToHash ::= both-addresses|both-ports|dst-address-and-port|
  src-address|src-port|both-addresses-and-ports|dst-address|dst-port|src-address-and-port

Step one is to figure out what values we want our hash to use…we have quite a collection of options. The above linked wiki article does a good job of explaining what each option entails. My default option will be both-addresses. I want all traffic moving from one host to another to always follow the same path, regardless of port or protocol.

Then we will choose our denominator. The denominator will generally be the number of gateways you have. So if you have 3 DSL modems, the denominator will be 3. An exception to this may be if you want to weight a specific gateway more heavily than the others. Say for example two of your modems have 4Mb rate limits, while the third has a 10Mb rate limit. You can set the denominator to 4 and mark the 10Mb gateway twice (for two of the four remainders) while the other guys only get marked once.

The remainder is what is left over when the hash is divided by the denominator. It will be a value from 0 through the denominator minus 1. So with a denominator of 3 it will return 0, 1 or 2; with a denominator of 4 it will return 0, 1, 2 or 3.

Let’s put it all together…assuming we have 3 gateways.

/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW1 per-connection-classifier=src-address-and-port:3/0
/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW2 per-connection-classifier=src-address-and-port:3/1
/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW3 per-connection-classifier=src-address-and-port:3/2

Same thing, only we want to weight GW3 higher.

/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW1 per-connection-classifier=src-address-and-port:4/0
/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW2 per-connection-classifier=src-address-and-port:4/1
/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW3 per-connection-classifier=src-address-and-port:4/2
/ip firewall mangle add chain=prerouting action=mark-connection \
  new-connection-mark=CM-GW3 per-connection-classifier=src-address-and-port:4/3

After we mark the connections, we need to add the mangle rules for route marks, and add the routes.

Here’s a complete example.

IP addressing:

/ip address
add address=192.168.1.1/24 broadcast=192.168.1.255 comment="" disabled=no \
    interface=ether1 network=192.168.1.0
add address=192.168.2.1/24 broadcast=192.168.2.255 comment="" disabled=no \
    interface=ether2 network=192.168.2.0
add address=192.168.0.1/24 broadcast=192.168.0.255 comment="" disabled=no \
    interface=ether4 network=192.168.0.0
add address=192.168.3.1/24 broadcast=192.168.3.255 comment="" disabled=no \
    interface=ether3 network=192.168.3.0

In the above you see that we just added the IP addresses to our interfaces…simple enough.

Masquerade rules

/ip firewall nat
add action=masquerade chain=srcnat comment="Masq for GW1" disabled=no \
    out-interface=ether1
add action=masquerade chain=srcnat comment="Masq for GW2" disabled=no \
    out-interface=ether2
add action=masquerade chain=srcnat comment="Masq for GW3" disabled=no \
    out-interface=ether3

Since these are just standard DSL modems, we are doing PAT on these guys, AKA masquerade.

Static routes:

/ip route
add check-gateway=ping comment="" disabled=no distance=1 dst-address=\
    0.0.0.0/0 gateway=192.168.1.2 routing-mark=GW1 scope=30 target-scope=10
add comment="" disabled=no distance=10 dst-address=0.0.0.0/0 gateway=\
    192.168.2.2 routing-mark=GW1 scope=30 target-scope=10
add check-gateway=ping comment="" disabled=no distance=1 dst-address=\
    0.0.0.0/0 gateway=192.168.2.2 routing-mark=GW2 scope=30 target-scope=10
add comment="" disabled=no distance=10 dst-address=0.0.0.0/0 gateway=\
    192.168.3.2 routing-mark=GW2 scope=30 target-scope=10
add check-gateway=ping comment="" disabled=no distance=1 dst-address=\
    0.0.0.0/0 gateway=192.168.3.2 routing-mark=GW3 scope=30 target-scope=10
add comment="" disabled=no distance=10 dst-address=0.0.0.0/0 gateway=\
    192.168.1.2 routing-mark=GW3 scope=30 target-scope=10

Here you see that we added our static routes. Notice that we added a routing-mark to all of them. What this essentially does is create multiple routing tables. We will later tell specific portions of traffic which routing table to use based on this routing-mark. You will also notice that in every routing table, there are two routes. First there is the standard gateway, but there is also a backup route that will pick the next sequential gateway should the main gateway fail. This is generally referred to as a floating static route. It will technically be in the route table, but since I set the distance to 10 on the backup route, it is not used unless the main route is considered invalid. On the main route I’ve got check-gateway enabled to make sure the main gateway is up, and if it fails, the backup route will be used. You can better see this in the below picture:

Mangle that does our user traffic PCC connection mark

/ip firewall mangle
add action=mark-connection chain=prerouting comment="CM for GW1" disabled=no \
    in-interface=ether4 new-connection-mark=GW1 passthrough=yes \
    per-connection-classifier=both-addresses:3/0
add action=mark-connection chain=prerouting comment="CM for GW2" disabled=no \
    in-interface=ether4 new-connection-mark=GW2 passthrough=yes \
    per-connection-classifier=both-addresses:3/1
add action=mark-connection chain=prerouting comment="CM for GW3" disabled=no \
    in-interface=ether4 new-connection-mark=GW3 passthrough=yes \
    per-connection-classifier=both-addresses:3/2

This is the meat of our article. Here you see we are doing our PCC connection marking for the user traffic. This basically runs the PCC algorithm and divides the traffic into three pieces. It will mark the connections GW1, GW2 or GW3. We will later mark routing based on these values.

Mangle that does our output chain PCC connection mark

/ip firewall mangle
add action=mark-connection chain=output comment="CM for GW1 - output" \
    connection-mark=no-mark disabled=no new-connection-mark=GW1 passthrough=\
    yes per-connection-classifier=both-addresses:3/0
add action=mark-connection chain=output comment="CM for GW2 - output" \
    connection-mark=no-mark disabled=no new-connection-mark=GW2 passthrough=\
    yes per-connection-classifier=both-addresses:3/1
add action=mark-connection chain=output comment="CM for GW3 - output" \
    connection-mark=no-mark disabled=no new-connection-mark=GW3 passthrough=\
    yes per-connection-classifier=both-addresses:3/2

We are also running the PCC algorithm for the traffic leaving the router. This isn’t necessary unless you are, say, running hotspot and proxying all traffic through the router. In the hotspot case all traffic will be sourced from the router and thus will use the output/input chains.

Mangle that does our input connection mark

/ip firewall mangle
add action=mark-connection chain=input comment="CM input GW1" \
    connection-mark=no-mark disabled=no in-interface=ether1 \
    new-connection-mark=GW1 passthrough=yes
add action=mark-connection chain=input comment="CM input GW2" \
    connection-mark=no-mark disabled=no in-interface=ether2 \
    new-connection-mark=GW2 passthrough=yes
add action=mark-connection chain=input comment="CM input GW3" \
    connection-mark=no-mark disabled=no in-interface=ether3 \
    new-connection-mark=GW3 passthrough=yes

This basically marks traffic entering a specific GW as that gateway’s. This will ensure that internet sourced traffic will return via the proper interface.

Mangle that does our actual route marking

/ip firewall mangle
add action=mark-routing chain=prerouting comment="RM for GW1" \
    connection-mark=GW1 disabled=no in-interface=ether4 new-routing-mark=GW1 \
    passthrough=yes
add action=mark-routing chain=prerouting comment="RM for GW2" \
    connection-mark=GW2 disabled=no in-interface=ether4 new-routing-mark=GW2 \
    passthrough=yes
add action=mark-routing chain=prerouting comment="RM for GW3" \
    connection-mark=GW3 disabled=no in-interface=ether4 new-routing-mark=GW3 \
    passthrough=yes

Now that we have all of our traffic, user and router sourced, connection marked, it’s time to apply our routing marks. This basically says that if we have a connection mark of GWX then route the traffic via the GWX route table.
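
Note that the route-marking rules above only match packets arriving on ether4, so traffic generated by the router itself (the hotspot/proxy case, and replies to connections marked in the input chain) never gets a routing mark from them. A minimal sketch of the matching output-chain rules, assuming the same GW1 to GW3 connection marks and routing marks used in this example, might look like the following. The comments are my own labels, not part of the original configuration.

```routeros
/ip firewall mangle
# Router-sourced packets: pick the routing table that matches the
# connection mark already set by the output/input chain rules.
add action=mark-routing chain=output comment="RM for GW1 - output" \
    connection-mark=GW1 new-routing-mark=GW1 passthrough=yes
add action=mark-routing chain=output comment="RM for GW2 - output" \
    connection-mark=GW2 new-routing-mark=GW2 passthrough=yes
add action=mark-routing chain=output comment="RM for GW3 - output" \
    connection-mark=GW3 new-routing-mark=GW3 passthrough=yes
```

If nothing is sourced from the router, these can be left out. Bear in mind that router-sourced connections to directly attached subnets are hashed by the output-chain PCC rules too, so in practice you would likely want to exclude those destinations.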

This completes our load-balancing example.
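
A quick way to sanity-check the result is to look at how connections and routes ended up being marked. This is only a sketch, assuming the GW1 to GW3 mark names from the example above; the exact print options available may vary between RouterOS versions.

```routeros
# Count the tracked connections carrying each gateway's connection mark
/ip firewall connection print count-only where connection-mark=GW1
/ip firewall connection print count-only where connection-mark=GW2
/ip firewall connection print count-only where connection-mark=GW3

# List the routes that make up GW1's routing table (main plus floating backup)
/ip route print where routing-mark=GW1
```

With both-addresses hashing the three counts will not be perfectly even, since the split depends on which host pairs happen to be talking.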

QoS with load balancing

I would now like to discuss doing QoS based on these configurations. You guys can stop reading if you don’t care about any of that QoS junk.

If you ask me, one of the truly core components of writing a QoS policy is using connection marking in mangles. This allows you to identify traffic going in one direction, and all subsequent traffic associated with that connection, moving in either direction, is marked. You can then do packet marks based off of these connection marks. I love it because it makes life easier, and from what I understand it really drops the CPU load of marking traffic…a win win 😉 My real issue comes into play when you realize that if you are using PCC, you can no longer use any other connection marks.
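
For anyone who hasn’t seen that pattern, here is a minimal sketch of it on its own, without PCC in the picture. HTTP is just an example traffic class, and the mark names QOS-HTTP and HTTP are hypothetical.

```routeros
/ip firewall mangle
# Classify the connection once, on its first packet
add action=mark-connection chain=forward connection-state=new protocol=tcp \
    dst-port=80 new-connection-mark=QOS-HTTP passthrough=yes
# Every packet of that connection, in both directions, then gets its packet
# mark from the connection mark instead of being re-classified
add action=mark-packet chain=forward connection-mark=QOS-HTTP \
    new-packet-mark=HTTP passthrough=no
```

The packet mark is what the queues match on; the connection mark is only the shortcut for getting there.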

You can only have a single connection mark on traffic. If you connection mark a subset of traffic and then subsequently apply a new connection mark, the original is overwritten. In our PCC example, if I connection mark the traffic GW1 and then route-mark it, everything works fine. You mark PCC in the prerouting chain for obvious reasons…it happens BEFORE the routing decision…hence the pre. If I re-mark the connection in the prerouting chain, then the routing marks are no longer applied and the traffic hits the floor because it isn’t making it into the proper route table any longer. “But why not just use connection mark in the forward chain?” Oh you guys are so clever! However, if you try to re-mark the connection, even in the forward chain, the new mark completely replaces the old one from that point forward. Connection mark mangles only check the first packet in the connection, so subsequent packets are no longer checked!

As for QoS, you will also need a set of queues (in and out) for EACH gateway. Each gateway will have its own rate-limit and it will also have varying levels of congestion, so each requires its own set of queues.

So, let’s add this all up: I can’t use connection marks anymore + I have to create a set of queues for each gateway with all of my precedences in each one + I have mangles that are virtual duplicates of each other just to filter traffic levels for the varying gateways/queues per gateway = Angry Greg.
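
To make the duplication concrete, here is a hedged sketch of the packet-mark-only approach for one gateway’s upload direction. The interface (ether1 for GW1) comes from the example above; the mark names, queue names, priorities and the 1M limits are all made up for illustration.

```routeros
/ip firewall mangle
# Packets leaving via GW1 (ether1): split HTTP from everything else.
# No connection marks are touched, so the PCC marks survive.
add action=mark-packet chain=postrouting out-interface=ether1 protocol=tcp \
    dst-port=80 new-packet-mark=GW1-up-http passthrough=no
add action=mark-packet chain=postrouting out-interface=ether1 \
    new-packet-mark=GW1-up-other passthrough=no

/queue tree
# One parent per gateway, sized to that gateway's upload rate (assumed 1M)
add name=GW1-up parent=ether1 max-limit=1M
add name=GW1-up-http parent=GW1-up packet-mark=GW1-up-http priority=3 \
    max-limit=1M
add name=GW1-up-other parent=GW1-up packet-mark=GW1-up-other priority=8 \
    max-limit=1M
```

GW2 and GW3 need the same block repeated with their own interface and limits, and the download direction needs a mirror-image set, which is exactly the pile of near-duplicates I’m complaining about.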

What’s my ultimate solution to the issue? In my mind, the best option in reference to our configuration example is one of two things.
1. Add an additional router for each GW, with standard QoS applied to each one using my beloved connection marking. Then add a single router that all of these GW routers connect to…let’s call it a backbone (BB) router. This BB router will run the PCC load balancing and not worry about QoS. It’s a win win…other than the fact that you have to buy several routers…hehe.
2. Since we are going to be doing some hairy stuff on the routers with heavy QoS, we need some equipment that has good processing power, or “needs more pow-a”. If there was a real processing giant in the RouterBOARD line we could use MetaROUTER and just run the GW routers virtually and let the physical box be the BB. Alas, there is no RB with high power, so we will use an X86. I’m not a big advocate of the off-the-shelf Mikrotik-branded X86 boxes that are on the market right now, so I would suggest building your own. X86 has “KVM”, which is a virtualization mechanism designed for X86. The only catch is that you have to have virtualization support in the CPU. This has gotten to be far too long of a rant. In essence, we do it with virtual GW routers and use the core as the BB. Again, this is a win win, only in this case we require a single box instead of multiple routers.

I know this is a somewhat complex topic to explain, but I hope you guys made it out alive. Any thoughts or suggestions, please leave me some feedback. Despite my fancified airs I put on, I still really like to hear what you guys think.

14 Comments

  1. Omegatron / Nov 29 2010

    Your timing is impeccable as usual 😉

    Great explanation of how the system works along with a simple example config.

    One thing I’d like to add, I personally prefer to use the “both-addresses” option in PCC rather than “src-address-and-port” given that some sites (and game servers) may expect if you connect to them from one address and port, a connection on another port should come from same address.

    Since I did my own PCC post a couple of months ago I’ve actually been working on a way to combine PCC + a QoS setup, will keep you posted 😉 The simplest way thou is as you stated above; just have multiple routers.. costs a bit more but is much easier for someone unfamiliar with the setup to troubleshoot.

  2. J.J. Boyd / Nov 29 2010

    Great Read Greg!

  3. Greg / Nov 29 2010

    @Andrew

    I use “both-addresses” as my default for the very same reason. 🙂 It really aids in troubleshooting also…always uses the same gateway.

    I know it’s not actually that bad to configure just using packet marks, but I have my own *already completed* QoS policy that uses connection marks…hehehe. More a disappointment that I can’t use the power of the feature. 😛

  4. Greg / Nov 29 2010

    @JJ

    Congrats on making it through this gauntlet…you are now a man.

  5. Omegatron / Nov 29 2010

    @Greg

    I used to use a similar packet + connection marking however I don’t really notice much difference in CPU usage marking one way vs the other. That said, I’m using x86 boxes as the main routers on most sites now.

  6. Greg / Nov 30 2010

    @Andrew,

    +1 to that…I’m doing just about everything with X86 these days. It seems like MTK would take a hint and start selling their own X86…I know I require the power. Not want the power, but require it.

  7. Rob / Nov 30 2010

    hmm….this subject seems awfully familiar! 😀 Good read ol boy and I’m sure I’ll be using some of this in the near future.

  8. Omegatron / Nov 30 2010

    @Greg

    Regarding suitable x86 mikrotik-compatible boxes

    Been investigating these recently:
    http://www.cas-well.com/products/ca_1.html
    Need to find a distributor who’s willing to deal locally though 🙂

    Have also ordered a FitPC2i to test out. It’s a similar size to the RBX50’s but with 2 GigE ports, a 2GHz processor and 2GB of RAM.
    http://www.fit-pc.com/web/fit-pc2/fit-pc2i-specifications/

    At the moment we normally go with 1RU IBM or Dell servers with remote access cards, I’d be interested to know what you use if you’re happy to divulge.

  9. Greg / Dec 1 2010

    @Andrew

    Those fit machines are cute, but somewhat pricey. Let me know how they go…pretty cool looking for sure.

    That Caswell stuff looks like some other stuff I saw a while back…let me see if I can dig it up. Here it is…Acrosser http://www.acrosser.com/products/Rackmount_list_classid_65.html

  10. O! / Dec 1 2010

    Strange 😉
    I’m taking the MTCRE course this week, which covers somewhat similar topics 😉

    Nice article though.

  11. Greg / Dec 1 2010

    @O!

    Thanks 😉

  12. RobertM / Dec 5 2010

    Good read, Greg. Thanks.

  13. m3tr0mini / Dec 15 2010

    @greg
    i’ve set up load balancing across 3 connections and i’ve implemented QoS too
    i’m not using mark-connection; i’m using mark-packet directly.

    add action=mark-packet chain=prerouting comment="CM for GW1" disabled=no \
    in-interface=ether4 new-packet-mark=GW1 passthrough=yes

    i’m using RB750G

  14. Greg / Dec 16 2010

    @m3tromini

    Indeed sir. This was more my rant about not being able to use connection marks because I like them so much 😛
