Jul 27 / Greg

Automated Network Troubleshooting With Ansible Tower And Zabbix


In the role of a network engineer, one of the most common tickets/requests we receive is “I can’t reach this thing.” Well, that or “the internet is down.” It’s at this point that I ask for the destination IP, then I start the standard pings/traceroutes…but what if I could do all of this before the user ever contacts me? That’s the plan: a device goes down and the monitoring system initiates troubleshooting through Ansible. I actually went further and have it put my support line on DND to supply a blanket message to customers, and create a ServiceNow incident!

Demo Video Of System In Action

I’ve recently been experimenting with Zabbix, which is an open source monitoring tool (I think it’s made over in Latvia too 🙂). I’ve been looking at it more since there is a plethora of Ansible modules that can make administration of the system much easier. It’s also possible to use the Zabbix server as your Ansible dynamic inventory (see the .py script and ini file here), which I need to test a little more to get just the way I want.

Configure Zabbix

First things first; configure a Host in Zabbix to be monitored however you like. I did just a simple ICMP ping test to 8.8.8.8 following these instructions. While that tutorial gets you most of the way there, here’s a reminder for myself on some specific configs:

The Key should be exactly “icmpping” without all of the extra options. Then be sure to set up your update interval, change the show value to service state, and choose the new application you created.

I then create a trigger on this same host using the ping item just created. This trigger is activated when the ping fails to 8.8.8.8:

I’ll now create an action that will be fired once the trigger is activated:

The trigger is set as the ping down trigger I specified above:

Last, the operation is set to run a remote command via the Zabbix server (*don’t forget to change “Execute on” to “Zabbix server”*):

The script being run is this:

curl -X POST --user zabbix-tower:zabbix -H 'Content-Type: application/json' -d '{"extra_vars": "{\"destIP\": \"{HOST.IP}\"}"}' https://192.168.51.6/api/v2/job_templates/27/launch/ -k -s

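The curl command authenticates as user zabbix-tower with password zabbix. It launches job template 27 and passes, via extra_vars, the variable destIP set to the Zabbix-monitored host’s IP (in this case 8.8.8.8). The -k flag skips certificate verification and -s silences the progress output.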
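For anyone who would rather make this call from a script, here is a minimal Python sketch of the same request. It only builds the request and prints it; nothing is sent. The function name build_launch_request is my own invention for illustration, and the URL, template ID and credentials are simply the ones from the curl command above.

```python
import base64
import json
import urllib.request


def build_launch_request(tower_url, template_id, dest_ip, user, password):
    """Build (but do not send) the POST that launches a Tower job template."""
    # Mirror the curl call: extra_vars is itself a JSON-encoded string.
    payload = {"extra_vars": json.dumps({"destIP": dest_ip})}
    # HTTP basic auth, the equivalent of curl's --user flag.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{tower_url}/api/v2/job_templates/{template_id}/launch/",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_launch_request(
    "https://192.168.51.6", 27, "8.8.8.8", "zabbix-tower", "zabbix"
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

To actually send it you would hand the request to urllib.request.urlopen, with an SSL context that skips verification if you want the same behaviour as curl's -k.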

The Tower template is this one on my GitHub. The playbook connects to a couple of my Mikrotik probes and runs a ping and traceroute to the supplied destIP from each probe. After it gathers that information, it emails or slacks the results to you.

Configure Ansible Tower

One of the brilliant things about Tower is its ability to create custom credentials and pass that info over to playbooks. Here I’ll show how to create custom creds for Gmail used in the playbook (I also have custom creds created for Slack, but I’ll let you try that one on your own based on my example).
First, under admin, choose credential type, then hit the green plus to add a new one:

Next fill out the credential type to meet your needs:

The “input configuration” portion is what you collect from the user. Here’s my configuration in YAML:

fields:
  - id: supp_email_user
    type: string
    label: Email Username
  - id: supp_email_pword
    type: string
    label: Email Password
    secret: true
required:
  - supp_email_user
  - supp_email_pword

In the above, notice the following fields:
– id: The name the value is stored under during collection.
– type: The input type. For usernames/passwords/tokens, string is perfect.
– label: The prompt the user sees when entering information.
– secret: When set to true, this tells Tower to encrypt the value and also hide it in log output. Good for passwords or tokens.
The “required” section indicates these aren’t optional fields.
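One easy mistake with an input configuration is listing an id under “required” that was never declared under “fields”. A small sanity check, sketched in Python with the configuration above rewritten as a dict; the helper name undeclared_required is hypothetical and this is not something Tower ships:

```python
def undeclared_required(input_config):
    """Return ids listed under 'required' that have no entry under 'fields'."""
    declared = {field["id"] for field in input_config.get("fields", [])}
    return [
        name for name in input_config.get("required", []) if name not in declared
    ]


# The input configuration from this post, as a Python dict.
input_config = {
    "fields": [
        {"id": "supp_email_user", "type": "string", "label": "Email Username"},
        {
            "id": "supp_email_pword",
            "type": "string",
            "label": "Email Password",
            "secret": True,
        },
    ],
    "required": ["supp_email_user", "supp_email_pword"],
}

# An empty list means every required id is declared.
print(undeclared_required(input_config))
```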

The “injector configuration” section is how you send this information over to your playbooks:

extra_vars:
  email_pword: '{{ supp_email_pword }}'
  email_user: '{{ supp_email_user }}'

In my case I’ll be sending them as extra_vars. In my playbook the variables “email_pword” and “email_user” are used to set email creds, which is why that’s what is being set via the supplied information.
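To make the mapping concrete, here is a rough Python sketch of what the injector does with the collected inputs. Tower renders these templates with Jinja; the regex below is only a stand-in for that so the example stays self-contained, and the input values are made up.

```python
import re


def render_injector(extra_vars, inputs):
    """Fill '{{ name }}' placeholders with the values the user entered."""

    def fill(template):
        # Simplified stand-in for Jinja: only handles plain '{{ name }}'.
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}", lambda m: inputs[m.group(1)], template
        )

    return {name: fill(template) for name, template in extra_vars.items()}


# The injector configuration from this post.
injector = {
    "email_pword": "{{ supp_email_pword }}",
    "email_user": "{{ supp_email_user }}",
}
# Hypothetical values typed into the credential form.
inputs = {"supp_email_user": "me@example.com", "supp_email_pword": "s3cret"}

print(render_injector(injector, inputs))
```

The keys on the left (email_user, email_pword) are what the playbook sees; the ids on the right only exist inside the credential.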

Now that I’ve got the custom credential type created, I’ll add the custom cred:

I’m next going to tell Tower to use my git repo:

I’ll then create a job template for my troubleshooting probe playbook:

Notice that in extra_vars I supplied it with method of slack. This tells it to use slack rather than email. Also keep in mind that I told extra_vars to “prompt on launch”. This is required for the API call to pass in additional extra_vars.
Also note that I added my credentials for both my slack token as well as my email addresses. These are securely passed to the playbook at runtime!

Extras

In the demo video I put one of my Yealink phones into DND to enable a call handler. It’s all based on this blog post where I demo using Yealink’s API. I use this playbook on my GitHub to do the work.

Here’s the sequence up to this point:
1. Zabbix is set up to ping 8.8.8.8.
2. When the ping item returns 0(failed state), the trigger is activated.
3. An action is fired when the trigger is activated.
4. The action calls the Ansible Tower API to launch a job, passing the IP being monitored.
5. Tower takes the IP supplied and pings/traceroutes to that destination from multiple probes.
6. Tower will then take this aggregated information and send it to you either via slack or email.
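Steps 5 and 6 boil down to collecting text from each probe and gluing it into one message. The real work happens in the playbook; the Python below is only an illustration of that aggregation, with hypothetical probe names and placeholder strings where the real ping/traceroute output would go.

```python
def build_report(dest_ip, probe_results):
    """Combine per-probe ping/traceroute text into one message body."""
    lines = [f"Troubleshooting results for {dest_ip}"]
    for probe, result in probe_results.items():
        lines.append(f"--- {probe} ---")
        lines.append(f"ping: {result['ping']}")
        lines.append(f"traceroute: {result['traceroute']}")
    return "\n".join(lines)


# Placeholder data; real output would come from the probes.
results = {
    "probe-1": {"ping": "<ping output>", "traceroute": "<traceroute output>"},
    "probe-2": {"ping": "<ping output>", "traceroute": "<traceroute output>"},
}

print(build_report("8.8.8.8", results))
```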

Slack Output

Once the tshooting completes, this is the message sent to Slack:

I’ve now updated the playbook so that it creates an incident in ServiceNow and adds the tshooting as comments:

The idea behind all this is that your monitoring system isn’t just telling you there’s an issue, it’s actively performing troubleshooting for you. It can do more than just that; imagine it also takes this information and opens a ticket. Imagine that it also notifies the customer that an issue has been identified and that you are actively working on it.

By the time the issue notification arrives, there will also be some advanced troubleshooting already performed for you and waiting.

Let me know how you envision using something like this!

Jul 26 / thebrotherswisp

Greg Talks 15 – Technical Sales, Vulnerability In Selling, Selling Techniques

Greg talks to Tony Owens a lifelong technical sales leader.

This week we talk about:
Technical sales
Consultative selling
Learning to be a manager
Learning to be a leader
Age = respect
Pattern interrupt
Showing vulnerability in your job
People will rescue you, they want to save you
Upfront contracts
3 seconds of guts
Sandler training – seemed like you really liked it.
Sports and your relationship with them.
Sports is the only real thing on TV
Why do you drink so much water?

Join the patron only slack at http://patreon.com/thebrotherswisp

Here’s the video:(if you don’t see it, hit refresh)

Jul 19 / thebrotherswisp

The Brothers WISP 115 – WISP After Death, Unifi Video EOL, RPKI Basics



This week we have Greg, Mike, Dave, and Wilson…all the familiar faces coming back!
**Sponsors**
Sonar.software
Cambium ePMP Bundle
**/Sponsors**

This week we talk about:
WISP Virtual Summit July 28th
Save Dave’s brain
RIP Ubiquiti Unifi Video – EOL 1/1/2021
zwift.com
Cloudflare DNS outage
David – Arduino, PHP programming, cycling and weight loss, new kid,
Wilson’s RPKI
Mike got some new hardware; a stent!
I’m done with my sales training – I’m a real boy.
WLED esp8266 library

Here’s the video:(if you don’t see it, hit refresh)

Jul 11 / Greg

Ansible Terminal Expansion With Mikrotik

Mikrotik routers are, I’m finding, well suited to be used with Ansible as infrastructure as code.

I was recently working on a project where I was pulling “/ip firewall nat print without-paging terse”, but the returned output kept adding in \n (newlines) at the 81st position…*sigh*.

"stdout": [
            "0    comment=ReverseNAT chain=srcnat action=src-nat to-addresses=2.1.\n25.64 src-address=1.1.1.1 \n 1    comment=Mail_Reverse_NAT chain=srcnat action=src-nat to-addresses=1.1.1\n25.64 src-address=1.1.2.25 \n 2 X  comment=VPN_Traffic chain=srcnat action=masquerade src-address=1.1.9.0/24 \ndst-address=1.1.2.0/24 \n 3    comment=VPN_Traffic chain=srcnat action=masquerade src-address=10.1.9.0/24 \n\n 4

It turns out that when connected via ssh, Mikrotik assumes a smaller window size on the terminal. The trick here is to edit the username used to connect with a special set of instructions:

ansible_user=Tacos+cet512w

+cet512w tells the router that the terminal width is 512 columns and enables “dumb” terminal mode. After this, all is right with the world 🙂
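Here is a small Python sketch of both halves of the problem: building the username with the login options appended, and a simulation of what an 80-column terminal does to a long terse line. The helper names are made up, and the sample line is just shaped like the output above rather than copied from a router.

```python
def routeros_login(user, width=512):
    """Append the console login options from this post to an SSH username."""
    # "cet" are the flags used above; "<width>w" sets the terminal columns.
    return f"{user}+cet{width}w"


def simulate_wrap(line, width=80):
    """Mimic a terminal that breaks output every `width` characters."""
    return "\n".join(line[i:i + width] for i in range(0, len(line), width))


# A sample line shaped like the terse NAT output.
entry = (
    "0    comment=ReverseNAT chain=srcnat action=src-nat "
    "to-addresses=2.1.25.64 src-address=1.1.1.1"
)

print(routeros_login("Tacos"))
print(simulate_wrap(entry))        # broken mid-field at 80 columns
print(simulate_wrap(entry, 512))   # wide enough, so it stays on one line
```

Once a key=value pair is split across two lines, anything parsing the output field by field gets garbage, which is why widening the terminal is cleaner than trying to stitch the lines back together.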

It took me about 2 hours to suss this out, then when I presented it to Jimmy he said “Oh yeah, that’s why I’ve got ‘+cet512w’ in the user name,” so really he gave me the fix. Another lesson hard earned hehehe.

If you are using Ansible with Mikrotik and the routeros module output has stray line breaks in it, give this a go. Oh, it also helps to have an Ansible Ninja on your team when you need a little help 😉

Jul 5 / thebrotherswisp

The Brothers WISP 114 – T-Mobile Outage, Hypervisor Routing, Edge/Core Flexibility



This week Greg and Mike have a 4th of July blowout LOL
**Sponsors**
Sonar.software
Cambium ePMP Bundle
**/Sponsors**

This week we talk about:
FS Box
T-mobile voice outage.
Pedro figured out to have STP BPDUs filtered he had to STP on the switch LOL
Greg wrote an Ansible playbook to backup a router based Dude install.
Greg’s ansible role to backup network devices to git.
Zach made some playbooks that pull backup files to a folder and do diffs.
Hypervisor Comparison
More Hypervisor Stuff
FB LINX
FB Datacenter
Mikrotik RPKI
Pedro found out you shouldn’t delete link local addresses on your BGP peers
What do you do with two links when one doesn’t have capacity to carry load during failure?
WISP Virtual Summit July 28th
How to design the edge/core for maximum flexibility with Mikrotik…should I do X, Y, or Z.

Here’s the video:(if you don’t see it, hit refresh)

Jun 22 / thebrotherswisp

Greg Talks 14 – Nick Arellano – CI/CD, Software Dev, GIT

Greg talks to Nick Arellano, a consultant and software developer.

This week we talk about:
How a consultant looks at your network
CI/CD
Software dev
Some anxiety I think partially feeling trapped; little access to other humans.
ostriches are devil spawn
GIT

Join the patron only slack at http://patreon.com/thebrotherswisp

Here’s the video:(if you don’t see it, hit refresh)

Jun 18 / Greg

Install An SSL Cert For Ansible Tower Using LetsEncrypt

This can be done in the span of about 5 minutes(it’s almost tooooo easy).

First, ensure that you have public access to TCP ports 80/443 to your tower server(it’s likely you’ve already done that, though).

Tower auto installs and uses nginx as its webserver. Step one is to tell nginx what your FQDN is for this server(make sure you’ve already created a valid/working DNS entry for this):
Edit the nginx config file at: /etc/nginx/nginx.conf
This is the section of the config prior to manipulation:

# If you have a domain name, this is where to add it
server_name _;
keepalive_timeout 65;
 
ssl_certificate /etc/tower/tower.cert;
ssl_certificate_key /etc/tower/tower.key;

This is my config with the server name configured:

# If you have a domain name, this is where to add it
server_name towerofpower.gregsowell.com;
keepalive_timeout 65;
 
ssl_certificate /etc/tower/tower.cert;
ssl_certificate_key /etc/tower/tower.key;
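If you would rather script this edit than open the file by hand, a regex substitution is enough. A minimal Python sketch, assuming the config still contains the stock “server_name _;” line; the function name is hypothetical and the example works on a string rather than touching /etc/nginx/nginx.conf.

```python
import re


def set_server_name(conf_text, fqdn):
    """Replace the placeholder 'server_name _;' directive with a real FQDN."""
    return re.sub(
        r"^([ \t]*)server_name[ \t]+_;",
        lambda m: f"{m.group(1)}server_name {fqdn};",  # keep the indentation
        conf_text,
        flags=re.MULTILINE,
    )


# The relevant lines from the stock config.
stock = (
    "# If you have a domain name, this is where to add it\n"
    "server_name _;\n"
    "keepalive_timeout 65;\n"
)

print(set_server_name(stock, "towerofpower.gregsowell.com"))
```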

Now reload nginx so it picks up the change:

systemctl reload nginx.service

Now download the LetsEncrypt certbot auto installer and set it to executable:

wget -P /usr/local/bin https://dl.eff.org/certbot-auto
chmod +x /usr/local/bin/certbot-auto

Now run the certbot installer:

certbot-auto
 
Bootstrapping dependencies for RedHat-based OSes that will use Python3... (you can skip this with --no-bootstrap)
dnf is /usr/bin/dnf
dnf is hashed (/usr/bin/dnf)
Last metadata expiration check: 2:31:18 ago on Thu 18 Jun 2020 08:35:47 AM CDT.
Package openssl-1:1.1.1c-15.el8.x86_64 is already installed.
Package ca-certificates-2019.2.32-80.0.el8_1.noarch is already installed.
Package python36-3.6.8-2.module_el8.1.0+245+c39af44f.x86_64 is already installed.
Dependencies resolved.
=================================================================================================================================================
 Package                              Architecture         Version                                                 Repository               Size
=================================================================================================================================================
Installing:
 augeas-libs                          x86_64               1.12.0-5.el8                                            BaseOS                  436 k
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Transaction Summary
=================================================================================================================================================
Install  44 Packages
 
Total download size: 52 M
Installed size: 135 M
Is this ok [y/N]: y
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Which names would you like to activate HTTPS for?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: towerofpower.gregsowell.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate numbers separated by commas and/or spaces, or leave input
blank to select all options shown (Enter 'c' to cancel): 1
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for towerofpower.gregsowell.com
Waiting for verification...
Cleaning up challenges
Deploying Certificate to VirtualHost /etc/nginx/nginx.conf
Redirecting all traffic on port 80 to ssl in /etc/nginx/nginx.conf

So when you run the installer you are prompted to pull down required packages, to which I said yes. It will then find your nginx config and locate the server name that was specified. After that I chose option 1 and let it rip.
It then creates the certs and modifies the nginx config with the new certs.

Here’s the nginx config after the above command:

        # If you have a domain name, this is where to add it
        server_name towerofpower.gregsowell.com;
        keepalive_timeout 65;
    ssl_certificate /etc/letsencrypt/live/towerofpower.gregsowell.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/towerofpower.gregsowell.com/privkey.pem; # managed by Certbot
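A quick way to confirm certbot really swapped the certs is to pull the two ssl_certificate directives back out of the config and look at where they point. A Python sketch under the assumption that each directive sits on its own line as shown above; certificate_paths is a made-up helper name.

```python
import re


def certificate_paths(conf_text):
    """Return {directive: path} for ssl_certificate and ssl_certificate_key."""
    pattern = r"^[ \t]*(ssl_certificate(?:_key)?)[ \t]+([^;]+);"
    return dict(re.findall(pattern, conf_text, flags=re.MULTILINE))


# The two certificate lines from the config after certbot ran.
after_certbot = (
    "    ssl_certificate /etc/letsencrypt/live/towerofpower.gregsowell.com/fullchain.pem; # managed by Certbot\n"
    "    ssl_certificate_key /etc/letsencrypt/live/towerofpower.gregsowell.com/privkey.pem; # managed by Certbot\n"
)

paths = certificate_paths(after_certbot)
print(paths)
# True when both directives point into the LetsEncrypt directory.
print(all(p.startswith("/etc/letsencrypt/live/") for p in paths.values()))
```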

Now reload nginx once more:

systemctl reload nginx.service

After that you should be able to browse to your tower install with a valid cert!

Good luck and happy automating.