May 10 / Greg

Ansible Config As Code With Network Backups To Git And Point-In-Time Rollback


The ask for me to create this was “We want to see network configs backed up to git.” That’s easy, because I’ve done that in the past…but it was done before Execution Environments (EEs), so I needed to figure that part out. “Oh yeah, and if the configuration changes go sideways we want to be able to roll back easily from the backup git repo.” That required a little research and noodling. Fortunately I posed the question to the team of folks I work with, and Adam Mack said maybe I could do something with git tagging, which turned out to be exactly what made it all work. It’s good to know people 🙂

This demo also does config as code. This means that the configuration for my network kit is stored in my git repository in standard CLI format. When I invoke Ansible to configure the equipment it will first do a backup, then it will clone the configuration repo and update the devices. If, for any reason, I want to roll back the configuration on all or any of the devices in that change window, all I have to do is specify a backup tag, and the automation will grab the correct snapshot from git and perform the restore.

Video Demo

Playbooks – Backup To Git

I’ll start with playbooks and then move to how I configured my Ansible Automation Platform (AAP) Controller server. The full flow will make sense when I put it all together.
The playbooks are broken into two repos depending on function. I’ll cover just the backup repo here, but I’ll supply the CAC repo anyway:
Backup to git playbooks are here
Configuration As Code (CAC) playbooks are here

I’ll start with the backup playbook (network_backup_git_playbook.yml):

---
- name: network device backup to git
  hosts: nexus9k3
  gather_facts: false
  vars:
    backup_with_tags: false
    backup_dir: "{{ playbook_dir }}/net_backups"
    backup_file: "{{ backup_dir }}/{{ inventory_hostname }}"
    backup_repo: [email protected]:gregsowell/backups
    git_name: Git Backup
    git_email: [email protected]
 
  tasks:
  - import_role:
      name: network_backup_git

This is a very simple playbook. Really I just have a handful of variables set up.
backup_with_tags: Whether to create a tag in the repo (a point-in-time snapshot)
backup_dir: Where to stick the backup files
backup_file: What to name the backup files I get from the network devices
backup_repo: Name of my backup repo
git_name: Commit name
git_email: Commit email

Next I’m calling my role “network_backup_git” that can be found in the same directory.
Here’s a shot of my role task folder:

A role always starts with the main.yml file; in mine I do some setup work, then call the other task files to pull the backups from differing vendors’ kit.
Here’s my main.yml:

---
# this is used for tagging a repo
  - name: get timestamp
    set_fact: time="{{lookup('pipe','date \"+%Y-%m-%d-%H-%M\"')}}"
    run_once: true
 
# this and the following task add a private ssh key to the execution environment so it can connect to git
  - name: create .ssh folder
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.file:
      path: /root/.ssh
      state: directory
      mode: '0777'
 
  - name: create the ssh key file based on the supplied cred
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.copy:
      dest: ~/.ssh/id_rsa
      content: "{{ cert_key }}"
      mode: '0600'
#    no_log: true
 
  - name: create the backup dir
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.file:
      path: "{{ backup_dir }}"
      state: directory
      mode: '0777'
 
  - name: clone the repo
    ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git clone {{ backup_repo }} ."
    args:
      chdir: "{{ backup_dir }}"
 
  - name: include cisco task when Cisco device
    include_tasks: "{{ role_path }}/tasks/network_backup_cisco.yml"
    when: ansible_network_os == "asa" or ansible_network_os == "ios" or ansible_network_os == "nxos"
 
  - name: include arista task when Arista device
    include_tasks: "{{ role_path }}/tasks/network_backup_arista.yml"
    when: ansible_network_os == "eos"
 
  - name: include junos task when Junos device
    include_tasks: "{{ role_path }}/tasks/network_backup_juniper.yml"
    when: ansible_network_os == "junos"
 
  - name: include routeros task when Mikrotik device
    include_tasks: "{{ role_path }}/tasks/network_backup_mikrotik.yml"
    when: ansible_network_os == "routeros"
 
  - name: Copy the backup to repo
    ansible.builtin.copy:
      src: "{{ bup_temp_file }}"
      dest: "{{ backup_file }}"
    register: copy_result
    delegate_to: localhost
 
  - name: Delete the temp file
    ansible.builtin.file:
      path: "{{ bup_temp_file }}"
      state: absent
    changed_when: False
    delegate_to: localhost
 
  - name: push the repo back with tags
    when: backup_with_tags
    ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git tag -a {{ time }} -m '{{ time }}'; git push; git push --tags"
    args:
      chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
    environment:
      GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
      GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
      GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
      GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
    delegate_to: localhost
    changed_when: git_return.stderr != "Everything up-to-date"
    run_once: true
    connection: local
    register: git_return
    become: True
 
  - name: push the repo back with no tags
    when: not backup_with_tags
    ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git push"
    args:
      chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
    environment:
      GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
      GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
      GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
      GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
    delegate_to: localhost
    changed_when: git_return.stderr != "Everything up-to-date"
    run_once: true
    connection: local
    register: git_return
    become: True

There’s a lot going on here, so let me break it down into chunks.

The first section will be setting everything up:

---
# this is used for tagging a repo
  - name: get timestamp
    set_fact: time="{{lookup('pipe','date \"+%Y-%m-%d-%H-%M\"')}}"
    run_once: true
 
# this and the following task add a private ssh key to the execution environment so it can connect to git
  - name: create .ssh folder
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.file:
      path: /root/.ssh
      state: directory
      mode: '0777'
 
  - name: create the ssh key file based on the supplied cred
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.copy:
      dest: ~/.ssh/id_rsa
      content: "{{ cert_key }}"
      mode: '0600'
#    no_log: true
 
  - name: create the backup dir
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.file:
      path: "{{ backup_dir }}"
      state: directory
      mode: '0777'
 
  - name: clone the repo
    ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git clone {{ backup_repo }} ."
    args:
      chdir: "{{ backup_dir }}"

First I create a timestamp. I’ll be using this when making the git commits (I use it as the commit message), but I’ll also use it if I’m making a git commit with a tag.
The next two tasks create a /root/.ssh folder and install the SSH private key inside the EE (the container that the automation runs in). This is what allows git to authenticate to read/write my repository. The private key is actually stored in a custom credential inside AAP, so that I can store it securely, then inject it into the EE when needed.
Last I create the directory I’ll clone my backups into (and ultimately write the backup files to), and then I git clone the backup repo.

Next I call the task file that is associated with my specific piece of kit. For example if it is a Cisco IOS device it will call the IOS task file. If it is Juniper, it will call the Junos task file.

  - name: include cisco task when Cisco device
    include_tasks: "{{ role_path }}/tasks/network_backup_cisco.yml"
    when: ansible_network_os == "asa" or ansible_network_os == "ios" or ansible_network_os == "nxos"
 
  - name: include arista task when Arista device
    include_tasks: "{{ role_path }}/tasks/network_backup_arista.yml"
    when: ansible_network_os == "eos"
 
  - name: include junos task when Junos device
    include_tasks: "{{ role_path }}/tasks/network_backup_juniper.yml"
    when: ansible_network_os == "junos"
 
  - name: include routeros task when Mikrotik device
    include_tasks: "{{ role_path }}/tasks/network_backup_mikrotik.yml"
    when: ansible_network_os == "routeros"

The last part of the playbook is where it gets interesting:

  - name: Copy the backup to repo
    ansible.builtin.copy:
      src: "{{ bup_temp_file }}"
      dest: "{{ backup_file }}"
    register: copy_result
    delegate_to: localhost
 
  - name: Delete the temp file
    ansible.builtin.file:
      path: "{{ bup_temp_file }}"
      state: absent
    changed_when: False
    delegate_to: localhost
 
  - name: push the repo back with tags
    when: backup_with_tags
    ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git tag -a {{ time }} -m '{{ time }}'; git push; git push --tags"
    args:
      chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
    environment:
      GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
      GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
      GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
      GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
    delegate_to: localhost
    changed_when: git_return.stderr != "Everything up-to-date"
    run_once: true
    connection: local
    register: git_return
    become: True
 
  - name: push the repo back with no tags
    when: not backup_with_tags
    ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git push"
    args:
      chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
    environment:
      GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
      GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
      GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
      GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
    delegate_to: localhost
    changed_when: git_return.stderr != "Everything up-to-date"
    run_once: true
    connection: local
    register: git_return
    become: True

The first two tasks simply copy the backup files into the newly cloned repo folder, then delete the temporary working file.
The last two tasks are where it does the merge back to the repo.
The first task checks whether the “backup_with_tags” boolean is set to true (which means I want to create a point-in-time backup). If it is, you can see the long shell command it sends. I git add everything in the repo and commit it. I then add a git tag, using the timestamp from before as the tag name. I did some testing and this can be numbers or letters, so the tag could describe what change is about to happen or what project this is associated with…it doesn’t simply have to be numbers. I then do a git push, which puts any updated files into the repo as normal, and last do a git push with --tags, which creates that point-in-time tag on the repo. Keep in mind that if I ONLY did a tags push then the standard repo wouldn’t see the file changes, only the tag itself, which is why I do a standard push and a tags push.
The last task does a standard git push…nothing abnormal there.
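The need for both pushes is easy to verify with a throwaway pair of local repos. This is just a sketch of the same git mechanics the role runs; the paths, file contents, and repo names below are invented for illustration:

```shell
#!/bin/sh
# Minimal sketch of the role's tag-then-push flow, using local throwaway
# repos instead of a real GitHub remote (all names here are illustrative).
set -e
remote=$(mktemp -d)/backups.git
work=$(mktemp -d)
git init -q --bare "$remote"
git clone -q "$remote" "$work"
cd "$work"
git config user.name  "Git Backup"
git config user.email "[email protected]"
tag=$(date "+%Y-%m-%d-%H-%M")         # same timestamp format the role builds
echo "hostname nexus9k3" > nexus9k3   # stand-in for a device backup file
git add .
git commit -qm "$tag"
git tag -a "$tag" -m "$tag"
git push -q origin HEAD               # pushes the commit only...
git push -q origin --tags             # ...the tag needs its own push
git ls-remote --tags "$remote"        # the point-in-time tag is now on the remote
```

A single `git push --follow-tags` would also cover both in one command, but pushing twice keeps the tagged and untagged code paths in the role symmetric.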

Playbooks – Configuration As Code

Configuration As Code (CAC) playbooks are here

I won’t cover every playbook in this repo, rather I’ll just cover one or two and show how the role is utilized.
I’ll start by showing the role and what it does because the very first task called in my playbooks invokes the role:

clone-repo/tasks/main.yml

---
# this and the following task add a private ssh key to the execution environment so it can connect to git
  - name: create .ssh folder
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.file:
      path: /root/.ssh
      state: directory
      mode: '0777'
 
  - name: create the ssh key file based on the supplied cred
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.copy:
      dest: ~/.ssh/id_rsa
      content: "{{ cert_key }}"
      mode: '0600'
#    no_log: true
 
  - name: clone the repo with tags
    when: repo_tag is defined and repo_tag != ""
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; mkdir /tmp/working; git clone --depth 1 --branch {{ repo_tag }} {{ backup_repo }} /tmp/working; cp /tmp/working/* {{ playbook_dir }}"
    args:
      chdir: "{{ playbook_dir }}"
 
  - name: clone the repo without tags
    when: repo_tag | default("") == "" and config_repo is defined
    become: true
    run_once: true
    delegate_to: localhost
    ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; mkdir /tmp/working; git clone {{ config_repo }} /tmp/working; cp /tmp/working/* {{ playbook_dir }}"
    args:
      chdir: "{{ playbook_dir }}"

Exactly like the backup scripts, the first two tasks are just setting up the SSH keys in the EE for authentication.
The last two tasks are where I do the work.
The first checks if a “repo_tag” is supplied, and if it is, it knows it has to clone the tagged entry from said repo. The git clone command responsible is “git clone --depth 1 --branch {{ repo_tag }} {{ backup_repo }} /tmp/working”. This sticks the files in the working folder, and then they are copied to the playbook directory for later processing.
If there is no tag it will clone the config repo and then stick that into the playbook directory.

Notice I have it built so that if there is a tag it knows “I’m doing a rollback, so pull from the backup repo.”
If there is no tag it knows “Oh, no tag, so I’m pulling from the config repo to perform this configuration change.”
I have a dedicated repo just for backups, and the user specifies the repo they want to use if it is a regular change procedure.
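That “--depth 1 --branch” trick relies on git’s --branch option happily accepting a tag name, which is easy to sanity-check locally. Everything in this sketch (the file contents and the snap-1 tag name) is invented for illustration:

```shell
#!/bin/sh
# Sketch: restoring a point-in-time snapshot by cloning a tag.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name  "Git Backup"
git config user.email "[email protected]"
echo "ntp server 10.0.0.1" > nexus9k3
git add . && git commit -qm "steady state"
git tag -a snap-1 -m "snap-1"          # point-in-time tag before the change
echo "ntp server 10.9.9.9" > nexus9k3
git commit -qam "bad change"
# Rollback-style clone: --branch accepts a tag, --depth 1 keeps it shallow
# (the file:// URL is required for --depth to work on a local path)
restore=$(mktemp -d)/working
git clone -q --depth 1 --branch snap-1 "file://$repo" "$restore"
cat "$restore/nexus9k3"                # shows the pre-change config
```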

Now I’ll take a look at the NTP configuration playbook:

cac-nexus-ntp.yml

---
- name: update NTP servers on nexus kit
  hosts: nexus9k3
  gather_facts: false
  vars:
    save_now: false #reset this here or via extravars if this should be saved
  tasks:
  - name: clone repo with config files
    ansible.builtin.include_role: 
      name: clone-repo
 
  - name: create the parse file variable manually
    ansible.builtin.set_fact:
      parse_file: "{{ inventory_hostname }}"
 
  ### BLOCK START ###
  - name: block if a parse file is specified
    when: parse_file is defined
    block: 
    - name: Parse config
      cisco.nxos.nxos_ntp_global:
        running_config: "{{ lookup('file', parse_file) }}"
        state: parsed
      register: parsed
 
    - name: Replace NTP config with the provided.  Not merge, but replace
      cisco.nxos.nxos_ntp_global:
        config: "{{ parsed.parsed }}"
        state: replaced
      register: result
 
  ### BLOCK STOP ###
 
  ### BLOCK START ###
  - name: block if a parse file is not present
    when: parse_file is not defined
    block: 
    - name: Replace NTP config with the provided.  Not merge, but replace
      cisco.nxos.nxos_ntp_global:
        config:
          logging: True
          servers: "{{ ntp_servers }}"
        state: replaced
      register: result
 
  ### BLOCK STOP ###
 
  - name: save when required
    when: save_now == true and result.changed == true
    cisco.nxos.nxos_config:
      save_when: always

This playbook takes the standard “show run” style configuration file and parses it with the NTP resource module. I LOVE how resource modules are increasingly supporting parsed states that will take a standard CLI config, grab just the parts they need, and put them into a usable data model. In my case it is grabbing the config file that it just cloned from the repo (either a backup or a config change).
After it parses the config, it hands that data back to the same NTP module to apply the changes to the device.
The very last task will save the config change on the device if the save_now flag is set to true.

I’ll give one more example playbook; this time I’ll take a look at the DNS configuration playbook:

cac-nexus-dns.yml

---
- name: update dns servers on nexus kit
  hosts: nexus9k3
  gather_facts: false
  vars:
    save_now: false #reset this here or via extravars if this should be saved
  tasks:
  - name: clone repo with config files
    ansible.builtin.include_role: 
      name: clone-repo
 
  - name: parse config file for name servers
    ansible.builtin.set_fact:
      config_line: "{{ lookup('ansible.builtin.file', inventory_hostname) | regex_search('^.*ip name-server.*$', multiline=True) }}"
 
  - name: parse config file for name servers
    ansible.builtin.set_fact:
      dns_servers: "{{ lookup('ansible.builtin.template', 'cac-extract-ip.j2') | trim}}"
 
  - name: configure name servers
    cisco.nxos.nxos_system:
      name_servers: "{{ dns_servers }}"
    register: result
 
  - name: save when required
    when: save_now == true and result.changed == true
    cisco.nxos.nxos_config:
      save_when: always

This module (nxos_system) doesn’t have a parse option, so to make it all work I have to do some kung-fu.
First I create a variable named config_line by doing a regex search through the file to find the name-server entry…nothing too bad there.
Now I need to parse out the IP addresses, so I kindly asked ChatGPT to write me a jinja2 template that would do it for me:

{% set ip_list = [] %}
{% for word in config_line.split() %}
  {% if not word.startswith('ip') %}
    {% if word | ipaddr %}
      {% set _ = ip_list.append(word) %}
    {% endif %}
  {% endif %}
{% endfor %}
{{ ip_list }}

Taking a look at the template: in short, it splits the line into pieces and looks for IP addresses. When it finds one it adds it to the ip_list variable and passes the list back.
Unfortunately it also passes a lot of funky whitespace, so in the task where I call it I pipe the output to the trim filter, which smartly cleans it all up.
I then hand the newly parsed dns_servers variable over to the nxos_system module for application of the configuration.
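The same extraction is easy to sanity-check from a shell. The sample line below is hypothetical NX-OS output, and like the template, the pattern is a loose dotted-quad match rather than real IP validation:

```shell
#!/bin/sh
# Loose dotted-quad extraction from a hypothetical name-server line.
config_line="ip name-server 10.10.10.10 8.8.8.8 use-vrf management"
echo "$config_line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
# prints:
# 10.10.10.10
# 8.8.8.8
```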

AAP Configuration

I had to create a custom credential to store and supply my SSH keys. I named it “SSH Certificate Key”.
Its Input Configuration:

fields:
  - id: supp_cert_key
    type: string
    label: Certificate Key
    secret: true
    multiline: true
required:
  - supp_cert_key

Its Injector Configuration:

extra_vars:
  cert_key: '{{ supp_cert_key }}'

This provides an interface to easily add my private key:

I have several job templates and a couple of workflow templates to make it all happen:

In essence I have a job template for each component: one for VLAN, ACL, NTP, DNS, etc. By keeping these as discrete pieces I can reuse them in other workflows to easily build different configs!

My standard config workflow looks as follows:

In here I start by doing a backup with tags. That way I have a steady state point-in-time backup before I make my changes.
The workflow then configures each portion in parallel based on the supplied configuration file.

If, for some reason, I need to roll back the changes I run the same hosts against the same config tasks, but tell my automation to do it from the backup tag:

The difference between the two workflows is that there is no backup step in this rollback workflow, but other than that, it’s the same.

Conclusion

This was a little finicky to figure out the first time round, but now that I’ve got the pieces together it’s actually quite reliable. In fact, I read/write from repos for configs a LOT, and this has become an invaluable tool.

If you were to change this to fit your needs, what would that look like? I appreciate all questions and comments.

Thanks and happy automating!
