Ansible Config As Code With Network Backups To Git And Point-In-Time Rollback
The ask that kicked this off was "We want to see network configs backed up to git." That part was easy, since I've done it before…but that was before Execution Environments (EEs), so I needed to figure that piece out. "Oh yeah, and if the configuration changes go sideways we want to be able to roll back easily from the backup git repo." That required a little research and noodling. Fortunately I posed the question to the team of folks I work with, and Adam Mack suggested I could do something with git tagging, which turned out to be exactly what made it all work. It's good to know people 🙂
This demo also does config as code. This means the configuration for my network kit is stored in my git repository in standard CLI format. When I invoke Ansible to configure the equipment, it will first take a backup, then clone the configuration repo and update the devices. If, for any reason, I want to roll back the configuration on any or all of the devices in that change window, all I have to do is specify a backup tag, and the automation will grab the correct snapshot from git and perform the restore.
Video Demo
Playbooks – Backup To Git
I’ll start with the playbooks and then move to how I configured my Ansible Automation Platform (AAP) Controller server. The full flow will make sense once I put it all together.
The playbooks are broken into two repos depending on function. I’ll cover just the backup repo here, but I’ll supply the CAC repo anyway:
– Backup to git playbooks are here
– Configuration As Code (CAC) playbooks are here
I’ll start with the backup playbook (network_backup_git_playbook.yml):
```yaml
---
- name: network device backup to git
  hosts: nexus9k3
  gather_facts: false
  vars:
    backup_with_tags: false
    backup_dir: "{{ playbook_dir }}/net_backups"
    backup_file: "{{ backup_dir }}/{{ inventory_hostname }}"
    backup_repo: [email protected]:gregsowell/backups
    git_name: Git Backup
    git_email: [email protected]
  tasks:
    - import_role:
        name: network_backup_git
```
This is a very simple playbook. Really it’s just a handful of variables set up:
– backup_with_tags: Do I want to create a tag in the repo (a point-in-time snapshot)?
– backup_dir: Where to stick the backup files
– backup_file: What to name the backup files I get from the network devices
– backup_repo: Name of my backup repo
– git_name: Commit author name
– git_email: Commit author email
Next I’m calling my role “network_backup_git” that can be found in the same directory.
Here’s a shot of my role task folder:
A role always starts with the main.yml file, but in mine I do some work, then call the other task files to pull the backups from different vendors’ kit.
Here’s my main.yml:
```yaml
---
# this is used for tagging a repo
- name: get timestamp
  set_fact: time="{{ lookup('pipe', 'date \"+%Y-%m-%d-%H-%M\"') }}"
  run_once: true

# this and the following task add a private ssh key to the execution environment so it can connect to git
- name: create .ssh folder
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.file:
    path: /root/.ssh
    state: directory
    mode: '0777'

- name: create the ssh key file based on the supplied cred
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.copy:
    dest: ~/.ssh/id_rsa
    content: "{{ cert_key }}"
    mode: '0600'
  # no_log: true

- name: create the backup dir
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.file:
    path: "{{ backup_dir }}"
    state: directory
    mode: '0777'

- name: clone the repo
  ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git clone {{ backup_repo }} ."
  args:
    chdir: "{{ backup_dir }}"

- name: include cisco task when Cisco device
  include_tasks: "{{ role_path }}/tasks/network_backup_cisco.yml"
  when: ansible_network_os == "asa" or ansible_network_os == "ios" or ansible_network_os == "nxos"

- name: include arista task when Arista device
  include_tasks: "{{ role_path }}/tasks/network_backup_arista.yml"
  when: ansible_network_os == "eos"

- name: include junos task when Junos device
  include_tasks: "{{ role_path }}/tasks/network_backup_juniper.yml"
  when: ansible_network_os == "junos"

- name: include routeros task when Mikrotik device
  include_tasks: "{{ role_path }}/tasks/network_backup_mikrotik.yml"
  when: ansible_network_os == "routeros"

- name: Copy the backup to repo
  ansible.builtin.copy:
    src: "{{ bup_temp_file }}"
    dest: "{{ backup_file }}"
  register: copy_result
  delegate_to: localhost

- name: Delete the temp file
  ansible.builtin.file:
    path: "{{ bup_temp_file }}"
    state: absent
  changed_when: false
  delegate_to: localhost

- name: push the repo back with tags
  when: backup_with_tags
  ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git tag -a {{ time }} -m '{{ time }}'; git push; git push --tags"
  args:
    chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
  environment:
    GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
    GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
    GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
    GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
  delegate_to: localhost
  changed_when: git_return.stderr != "Everything up-to-date"
  run_once: true
  connection: local
  register: git_return
  become: true

- name: push the repo back with no tags
  when: not backup_with_tags
  ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git push"
  args:
    chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
  environment:
    GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
    GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
    GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
    GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
  delegate_to: localhost
  changed_when: git_return.stderr != "Everything up-to-date"
  run_once: true
  connection: local
  register: git_return
  become: true
```
There’s a lot going on here, so let me break it down into chunks.
The first section will be setting everything up:
```yaml
---
# this is used for tagging a repo
- name: get timestamp
  set_fact: time="{{ lookup('pipe', 'date \"+%Y-%m-%d-%H-%M\"') }}"
  run_once: true

# this and the following task add a private ssh key to the execution environment so it can connect to git
- name: create .ssh folder
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.file:
    path: /root/.ssh
    state: directory
    mode: '0777'

- name: create the ssh key file based on the supplied cred
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.copy:
    dest: ~/.ssh/id_rsa
    content: "{{ cert_key }}"
    mode: '0600'
  # no_log: true

- name: create the backup dir
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.file:
    path: "{{ backup_dir }}"
    state: directory
    mode: '0777'

- name: clone the repo
  ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git clone {{ backup_repo }} ."
  args:
    chdir: "{{ backup_dir }}"
```
First I create a timestamp. I use this as the message when making git commits, and I also use it as the tag name if I’m making a tagged commit.
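For reference, the pipe lookup just shells out to date; the same stamp can be produced by hand (a trivial sketch):

```shell
# The same format string the playbook's pipe lookup runs:
# year-month-day-hour-minute. It contains no characters that are
# illegal in a git tag name, so it doubles as the tag.
time=$(date "+%Y-%m-%d-%H-%M")
echo "$time"
```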
The next two tasks create a /root/.ssh folder and install the SSH private key inside the EE (the container the automation runs in). This is what allows git to authenticate to read/write my repository. The private key is actually stored in a custom credential inside AAP, so that I can store it securely and inject it into the EE when needed.
Last, I create the directory I’ll clone my backups into (and ultimately write the backup files to), and then I git clone the backup repo.
Next I call the task file that is associated with my specific piece of kit. For example if it is a Cisco IOS device it will call the IOS task file. If it is Juniper, it will call the Junos task file.
```yaml
- name: include cisco task when Cisco device
  include_tasks: "{{ role_path }}/tasks/network_backup_cisco.yml"
  when: ansible_network_os == "asa" or ansible_network_os == "ios" or ansible_network_os == "nxos"

- name: include arista task when Arista device
  include_tasks: "{{ role_path }}/tasks/network_backup_arista.yml"
  when: ansible_network_os == "eos"

- name: include junos task when Junos device
  include_tasks: "{{ role_path }}/tasks/network_backup_juniper.yml"
  when: ansible_network_os == "junos"

- name: include routeros task when Mikrotik device
  include_tasks: "{{ role_path }}/tasks/network_backup_mikrotik.yml"
  when: ansible_network_os == "routeros"
```
The last part of the playbook is where it gets interesting:
```yaml
- name: Copy the backup to repo
  ansible.builtin.copy:
    src: "{{ bup_temp_file }}"
    dest: "{{ backup_file }}"
  register: copy_result
  delegate_to: localhost

- name: Delete the temp file
  ansible.builtin.file:
    path: "{{ bup_temp_file }}"
    state: absent
  changed_when: false
  delegate_to: localhost

- name: push the repo back with tags
  when: backup_with_tags
  ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git tag -a {{ time }} -m '{{ time }}'; git push; git push --tags"
  args:
    chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
  environment:
    GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
    GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
    GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
    GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
  delegate_to: localhost
  changed_when: git_return.stderr != "Everything up-to-date"
  run_once: true
  connection: local
  register: git_return
  become: true

- name: push the repo back with no tags
  when: not backup_with_tags
  ansible.builtin.shell: "git add *; git commit -m '{{ time }}'; git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; git push"
  args:
    chdir: "{{ backup_dir }}{{ backup_repo_folder }}"
  environment:
    GIT_COMMITTER_NAME: "{{ git_name | default(omit) }}"
    GIT_COMMITTER_EMAIL: "{{ git_email | default(omit) }}"
    GIT_AUTHOR_NAME: "{{ git_name | default(omit) }}"
    GIT_AUTHOR_EMAIL: "{{ git_email | default(omit) }}"
  delegate_to: localhost
  changed_when: git_return.stderr != "Everything up-to-date"
  run_once: true
  connection: local
  register: git_return
  become: true
```
The first two tasks simply copy the backup files into the newly cloned repo folder, then delete the temporary working file.
The last two tasks are where the commit and push back to the repo happen.
The first task checks whether the “backup_with_tags” boolean is set to true (meaning I want to create a point-in-time backup). If it is, it runs the long shell command you can see: git add everything in the repo and commit it, then add a git tag using the earlier timestamp as the tag name. I did some testing and the tag can contain numbers or letters, so it could name the change about to happen or the project this is associated with…it doesn’t simply have to be numbers. It then does a normal git push, which puts any updated files into the repo, and finally a git push --tags, which creates that point-in-time tag on the remote. Keep in mind that if I ONLY did the --tags push, the repo itself wouldn’t see the file changes, only the tag, which is why I do a standard push and a tag push.
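Stripped of the Ansible wrapper, the tagged-push sequence boils down to a few git commands. Here is a minimal sketch against a throwaway local bare repo standing in for the real remote (the paths, tag value, and identity are all made up for illustration):

```shell
# Hedged sketch: commit + annotated tag + double push, demonstrated
# against a local bare repo. Identity comes from env vars, mirroring
# the GIT_AUTHOR_*/GIT_COMMITTER_* variables the role sets.
set -e
tag="2024-01-01-12-00"                        # stand-in for {{ time }}
export GIT_AUTHOR_NAME="Git Backup"    GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Git Backup" GIT_COMMITTER_EMAIL="demo@example.com"

rm -rf /tmp/demo-remote.git /tmp/demo-backups
git init -q --bare /tmp/demo-remote.git       # pretend remote repo
git clone -q /tmp/demo-remote.git /tmp/demo-backups
cd /tmp/demo-backups
echo "hostname switch1" > switch1             # pretend backup file

git add .
git commit -q -m "$tag"
git tag -a "$tag" -m "$tag"                   # annotated point-in-time tag
git push -q origin HEAD                       # push the commit itself...
git push -q --tags                            # ...then the tag; tags don't ride along by default
```

Running this leaves both the commit and the tag on the “remote,” which is exactly why the role does two pushes.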
The last task does a standard git push…nothing abnormal there.
Playbooks – Configuration As Code
– Configuration As Code (CAC) playbooks are here
I won’t cover every playbook in this repo, rather I’ll just cover one or two and show how the role is utilized.
I’ll start by showing the role and what it does because the very first task called in my playbooks invokes the role:
clone-repo/tasks/main.yml
```yaml
---
# this and the following task add a private ssh key to the execution environment so it can connect to git
- name: create .ssh folder
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.file:
    path: /root/.ssh
    state: directory
    mode: '0777'

- name: create the ssh key file based on the supplied cred
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.copy:
    dest: ~/.ssh/id_rsa
    content: "{{ cert_key }}"
    mode: '0600'
  # no_log: true

- name: clone the repo with tags
  when: repo_tag is defined and repo_tag != ""
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; mkdir /tmp/working; git clone --depth 1 --branch {{ repo_tag }} {{ backup_repo }} /tmp/working; cp /tmp/working/* {{ playbook_dir }}"
  args:
    chdir: "{{ playbook_dir }}"

- name: clone the repo without tags
  when: repo_tag | default("") == "" and config_repo is defined
  become: true
  run_once: true
  delegate_to: localhost
  ansible.builtin.shell: "git config --global core.sshCommand 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'; mkdir /tmp/working; git clone {{ config_repo }} /tmp/working; cp /tmp/working/* {{ playbook_dir }}"
  args:
    chdir: "{{ playbook_dir }}"
```
Exactly like the backup scripts, the first two tasks are just setting up the SSH keys in the EE for authentication.
The last two tasks are where the work happens.
The first checks whether a “repo_tag” is supplied; if it is, it knows it has to clone the tagged entry from said repo. The git clone command responsible is “git clone --depth 1 --branch {{ repo_tag }} {{ backup_repo }} /tmp/working”. This puts the files in the working folder, from which they are copied to the playbook directory for later processing.
If there is no tag it will clone the config repo and then stick it into the playbook directory.
Notice I built it so that if there is a tag it knows “I’m doing a rollback, so pull from the backup repo.”
If there is no tag it knows “Oh, no tag, so I’m pulling from the config repo to perform this configuration change.”
I have a dedicated repo just for backups, and the user specifies the repo they want to use if it’s a regular change procedure.
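In plain git terms, the tag-based restore is just a shallow clone pinned to a tag. Here is a hedged sketch against a throwaway local repo (all paths, tag names, and the sample config lines are invented for the demo):

```shell
# Hedged sketch: a shallow clone pinned to a tag is a point-in-time
# snapshot; the working tree holds exactly the files as of that tag.
set -e
rm -rf /tmp/tag-remote.git /tmp/tag-work /tmp/tag-restore
export GIT_AUTHOR_NAME=demo    GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a stand-in backup repo with two states and a tag on the first.
git init -q --bare /tmp/tag-remote.git
git clone -q /tmp/tag-remote.git /tmp/tag-work
cd /tmp/tag-work
echo "ntp server 10.0.0.1" > switch1
git add . && git commit -q -m "known-good" && git tag -a good -m good
echo "ntp server 10.9.9.9" > switch1
git add . && git commit -q -m "bad change"
git push -q origin HEAD && git push -q --tags

# Rollback: clone only the tagged snapshot, like the clone-repo role does.
git clone -q --depth 1 --branch good /tmp/tag-remote.git /tmp/tag-restore
cat /tmp/tag-restore/switch1
```

The restored working tree contains the known-good config, not the later “bad change,” even though the bad change is the repo’s newest commit.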
Now I’ll take a look at the NTP configuration playbook:
cac-nexus-ntp.yml
```yaml
---
- name: update NTP servers on nexus kit
  hosts: nexus9k3
  gather_facts: false
  vars:
    save_now: false  # reset this here or via extra vars if this should be saved
  tasks:
    - name: clone repo with config files
      ansible.builtin.include_role:
        name: clone-repo

    - name: create the parse file variable manually
      ansible.builtin.set_fact:
        parse_file: "{{ inventory_hostname }}"

    ### BLOCK START ###
    - name: block if a parse file is specified
      when: parse_file is defined
      block:
        - name: Parse config
          cisco.nxos.nxos_ntp_global:
            running_config: "{{ lookup('file', parse_file) }}"
            state: parsed
          register: parsed

        - name: Replace NTP config with the provided. Not merge, but replace
          cisco.nxos.nxos_ntp_global:
            config: "{{ parsed.parsed }}"
            state: replaced
          register: result
    ### BLOCK STOP ###

    ### BLOCK START ###
    - name: block if a parse file is not present
      when: parse_file is not defined
      block:
        - name: Replace NTP config with the provided. Not merge, but replace
          cisco.nxos.nxos_ntp_global:
            config:
              logging: true
              servers: "{{ ntp_servers }}"
            state: replaced
          register: result
    ### BLOCK STOP ###

    - name: save when required
      when: save_now == true and result.changed == true
      cisco.nxos.nxos_config:
        save_when: always
```
This playbook takes the standard “show run” style configuration file and parses it with the NTP resource module. I LOVE how resource modules increasingly support a parsed state that takes a standard CLI config, grabs just the parts it needs, and puts them into a usable data model. In my case it parses the config file it just cloned from the repo (either a backup or a config change).
After it parses the config, it hands that data model to the same NTP module to apply the changes to the device.
The very last task will save the config change to the device if the save_now flag is set to true.
I’ll give one more example playbook:
Now I’ll take a look at the DNS configuration playbook:
cac-nexus-dns.yml
```yaml
---
- name: update dns servers on nexus kit
  hosts: nexus9k3
  gather_facts: false
  vars:
    save_now: false  # reset this here or via extra vars if this should be saved
  tasks:
    - name: clone repo with config files
      ansible.builtin.include_role:
        name: clone-repo

    - name: parse config file for name servers
      ansible.builtin.set_fact:
        config_line: "{{ lookup('ansible.builtin.file', inventory_hostname) | regex_search('^.*ip name-server.*$', multiline=True) }}"

    - name: extract the IP addresses from the config line
      ansible.builtin.set_fact:
        dns_servers: "{{ lookup('ansible.builtin.template', 'cac-extract-ip.j2') | trim }}"

    - name: configure name servers
      cisco.nxos.nxos_system:
        name_servers: "{{ dns_servers }}"
      register: result

    - name: save when required
      when: save_now == true and result.changed == true
      cisco.nxos.nxos_config:
        save_when: always
```
This module(nxos_system) doesn’t have a parse option, so to make it all work I have to do some kung-fu.
First I create a variable named config_line by doing a regex search through the file to find the name-server entry…nothing too bad there.
Now I need to parse out the IP addresses, so I kindly asked ChatGPT to write me a jinja2 template that would do it for me:
```jinja
{% set ip_list = [] %}
{% for word in config_line.split() %}
{% if not word.startswith('ip') %}
{% if word | ipaddr %}
{% set _ = ip_list.append(word) %}
{% endif %}
{% endif %}
{% endfor %}
{{ ip_list }}
```
Taking a look at the template: in short, it splits the line into words and looks for IP addresses. When it finds one, it appends it to the ip_list variable, which gets passed back.
Unfortunately the template output carries a lot of stray whitespace, so in the task that calls it I pipe the output to the trim filter, which smartly cleans it all up.
I then hand the newly parsed dns_servers variable over to the nxos_system module for application of the configuration.
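Since the heavy lifting is really just “pull the IPv4 addresses out of one config line,” the extraction can be sanity-checked outside Ansible too. A hedged shell equivalent (the sample config line is made up):

```shell
# Hedged sketch: extract IPv4 addresses from an "ip name-server" line
# with grep -oE, the same job the Jinja2 template + ipaddr filter
# does inside the playbook.
config_line="ip name-server 8.8.8.8 1.1.1.1 use-vrf default"
dns_servers=$(echo "$config_line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}')
echo "$dns_servers"    # prints one address per line: 8.8.8.8 then 1.1.1.1
```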
AAP Configuration
I had to create a custom credential to store and supply my SSH keys. I named it “SSH Certificate Key”.
Its input configuration:
```yaml
fields:
  - id: supp_cert_key
    type: string
    label: Certificate Key
    secret: true
    multiline: true
required:
  - supp_cert_key
```
Its injector configuration:
```yaml
extra_vars:
  cert_key: '{{ supp_cert_key }}'
```
This provides an interface to easily add my private key:
I have several job templates and a couple of workflow templates to make it all happen:
In essence I have a job template for each component: one for VLANs, one for ACLs, NTP, DNS, etc. By keeping these as discrete pieces I can reuse them in other workflows to easily build different configs!
My standard config workflow looks as follows:
In here I start by doing a backup with tags. That way I have a steady state point-in-time backup before I make my changes.
The workflow then in parallel will configure each portion based on the supplied configuration file.
If, for some reason, I need to roll back the changes I run the same hosts against the same config tasks, but tell my automation to do it from the backup tag:
The difference between the two workflows is that there is no backup step in the rollback workflow; other than that, it’s the same.
Conclusion
This was a little finicky to figure out the first time round, but now that I’ve got the pieces together it’s actually quite reliable. In fact, I read/write from repos for configs a LOT, and this has become an invaluable tool.
If you were to change this to fit your needs, what would that look like? I appreciate all questions and comments.
Thanks and happy automating!