Mar 21 / Greg

VMware To Proxmox VE Migration Via Ansible And Ascender

Once we choose a system, we often feel married to it. The idea of migrating our infrastructure often seems more expensive than the new price the vendor just handed us for what we are already using. I’ve felt/seen this with software, cloud providers, and hypervisors. While that can look true at first glance, automation can often level the playing field. Case in point: I’m about to demonstrate how to migrate a Rocky VM from VMware to Proxmox VE completely via automation 🙂

Video Demo

Overview

I thought I might quickly begin by describing the process with graphics.

First, we have a VMware cluster and our Proxmox cluster. As you can see, I have a virtual machine (VM1) that I want to migrate from VMware over to Proxmox.

My first step is to use automation to connect to the running VM and prep it for migration. My Rocky host didn’t have the virtio drivers in the initramfs, so I need to add them. The initramfs is a small initial file system loaded into RAM at boot that gives the kernel the drivers it needs to mount the disks and hand off to the real root file system. Your mileage may vary…perhaps these drivers are already in your boot image, so give it a test as is and see how your system responds. Once the prep work is done, the VM is shut down.

Next, we install VMware’s OVFTool somewhere we can use it to export the VMs from the VMware system as OVFs. For my small demo I’m going to install it on my Proxmox host and do the migrations directly from there. You will export to a temporary storage location, either directly on the Proxmox host or on something like an NFS share mapped to the host. If you are exporting to an NFS share, you could easily run OVFTool from a completely separate server (which would have the least impact on your Proxmox host).

Now on the Proxmox host I’ll use the qm importovf command to bring the newly created VM1 OVF into the system as a VM.

Before starting the VM we’ll need to modify any hardware settings required for this specific VM. Again, this is a “your mileage may vary” scenario. In my case, I had to change the BIOS type, add an EFI disk for the new firmware, modify the CPU type, and reconfigure the networking.

Now the last step is pretty simple; all that needs to be done is to start the new VM.

Playbooks

All of my playbooks can be found here in my git repository.

The whole process actually took a good bit of work to figure out (mostly down to driver/BIOS issues), so hopefully I can save you some time and effort.

preconfigure-hosts.yml
This playbook’s purpose is to do any preconfiguration work necessary. In my case the prework is to update the initramfs with virtio drivers and then shut the machine down (OVFTool needs the VM to be shut down before it can export it). Depending on your environment and OS, different steps may be required. You could also back up network configuration settings here so that you can reapply them once the migration is complete (a quick sketch of such a backup appears after the playbook below).

---
- name: Preconfigure the VMWare hosts for migration
  hosts: all
  gather_facts: false
  vars:
  tasks:
    - name: Connect to hosts and add virtio drivers to boot(using virtio in proxmox)
      ansible.builtin.shell: 'dracut --add-drivers "virtio_scsi virtio_pci" -f -v /boot/initramfs-`uname -r`.img `uname -r`'
      register: driver_install
 
    - name: Shutdown the host
      community.general.shutdown:

In this playbook I create the new boot file based on the name of the kernel currently running (the uname -r portion).
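As mentioned above, this is also a convenient place to back up network settings before the NICs change under the new hypervisor. A minimal sketch of what such a prep task might run, assuming a RHEL/Rocky-style layout (adjust the paths for your distro):

mkdir -p /root/pre-migration-backup
cp -a /etc/sysconfig/network-scripts /root/pre-migration-backup/ 2>/dev/null
cp -a /etc/NetworkManager/system-connections /root/pre-migration-backup/ 2>/dev/null
ip -o addr show > /root/pre-migration-backup/ip-addr.txt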

migrate-vms.yml
Next, the migrate playbook is run. This is what actually exports the OVF, imports it into Proxmox, and updates all of the settings.

---
- name: Use ovftool to pull vms from vmware and import into proxmox
  hosts: all
  gather_facts: false
  vars:
 
    proxmox_auth: &proxmox_auth
      api_host: proxmox.gregsowell.com
      api_user: "{{ gen2_user }}"
      # api_user: root@pam # username format example
      # Use standard password
      api_password: "{{ gen2_pword }}"
      # Use api token and secret - example format
      # api_token_id: gregisa5.0
      # api_token_secret: 72a72987-ff68-44f1-9fee-c09adaaecf4d
 
    #Full path to ovftool directory
    ovftool_path: /root/ovftool
 
    #Hostname or IP of vmware vcenter
    vmware_host: 10.0.2.10
 
    #Hostname or IP of proxmox
    proxmox_host: proxmox.gregsowell.com
 
    #Datacenter the VMs are in
    vmware_datacenter: MNS
 
    #Directory path to vms
    vmware_directory: Greg/ciq
 
    #VMware username
    # vmware_username: administrator
    vmware_username: "{{ gen1_user }}"
 
    #VMware password
    # vmware_password: test
    vmware_password: "{{ gen1_pword }}"
 
    #Proxmox migration folder
    proxmox_migration_dir: /root/migration
 
    #Storage proxmox will import new vms into with qm import command
    proxmox_storage: local-lvm
 
  tasks:
    - name: Add proxmox host to inventory
      ansible.builtin.add_host:
        name: "{{ proxmox_host }}"
        ansible_host: "{{ proxmox_host }}"
      run_once: true
 
    - name: Create the proxmox migration directory if it doesn't exist
      ansible.builtin.file:
        path: "{{ proxmox_migration_dir }}"
        state: directory
        mode: '0755'
      delegate_to: "{{ proxmox_host }}"
      run_once: true
 
    - name: Check for each host and see if they already have an export folder created
      stat:
        path: "{{ proxmox_migration_dir }}/{{ hostvars[inventory_hostname].config.name }}"
      register: directory_status
      delegate_to: "{{ proxmox_host }}"
 
    - name: Block for exporting/importing hosts
      when: inventory_hostname != proxmox_host and ( not directory_status.stat.exists or not directory_status.stat.isdir )
      block:
        - name: Run ovftool to export vm from vmware to proxmox server if folder isn't already there
          ansible.builtin.shell: "{{ ovftool_path }}/ovftool --noSSLVerify vi://{{ vmware_username }}:{{ vmware_password }}@{{ vmware_host }}:443/{{ vmware_datacenter }}/vm/{{ vmware_directory }}/{{ hostvars[inventory_hostname].config.name }} ."
          args:
            chdir: "{{ proxmox_migration_dir }}/"
          delegate_to: "{{ proxmox_host }}"
          no_log: true
 
        - name: Call task file to provision new proxmox vm.  Loop over hosts in play
          ansible.builtin.include_tasks:
            file: import-ovf.yml
          run_once: true
          loop: "{{ play_hosts }}"
 
        - name: Modify proxmox vms to have required settings
          community.general.proxmox_kvm:
            <<: *proxmox_auth
            name: "{{ hostvars[inventory_hostname].config.name }}"
            node: proxmox
            scsihw: virtio-scsi-pci
            cpu: x86-64-v2-AES
            bios: ovmf
            net:
              net0: 'virtio,bridge=vmbr0,firewall=1'
            efidisk0:
              storage: "{{ proxmox_storage }}"
              format: raw
              efitype: 4m
              pre_enrolled_keys: false	
            update: true
            update_unsafe: true
          register: vm_update
          delegate_to: localhost
 
        - name: Start VM
          community.general.proxmox_kvm:
            <<: *proxmox_auth
            name: "{{ hostvars[inventory_hostname].config.name }}"
            node: proxmox
            state: started
          delegate_to: localhost
          # ignore_errors: true

I’ll attempt to cover the highlights instead of the entire playbook (which I do cover in the video).

Task number 2 (Check for each host and see if they already have an export folder created) checks whether an export directory already exists for the current host, which would mean it has already been migrated. The result feeds the migration conditional later, so a host whose folder already exists won’t be exported/imported again.

Once the OVF has been created, the (Call task file to provision new proxmox vm. Loop over hosts in play) task is run. This task runs once, looping over the play_hosts magic variable, which contains all of the hosts in the current play. For each host it locates the next available Proxmox VM ID (vid) and then imports the OVF.

import-ovf.yml

---
- name: Find the next available vmid
  ansible.builtin.shell: pvesh get /cluster/nextid
  register: next_vmid
  delegate_to: "{{ proxmox_host }}"
 
- name: Run qm import with the new next_vmid
  when: item != proxmox_host
  ansible.builtin.shell: "qm importovf {{ next_vmid.stdout }} {{ hostvars[item].config.name }}/{{ hostvars[item].config.name }}.ovf {{ proxmox_storage }}"
  args:
    chdir: "{{ proxmox_migration_dir }}/"
  delegate_to: "{{ proxmox_host }}"

Once all OVFs have been imported we need to modify their hardware settings via this task (Modify proxmox vms to have required settings).
You can see with the proxmox_kvm module I’m using the update: true option. This works for most of the options I’m trying to modify, but it DOESN’T allow me to update my network settings or create an EFI disk, both of which are required for proper operation. To get around that you also have to set the update_unsafe: true option. The module guards these settings because an unsafe update could break a config in a way that results in data loss, but since I’m only adding required components to a freshly imported VM, I’m good to go.

Manual Testing

While the automation is awesome, you will likely still want/need to do some manual testing. I’ve got the various steps here in brief so you can do just that.

Install OVFTool
Visit this VMware page and grab whichever version of OVFTool is appropriate for you. I grabbed the 4.4.3 archive. I extracted it with tar and made the ovftool and ovftool.bin files executable. I put this on my Proxmox host as I was running all the commands from there.
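Roughly, the install looked like this (the archive filename below is a placeholder; use whichever version you downloaded, which in my case extracted to an ovftool/ directory):

cd /root
tar -xf VMware-ovftool-4.4.3-xxxxxxx-lin.x86_64.tar.gz
chmod +x /root/ovftool/ovftool /root/ovftool/ovftool.bin
/root/ovftool/ovftool --version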

VM Prep
On the VMs you intend to migrate, you can run the following command to see if you have the virtio drivers on your machine:

find /lib/modules/`uname -r` -type f -name "virtio_*"

If they are present (which has always been the case for me), you can run the manual step to create the new initramfs file:

dracut --add-drivers "virtio_scsi virtio_pci" -f -v /boot/initramfs-`uname -r`.img `uname -r`
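To confirm the drivers actually made it into the rebuilt image, you can inspect it with lsinitrd (part of the dracut toolset on Rocky/RHEL-family systems):

lsinitrd /boot/initramfs-`uname -r`.img | grep -i virtio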

Browse For VMs With OVFTool
There’s a specific folder structure your VMs will follow. To browse around and make sure things are where you expect them to be, you can use the following command:

ovftool --noSSLVerify vi://username:password@host:443/YourDatacenter/vm

My environment doesn’t have a valid cert on the vCenter, so I’m using the noSSLVerify option.
You can also leave off :password in the command and it will prompt you for your password.
YourDatacenter will be the case-sensitive datacenter name, and /vm will always need to be there (it’s where the VMs show up).
From there you can browse down to your VM paths.
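For example, drilling one level deeper into my folder structure would look like this (username, password, and host are placeholders for your own values):

ovftool --noSSLVerify vi://username:password@host:443/MNS/vm/Greg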

Export OVFs
The command to actually export the OVFs should be executed from inside of the folder you want things saved to:

ovftool --noSSLVerify vi://username:password@host:443/MNS/vm/Greg/ciq/Greg-rocky8 .

Notice in the command I followed the folder structure Greg/ciq, then added the name of my VM.
At the very end of the command is a space and a period. If you leave off the period it won’t do the export, so make sure “ .” is on the end.

Import OVFs
The command to import an OVF is as follows:

qm importovf <vid> <contained folder>/<name of vm>.ovf <proxmox storage location>

An example would be:

qm importovf 400 Greg-rocky8-to-9-conversion/Greg-rocky8-to-9-conversion.ovf local-lvm

Notice you have to specify a unique vid for each imported VM. You can look at the Proxmox interface to find the next one, or you can issue this CLI command and it will tell you what it is:

pvesh get /cluster/nextid
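Once the import finishes, the same hardware tweaks the playbook applies can be made by hand with qm set. A rough sketch (the VM ID, storage name, and bridge are placeholders, and option names can vary slightly between Proxmox versions):

qm set 400 --bios ovmf --scsihw virtio-scsi-pci --cpu x86-64-v2-AES
qm set 400 --net0 virtio,bridge=vmbr0,firewall=1
qm set 400 --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=0
qm start 400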

Conclusion

Migrating from one product to another can feel impossible at times, but automation is here to be your secret superpower. We are quickly becoming migration experts, and we’d love to assist your team with complete migrations that include automation leave-behinds…or we can just do validation; as much or as little as you require.

As always, I appreciate you reading through, and would love your feedback.

Thanks and happy migrating!

Feb 20 / Greg

Automating VMware Alternatives With Ansible And Ascender

I have personally used and had good success with VMware for nearly two decades. While it is a good product, I occasionally have folks ask about alternatives and how viable they are. For this article/demo I’m going to use Proxmox VE, which is a competent, user-friendly hypervisor. Really, I’m going to examine this question from an automation perspective.

First, and most importantly, the user experience in Ascender is exactly the same no matter which hypervisor sits behind it. This means your users can smoothly transition from one platform to the other with no knowledge that anything has changed.

On the backend, Ansible and Ascender will use different playbooks, which means different modules and often slightly different procedures, but frequently they aren’t so different.

Demo Video

Interfaces Compared

VMware’s vCenter has been a pretty consistent interface for a while now:

Proxmox VE may be new to you, but the interface should look and feel awfully familiar:

They both allow you, in very similar ways, to access and edit settings and add/delete VMs and templates…BUT who wants to manage infrastructure manually?

Playbooks Compared

You can find my Proxmox playbooks here, and my VMware playbooks are here.

The two playbooks in question, one for Proxmox and one for VMware, are structured very similarly, but use their respective modules.

I’m going to break down and comment on each playbook below.
Proxmox Playbook (proxmox-vm.yml)

---
- name: Create and control VMs on Proxmox using Ascender
  hosts: localhost
  gather_facts: false
  vars:
    # Configure the api connection info for Proxmox host
    proxmox_auth: &proxmox_auth
      api_host: proxmox.gregsowell.com
      api_user: "{{ gen1_user }}"
      # api_user: root@pam # username format example
      # Use standard password
      api_password: "{{ gen1_pword }}"
      # Use api token and secret - example format
      # api_token_id: gregisa5.0
      # api_token_secret: 72a72987-ff68-44f1-9fee-c09adaaecf4d

Above I use a concept in YAML known as an anchor. In every task below I will reference this anchor with an alias. This essentially allows me to reference a set of options and inject them in various other places in my playbook. An anchor is set via the & symbol, which says “everything else after this is part of the anchor.”

Here I’m setting up the authentication. I have examples of doing username / password or using an api token. Notice that for username and password I’m using variables. This is because I’m passing in a custom credential via Ascender to the playbook at run time. Doing this allows me to securely maintain those credentials in the Ascender database.

    # Configure template to clone for new vm
    clone_template: rocky9-template
 
    # Configure location on Host to store new vm
    storage_location: local-lvm
 
    # Linked Clone needs format=unspecified and full=false
    # Format of new VM
    vm_format: qcow2
 
    # Default name to use for the VM
    vm_name: test-rocky9
 
    # How many cores
    # vm_cores: 2
 
    # How many vcpus:
    # vm_vcpus: 2
 
    # Options for specifying network interface type
    #vm_net0: 'virtio,bridge=vmbr0'
 
    #vm_ipconfig0: 'ip=192.168.1.1/24,gw=192.168.1.1'

Above you can see that I’m setting up defaults for my standard variables, which can be overridden at run time. Remember that if you pass in variables as extra_vars, they have the highest level of precedence.
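Outside of Ascender, the same overrides could be passed on the command line; a hedged example (the values are placeholders):

ansible-playbook proxmox-vm.yml -e "vm_action=provision vm_name=test-rocky9 vm_cores=2"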

    # Switches for playbook control
    # vm_action: provision
    # vm_action: start
    # vm_action: stop
    # vm_action: delete
 
 
  tasks:
  - name: Block for creating a vm
    when: vm_action == "provision"
    block:
    - name: Create a VM based on a template
      community.general.proxmox_kvm:
        # api_user: root@pam
        # api_password: secret
        # api_host: helldorado
        <<: *proxmox_auth
        clone: "{{ clone_template }}"
        name: "{{ vm_name }}"
        node: proxmox
        storage: "{{ storage_location }}"
        format: "{{ vm_format }}"
        timeout: 500
        cores: "{{ vm_cores | default(omit) }}"
        vcpus: "{{ vm_vcpus | default(omit) }}"
        net:
          net0: "{{ vm_net0 | default(omit) }}"
        ipconfig:
          ipconfig0: "{{ vm_ipconfig0 | default(omit) }}"
      register: vm_provision_info
 
    - name: Pause after provision to give the API a chance to catch up
      ansible.builtin.pause:
        seconds: 10

Here you can see my first two tasks. I’m looking for the vm_action variable to be set to provision, and if it is I will run this block of code. I then clone a template. Here I’m doing a full clone, but I could also set it to be a linked clone if I wanted. You can also see I have some customization variables here that, when not set, are omitted. After the VM provisions, the play pauses for about 10 seconds to give the system time to register the new VM before it proceeds.

This is also where you see the anchor being referenced. The “<<: *proxmox_auth” is a YAML merge key with an alias. This takes the variable chunk from above and injects it into this module (which makes modifications easier and saves several lines of code in each task).

  - name: Start VM
    when: (vm_action == "provision" and vm_provision_info.changed) or vm_action == "start"
    community.general.proxmox_kvm:
      <<: *proxmox_auth
      name: "{{ vm_name }}"
      node: proxmox
      state: started

You can see in this start task that I have two conditionals: one for when vm_action is set to start, and one for when the action is provision and the provision task actually made a change. The start happens after provisioning because, by default, a cloned template isn’t started. We also check the changed status because the clone operation is idempotent, which means if the VM already exists, it won’t do anything; it will simply report back “ok”.

  - name: Stop VM
    when: vm_action == "stop"
    community.general.proxmox_kvm:
      <<: *proxmox_auth
      name: "{{ vm_name }}"
      node: proxmox
      state: stopped
 
  - name: Delete VM block
    when: vm_action == "delete"
    block:
    - name: Stop VM with force
      community.general.proxmox_kvm:
        <<: *proxmox_auth
        name: "{{ vm_name }}"
        node: proxmox
        state: stopped
        force: true
 
    - name: Pause to allow shutdown to complete
      ansible.builtin.pause:
        seconds: 10
 
    - name: Delete VM
      community.general.proxmox_kvm:
        <<: *proxmox_auth
        name: "{{ vm_name }}"
        node: proxmox
        state: absent

The delete block isn’t too complex. It first stops the VM, pauses so the system will register the change, then performs the delete from disk.

VMware Playbook (vmware-vm.yml)
Compare and contrast the two playbooks: they are laid out almost identically, the tasks are in the same order, and they are configured almost the same. This means transitioning from one to the other should be pretty seamless.

---
- name: Create and control VMs on VMware using Ascender
  hosts: localhost
  gather_facts: false
  vars:
    # below are all of the details for the VM.  I'm overriding these at runtime.
    vm_datacenter: MNS
    vm_name: snowtest2
#    vm_template: Windows2016
    vm_template: Rocky8.6
#    vm_template: Rocky8
#    vm_template: Rocky9
    vm_folder: /Greg/ciq
    vm_disksize: 50
    vm_datastore: SSD
    # minimum of 4GB of RAM
    vm_memory: 4096
    vm_cpus: 4
    vm_netname: Greg
    vm_ip: 10.1.12.56
    vm_netmask: 255.255.255.0
    vm_gateway: 10.1.12.1
 
    vmware_auth: &vmware_auth
      hostname: "{{ vcenter_hostname }}"
      username: "{{ gen1_user }}"
      password: "{{ gen1_pword }}"
      validate_certs: no
 
    # Switches for playbook control
    # vm_action: provision
    # vm_action: start
    # vm_action: stop
    # vm_action: delete
 
 
  tasks:
  - name: Provision a VM
    when: vm_action == "provision"
    community.vmware.vmware_guest:
      <<: *vmware_auth
      folder: "{{ vm_folder }}"
      name: "{{ vm_name }}"
      datacenter: "{{ vm_datacenter }}"
      state: poweredon
#      guest_id: centos64Guest
      template: "{{ vm_template }}"
  # This is hostname of particular ESXi server on which user wants VM to be deployed
      disk:
      - size_gb: "{{ vm_disksize }}"
        type: thin
        datastore: "{{ vm_datastore }}"
      hardware:
        memory_mb: "{{ vm_memory }}"
        num_cpus: "{{ vm_cpus}}"
        scsi: paravirtual
      networks:
      - name: "{{ vm_netname}}"
        connected: true
        start_connected: true
        type: dhcp
        # type: static
        # ip: "{{ vm_ip }}"
        # netmask: "{{ vm_netmask }}"
        # gateway: "{{ vm_gateway }}"
        # dns_servers: "{{ vm_gateway }}"
#      wait_for_ip_address: true
#        device_type: vmxnet3
    register: deploy_vm
 
  - name: Start a VM
    when: vm_action == "start"
    community.vmware.vmware_guest:
      <<: *vmware_auth
      folder: "{{ vm_folder }}"
      name: "{{ vm_name }}"
      datacenter: "{{ vm_datacenter }}"
      state: poweredon
 
  - name: Stop a VM
    when: vm_action == "stop"
    community.vmware.vmware_guest:
      <<: *vmware_auth
      folder: "{{ vm_folder }}"
      name: "{{ vm_name }}"
      datacenter: "{{ vm_datacenter }}"
      # state: poweredoff
      state: shutdownguest 
 
  - name: Delete a VM
    when: vm_action == "delete"
    community.vmware.vmware_guest:
      <<: *vmware_auth
      folder: "{{ vm_folder }}"
      name: "{{ vm_name }}"
      datacenter: "{{ vm_datacenter }}"
      state: absent
      force: true

Interface Comparison In Ascender

For my job templates (how you pull your playbook and required components together in Ascender) I’m using something called a survey. This allows you to quickly/easily configure a set of questions for the user to answer before the automation is run. Here’s an example of my VMware survey:

Adding an entry is as simple as clicking add and filling in the blanks:

Notice in the above there is a blank space for “Answer variable name”. This is the extra_vars variable that the info will be passed to the playbook with. That’s how these surveys work; they simply pass info to the playbook as an extra_var.

So for the sake of comparison, here’s what it looks like when you launch the two different job templates:
VMware Job Template:

Proxmox Job Template:

Notice how they are almost identical. I added all of the options to the VMware JT, but I kept the Proxmox one a little cleaner. I could have just as easily added all of the knobs, but I wanted to show how you can make them full featured or simple. In the end they accomplish the same task. You put in the VM’s name, choose an action to perform, and add any additional data required. That’s literally all there is to it!

When I say choose an action it’s as easy as this:

Conclusion

Now this should just serve to show you how easy it is to automate different hypervisors. Keep in mind that this example is simple, but it can easily be expanded to automate virtually all of the functions of your virtual environment.

As always, I’m looking for feedback. How would you use this in your environment…how would you tweak or tune this?

CIQ also does professional services, so if you need help building, configuring, or migrating your automation strategy, environments, or systems, please reach out!

Thanks and happy automating!

Feb 19 / Greg

Why Am I Ryan Shacklett

Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Ryan Shacklett. Ryan has multiple fursonas, with his main one being Wild Acai. For the uninitiated, that means he attends conventions and events dressed in an insanely impressive animal suit…often referred to as a “furry”. Not only does it provide a place to explore parts of your personality, but it also allows for a lot of creative expression. Not only does he participate, but he also gets to give these experiences to others through his own multi-employee company that creates these impressive suits. Help us grow by sharing with someone!

Youtube version here:

Please show them some love on their socials here: http://waggerycos.com/,

https://twitter.com/waggerycos.

If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)

Feb 11 / Greg

Why Am I Dan De Leon

Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Dan De Leon. Dan lives his life the way he leads his church: open and affirming. That was a term new to me, but essentially it means all people are welcome and supported…no matter how able-bodied you are, or where you are in the LGBTQIA+ box of crayons. Dan gives me hope. Hope that the religious people I care about can come to the one real truth: loving others unconditionally is the only thing that matters. You follow this principle first, and have your faith fit in around that, not the other way around. I learned a LOT in this conversation, and I hope you do too. Please enjoy this chat with Dan. Help us grow by sharing with someone!

Youtube version here:

Please show them some love on their socials here: https://www.friends-ucc.org/,

https://www.facebook.com/friendschurchucc,

https://www.instagram.com/friends_ucc.

If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)

Feb 4 / Greg

Why Am I 2024-02-04 10:48:50

Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today.  My guest this go around is Scott Jones.  He’s a British born Aussie who has always felt a bit like a fish out of water.  Part of that feeling came to light when he recently realized he’s gay, and now the world literally looks different.  I mean imagine waking up one day, and getting to experience the world through a beautiful new lens.  I hope you enjoy this chat with Scott.  Help us grow by sharing with someone!

Youtube version here:

If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)

Jan 29 / Greg

Install Performance Co-pilot Via Ansible And Ascender


Performance Co-Pilot (PCP) is a suite of tools used for performance monitoring across a variety of systems. We see it used quite a bit in the HPC space, either to squeeze as much performance out of a system as possible or to troubleshoot performance issues. It can often be tedious to install and manage…unless, of course, you use automation!

I’ll describe my architecture, review my playbooks, and have a look at it all working.

Video Demo

How It Works

PCP has a LOT of components and options; I really just intend to describe how I’m configuring it.

First, what is a “Collection host”? Any regular server or VM running PCP to gather info on itself is considered a collection host. So most of the configured hosts will be collection hosts.


Once a collector is configured, an admin will generally SSH into it to access the PCP data. These hosts can also run something like Redis with Grafana to graph the info, which means the admin is going straight to the host either way.


When your environment begins to grow it can be a bit tedious to connect to each host to access PCP info.


This is where a “Monitoring Host” comes in. A monitoring host stores info from multiple collection hosts. This means an admin only needs to connect to the monitoring host to gain insight about any of the collection hosts…a one-stop-shop as it were.

You can either push or pull data. If you push data from the collectors, the collectors incur some additional overhead. If you pull from the monitoring host, the monitoring host incurs that cost instead, which is less likely to skew the performance metrics coming from the collection hosts.

I’ve also seen some data saying that a monitoring host should be capped somewhere around a thousand collectors.

Playbooks

All of my playbooks can be found here in my git repository.

pcp-install.yml: This playbook connects to the PCP collectors and configures them both to collect locally and to allow monitoring hosts to access them:

---
- name: Install/configure PCP on various hosts
  hosts: pcp-hosts
  gather_facts: false
  vars:
    # Services to be enabled/started
    enable_services: 
      - pmcd
      - pmlogger
 
    # The subnets or ranges of hosts allowed to connect to clients to fetch info
    remote_subnets:
      - 10.0.*
      - 192.168.5.10
 
  tasks:
  # dnf install required pcp packages
  - name: Install pcp packages
    ansible.builtin.dnf:
      name: "{{ item }}"
      state: latest
    loop:
      - pcp
      - pcp-system-tools
    notify: restart pcp
 
  - name: Configure the pmcd process(add all of the allowed subnets)
    ansible.builtin.blockinfile:
      path: /etc/pcp/pmcd/pmcd.conf
      block: "{{ lookup('ansible.builtin.template', 'pmcd-access.j2') }}"
      insertafter: "\\[access\\]"
    notify: restart pcp
 
  - name: Configure the pmcd options to listen on the correct IP
    ansible.builtin.lineinfile:
      path: /etc/pcp/pmcd/pmcd.options
      line: "-i {{ hostvars[inventory_hostname].ansible_host }}"
 
  - name: Enable pmcd listening ports on firewall
    ansible.posix.firewalld:
      port: 44321/tcp
      permanent: true
      immediate: true
      state: enabled
    ignore_errors: true
 
  - name: Enable selinux for pmcd services
    ansible.builtin.shell: "{{ item }}"
    ignore_errors: true
    loop:
      - setsebool -P pcp_read_generic_logs on
      - setsebool -P pcp_bind_all_unreserved_ports on
 
  - name: Start and enable pcp services
    ansible.builtin.service:
      name: "{{ item }}"
      state: started
      enabled: true
    loop: "{{ enable_services }}"
 
  handlers:
  - name: restart pcp
    ansible.builtin.service:
      name: "{{ item }}"
      state: restarted
    loop: "{{ enable_services }}"

I’m going to point out some things of note in the above playbook.
First is the remote_subnets variable. This should be populated with the IP or subnet of your monitoring hosts. It’s essentially an access list of who is allowed to connect in to retrieve PCP data.

Most of the tasks are pretty straightforward, but I thought I would have a look at one that includes a Jinja2 template:

  - name: Configure the pmcd process(add all of the allowed subnets)
    ansible.builtin.blockinfile:
      path: /etc/pcp/pmcd/pmcd.conf
      block: "{{ lookup('ansible.builtin.template', 'pmcd-access.j2') }}"
      insertafter: "\\[access\\]"
    notify: restart pcp

This inserts a block of configuration using the blockinfile module, but I’m pulling that block from a dynamic Jinja2 template (in the templates folder) named pmcd-access.j2:

{% for item in remote_subnets %}
allow hosts {{ item }} : fetch;
{% endfor %}

Taking a look at the template above, you can see I have a simple “for loop”. I loop over the contents of remote_subnets and fill out the allow hosts section based on it. With the default remote_subnets above, it renders to “allow hosts 10.0.* : fetch;” and “allow hosts 192.168.5.10 : fetch;”. Anything inside of {% %} is control logic and is omitted from the actual output of the template.

Now that the PCP collectors are installed and configured, I’ll run the pcp-monitor.yml playbook to configure the monitor host:

---
- name: Install/configure PCP monitor host
  hosts: pcp-monitor
  gather_facts: false
  vars:
    # Services to be enabled/started
    enable_services: 
#      - pmcd
      - pmlogger
 
    collection_directory: /var/log/pcp/pmlogger/
 
    # Do you want to set the pmlogger config files to use host IP address instead of inventory_hostname
    config_via_host: true
 
  tasks:
  # - name: debug data
  #   ansible.builtin.debug:
  #     var: hostvars[item]
  #   loop: "{{ groups['pcp-hosts'] }}"
 
  - name: Install pcp packages
    ansible.builtin.dnf:
      name: "{{ item }}"
      state: latest
    loop:
      - pcp
      - pcp-system-tools
    notify: restart pcp
 
  - name: Create config file for each pcp-host
    ansible.builtin.template:
      src: pmlogger-monitor.j2
      dest: "/etc/pcp/pmlogger/control.d/{{ item }}"
    loop: "{{ groups['pcp-hosts'] }}"
    notify: restart pcp
 
  - name: Create collector host directories by looping over pcp-hosts group
    ansible.builtin.file:
      path: "{{ collection_directory }}{{ item }}"
      state: directory
      mode: '0777'
    loop: "{{ groups['pcp-hosts'] }}"
 
  - name: Start and enable pcp services
    ansible.builtin.service:
      name: "{{ item }}"
      state: started
      enabled: true
    loop: "{{ enable_services }}"
 
  handlers:
  - name: restart pcp
    ansible.builtin.service:
      name: "{{ item }}"
      state: restarted
    loop: "{{ enable_services }}"

Again, I’ll try and point out the less obvious or perhaps more interesting parts of the above playbook.

The variable collection_directory is where the collected PCP data from the collectors will be stored.

The config_via_host variable is one I put in especially for my lab environment. When the config files are created, they point at a host to collect from. If this variable is set to true, then the host’s IP address will be used. If it’s set to false, then the inventory_hostname will be used (generally a fully qualified domain name (FQDN)).

In the previous playbook I used a template, and I’m using one here in the monitor host configuration as well, in the following task:

  - name: Create config file for each pcp-host
    ansible.builtin.template:
      src: pmlogger-monitor.j2
      dest: "/etc/pcp/pmlogger/control.d/{{ item }}"
    loop: "{{ groups['pcp-hosts'] }}"
    notify: restart pcp

Here I’m using the template module directly rather than the template lookup plugin. Let’s examine the referenced pmlogger-monitor.j2 template:

{% if config_via_host %}
{{ hostvars[item].ansible_host }} n n PCP_LOG_DIR/pmlogger/{{ item }} -r -T24h10m -c config.{{ item }}
{% else %}
{{ item }} n n PCP_LOG_DIR/pmlogger/{{ item }} -r -T24h10m -c config.{{ item }}
{% endif %}

This one uses a conditional “if else” statement, rather than just a loop. This is where I check if the collector host should be referenced via the inventory_hostname or via the ansible_host.

Executing/Troubleshooting Automation

Configure/Install/Troubleshoot Collector
Once you’ve added your inventories, projects, credentials, and job templates, you can execute the automation for installing the collectors:

If you want to test the collector host, you can pretty easily do it by SSHing in and issuing the “pcp” command:
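Beyond the summary output, here are a couple of hedged examples using commands that ship with the pcp packages (the hostname is a placeholder):

pcp                          # summary of the local PCP installation
pminfo -f kernel.all.load    # fetch current values for one metric
pcp -h collector-host        # query a remote collector once port 44321 is reachable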

If the monitor is getting “connection refused”, be sure to check the listening ports on the collector with “ss -tlp | grep 44321”:

Configure/Install/Troubleshoot Monitor
Once you run the monitor playbook you should see the successful message:

Now, if you want to test the monitor host, you can SSH into it and check the collection_directory. In my case I had it as “/var/log/pcp/pmlogger/”:

You can see here my PCP collector Greg-rocky9 folder is showing up, but is there data inside?:

This folder is full of data. If it wasn’t I would do a “tail pmlogger.log” in that folder to get an idea of what was happening:

Conclusion

While PCP may not be for everyone, it can be configured quite easily. The trick with performance data is that once you have a performance issue, you can’t go back in time and enable data collection, so why not go ahead and start collecting BEFORE there’s an issue 🙂.

As always, thanks for reading. If you have any questions or comments, I’d love to hear them. If you use PCP in your environment, I’d love to hear about that also! If we can help you on your automation journey, please reach out to me.

Good luck and happy PCP automating!

Jan 28 / Greg

Why Am I Geoffrey Mark

Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Geoffrey Mark. This cat started in show business at 15 and hasn’t stopped since. He’s done Broadway, he dances, sings, does comedy, and just to fill some time has also been a best-selling author. He has some valuable advice on grief, life, and talent. Oh, and don’t forget to sparkle like Geoff. Help us grow by sharing with someone!

Youtube version here:

Please show them some love on their socials here: https://www.instagram.com/geoffreymarkshowbiz/?hl=en,

https://twitter.com/thegeoffreymark,

https://www.facebook.com/groups/478789255814114/,

https://a.co/d/5c1P6Wv.

If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)