Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Scott Jones. He’s a British born Aussie who has always felt a bit like a fish out of water. Part of that feeling came to light when he recently realized he’s gay, and now the world literally looks different. I mean imagine waking up one day, and getting to experience the world through a beautiful new lens. I hope you enjoy this chat with Scott. Help us grow by sharing with someone!
If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)
Performance Co-Pilot (PCP) is a suite of tools used for performance monitoring across a variety of systems. We see it used quite a bit in the HPC space, either to squeeze as much performance out of a system as possible or to troubleshoot performance issues. It can often be tedious to install and manage…unless, of course, you use automation!
I’ll describe my architecture, review my playbooks, and have a look at it all working.
Video Demo
How It Works
PCP has a LOT of components and options; here I really just intend to describe how I’m configuring it.
First, what is a “Collection host”? Any regular server or VM running PCP to gather info on itself is considered a collection host. So most of the configured hosts will be collection hosts.
Once a collector is configured, an admin will generally SSH into it to access the PCP data. These hosts can also run something like Redis with Grafana to graph the info, which means the admin is going straight to the host either way.
When your environment begins to grow it can be a bit tedious to connect to each host to access PCP info.
This is where a “Monitoring Host” comes in. A monitoring host stores info from multiple collection hosts. This means an admin only needs to connect to the monitoring host to gain insight about any of the collection hosts…a one-stop-shop as it were.
You can either push or pull data. If the collectors push their data, they incur some additional overhead. If the monitoring host pulls instead, it bears that cost, which is less likely to skew the performance metrics coming from the collection hosts.
I’ve also seen some data saying that a monitoring host should be capped somewhere around a thousand collectors.
Playbooks
All of my playbooks can be found here in my git repository.
The pcp-install.yml playbook connects to the PCP collectors, configures them to collect locally, and prepares them to allow monitoring hosts to access them:
---
- name: Install/configure PCP on various hosts
  hosts: pcp-hosts
  gather_facts: false
  vars:
    # Services to be enabled/started
    enable_services:
      - pmcd
      - pmlogger
    # The subnets or ranges of hosts allowed to connect to clients to fetch info
    remote_subnets:
      - 10.0.*
      - 192.168.5.10

  tasks:
    # dnf install required pcp packages
    - name: Install pcp packages
      ansible.builtin.dnf:
        name: "{{ item }}"
        state: latest
      loop:
        - pcp
        - pcp-system-tools
      notify: restart pcp

    - name: Configure the pmcd process(add all of the allowed subnets)
      ansible.builtin.blockinfile:
        path: /etc/pcp/pmcd/pmcd.conf
        block: "{{ lookup('ansible.builtin.template', 'pmcd-access.j2') }}"
        insertafter: "\\[access\\]"
      notify: restart pcp

    - name: Configure the pmcd options to listen on the correct IP
      ansible.builtin.lineinfile:
        path: /etc/pcp/pmcd/pmcd.options
        line: "-i {{ hostvars[inventory_hostname].ansible_host }}"

    - name: Enable pmcd listening ports on firewall
      ansible.posix.firewalld:
        port: 44321/tcp
        permanent: true
        immediate: true
        state: enabled
      ignore_errors: true

    - name: Enable selinux for pmcd services
      ansible.builtin.shell: "{{ item }}"
      ignore_errors: true
      loop:
        - setsebool -P pcp_read_generic_logs on
        - setsebool -P pcp_bind_all_unreserved_ports on

    - name: Start and enable pcp services
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop: "{{ enable_services }}"

  handlers:
    - name: restart pcp
      ansible.builtin.service:
        name: "{{ item }}"
        state: restarted
      loop: "{{ enable_services }}"
I’m going to point out some things of note in the above playbook.
First is the remote_subnets variable. This should be populated with the IP or subnet of your monitoring hosts. It’s essentially an access list of who is allowed to connect in to retrieve PCP data.
Most of the tasks are pretty straightforward, but I thought I would have a look at one that includes a jinja2 template:
- name: Configure the pmcd process(add all of the allowed subnets)
  ansible.builtin.blockinfile:
    path: /etc/pcp/pmcd/pmcd.conf
    block: "{{ lookup('ansible.builtin.template', 'pmcd-access.j2') }}"
    insertafter: "\\[access\\]"
  notify: restart pcp
This manages a block of configuration using the blockinfile module, but I’m pulling that block from a dynamic j2 template (in the templates folder) named pmcd-access.j2:
{% for item in remote_subnets %}
allow hosts {{ item }} : fetch;
{% endfor %}
Taking a look at the template above you can see I have a simple “for loop”. I loop over the contents of remote_subnets and fill out the allow hosts section based on it. Anything inside of {% %} is omitted from the actual output of the template.
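For example, with the remote_subnets values from the playbook above (10.0.* and 192.168.5.10), the block rendered into pmcd.conf under the [access] section would look roughly like this (blockinfile also wraps it in its usual ANSIBLE MANAGED BLOCK marker comments):

allow hosts 10.0.* : fetch;
allow hosts 192.168.5.10 : fetch;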
Now that the PCP collectors are installed and configured, I’ll run the pcp-monitor.yml playbook to configure the monitor host:
---
- name: Install/configure PCP monitor host
  hosts: pcp-monitor
  gather_facts: false
  vars:
    # Services to be enabled/started
    enable_services:
      # - pmcd
      - pmlogger
    collection_directory: /var/log/pcp/pmlogger/
    # Do you want to set the pmlogger config files to use host IP address instead of inventory_hostname
    config_via_host: true

  tasks:
    # - name: debug data
    #   ansible.builtin.debug:
    #     var: hostvars[item]
    #   loop: "{{ groups['pcp-hosts'] }}"

    - name: Install pcp packages
      ansible.builtin.dnf:
        name: "{{ item }}"
        state: latest
      loop:
        - pcp
        - pcp-system-tools
      notify: restart pcp

    - name: Create config file for each pcp-host
      ansible.builtin.template:
        src: pmlogger-monitor.j2
        dest: "/etc/pcp/pmlogger/control.d/{{ item }}"
      loop: "{{ groups['pcp-hosts'] }}"
      notify: restart pcp

    - name: Create collector host directories by looping over pcp-hosts group
      ansible.builtin.file:
        path: "{{ collection_directory }}{{ item }}"
        state: directory
        mode: '0777'
      loop: "{{ groups['pcp-hosts'] }}"

    - name: Start and enable pcp services
      ansible.builtin.service:
        name: "{{ item }}"
        state: started
        enabled: true
      loop: "{{ enable_services }}"

  handlers:
    - name: restart pcp
      ansible.builtin.service:
        name: "{{ item }}"
        state: restarted
      loop: "{{ enable_services }}"
Again, I’ll try and point out the less obvious or perhaps more interesting parts of the above playbook.
The variable collection_directory is where the collected PCP data from the collectors will be stored.
The config_via_host variable is one I put in especially for my lab environment. When the config files are created, they point to a host to collect. If this variable is set to true, then the host’s IP address will be used. If it’s set to false, then the inventory_hostname will be used (generally a fully qualified domain name, or FQDN).
I used a template in the previous playbook, and I’m using one here in the monitor host configuration as well, in the following task:
- name: Create config file for each pcp-host
  ansible.builtin.template:
    src: pmlogger-monitor.j2
    dest: "/etc/pcp/pmlogger/control.d/{{ item }}"
  loop: "{{ groups['pcp-hosts'] }}"
  notify: restart pcp
Here I’m using the template module directly rather than the template lookup plugin. Let’s examine the referenced pmlogger-monitor.j2 template:
{% if config_via_host %}
{{ hostvars[item].ansible_host }} n n PCP_LOG_DIR/pmlogger/{{ item }} -r -T24h10m -c config.{{ item }}
{% else %}
{{ item }} n n PCP_LOG_DIR/pmlogger/{{ item }} -r -T24h10m -c config.{{ item }}
{% endif %}
This one uses a conditional “if else” statement, rather than just a loop. This is where I check if the collector host should be referenced via the inventory_hostname or via the ansible_host.
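To make that concrete, if a collector in the pcp-hosts group is named Greg-rocky9 and (hypothetically) has ansible_host set to 192.168.5.20, the rendered control.d/Greg-rocky9 file with config_via_host: true would contain roughly:

192.168.5.20 n n PCP_LOG_DIR/pmlogger/Greg-rocky9 -r -T24h10m -c config.Greg-rocky9

With config_via_host set to false, the first field would be the inventory_hostname (Greg-rocky9) instead.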
Executing/Troubleshooting Automation
Configure/Install/Troubleshoot Collector
Once you’ve added your inventories, projects, credentials and job templates, you can execute the automation for installing the collectors:
If you want to test the collector host, you can pretty easily do it by SSHing in and issuing the “pcp” command:
If the monitor is getting “connection refused”, be sure to check the listening ports on the collector with “ss -tlp | grep 44321”:
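As a rough sketch of those checks (the collector IP below is just a placeholder):

# On the collector itself: a quick local sanity check
pcp

# On the collector: confirm pmcd is listening on TCP 44321
ss -tlp | grep 44321

# From the monitoring host: confirm remote access to the collector's pmcd
pcp -h 192.168.5.20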
Configure/Install/Troubleshoot Monitor
Once you run the monitor playbook, you should see a success message:
Now, if you want to test the monitor host, you can SSH into it and check the collection_directory. In my case I had it as “/var/log/pcp/pmlogger/”:
You can see here my PCP collector Greg-rocky9 folder is showing up, but is there data inside?:
This folder is full of data. If it wasn’t, I would run a “tail pmlogger.log” in that folder to get an idea of what was happening:
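Roughly, those checks on the monitoring host look like this (paths based on my collection_directory and the Greg-rocky9 collector above):

# List the per-collector directories under the collection directory
ls /var/log/pcp/pmlogger/

# Check whether archives are being written for a given collector
ls -lh /var/log/pcp/pmlogger/Greg-rocky9/

# If the directory is empty, see what pmlogger is complaining about
tail /var/log/pcp/pmlogger/Greg-rocky9/pmlogger.log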
Conclusion
While PCP may not be for everyone, it can be configured quite easily. The trick with performance data is that if you hit a performance issue, you can’t go back in time and enable data collection, so why not start collecting BEFORE there’s an issue 🙂.
As always, thanks for reading. If you have any questions or comments, I’d love to hear them. If you use PCP in your environment, I’d love to hear about that also! If we can help you on your automation journey, please reach out to me.
Good luck and happy PCP automating!
Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Geoffrey Mark. This cat started in show business at 15, and hasn’t stopped since. He’s done Broadway, he dances, sings, is a comedian, and just to fill some time has also been a best-selling author. He has some valuable advice on grief, life, and talent. Oh, and don’t forget to sparkle like Geoff. Help us grow by sharing with someone!
Please show them some love on their socials here: https://www.instagram.com/geoffreymarkshowbiz/?hl=en,
https://twitter.com/thegeoffreymark,
https://www.facebook.com/groups/478789255814114/,
If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)
Migrating from one Linux major version to another never seems to be a simple task, but through the magic of automation, it can be a lot simpler and more reproducible. I’m going to cover the Ansible playbooks I created to do the work, then I’ll execute it using our enterprise automation platform called Ascender.
Our recommended method is to:
– Backup configuration and data from the old server
– Provision a brand new server with the required apps
– Restore configurations and data to the new server
– Test services on the new server
– Sunset the old server
Video Demo
Playbooks
First, I’m using resources from the community.general collection found here. I actually have a version of it included in my git repository.
All of my playbooks can be found here in my git repository.
I’ll cover some of the playbooks here… mostly discussing the highlights. The discover-backup.yml playbook is the first playbook run:
---
- name: Discover/backup hosts to be migrated
  hosts: migration-hosts
  gather_facts: false
  vars:
    # The host to store backup info to
    backup_storage: backup-storage
    # The location on the backup host to store info
    backup_location: /tmp/migration

  tasks:
    - name: Execute rpm to get list of installed packages
      ansible.builtin.command: rpm -qa --qf "%{NAME} %{VERSION}-%{RELEASE}\n"
      register: rpm_query

    - name: Populate service facts - look for running services
      ansible.builtin.service_facts:

    # - name: Print service facts
    #   ansible.builtin.debug:
    #     var: ansible_facts.services

    - name: Create backup directory on backup server - unique for each host
      ansible.builtin.file:
        path: "{{ backup_location }}/{{ inventory_hostname }}"
        state: directory
        mode: '0733'
      delegate_to: "{{ backup_storage }}"

    # - name: Backup groups
    #   ansible.builtin.include_tasks:
    #     file: group-backup.yml

    - name: Backup Apache when httpd is installed and enabled
      when: item is search('httpd ') and ansible_facts.services['httpd.service'].status == 'enabled'
      ansible.builtin.include_tasks:
        file: apache-backup.yml
      loop: "{{ rpm_query.stdout_lines }}"
In the above, the first task I run uses the rpm command to gather information on all of the installed packages. Generally, I prefer to use a purpose-built module if one exists. In this instance, the ansible.builtin.package_facts module is designed to do this, but I found it didn’t always report correctly for CentOS 7 servers, so I went with the rpm command as it always works. This list of apps will be used toward the bottom of the playbook.
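For comparison, here is a minimal sketch of what the package_facts approach would look like on hosts where it reports reliably (this is not what the playbook above uses):

# Gather installed packages into ansible_facts.packages
- name: Gather installed packages via package_facts
  ansible.builtin.package_facts:
    manager: auto

# Example of keying off the gathered facts
- name: Note whether httpd is installed
  ansible.builtin.debug:
    msg: "httpd is installed"
  when: "'httpd' in ansible_facts.packages"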
Next, I create a directory for each host on a backup server. This will be the repository for all of my configs and data backed up from the old server.
The last task is where the real work happens. I loop over the list of the installed packages on the server and check whether one is the Apache service and whether it is enabled. If those conditions are met, it pulls in the apache-backup.yml task file. This task file is something I created to back up things from my environment. If I had FTP services on some of my servers, I would also need an ftp-backup task file and an additional matching task, just like the apache-backup file.
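For example, if some of those servers ran vsftpd, a matching task (paired with a hypothetical ftp-backup.yml task file) might look like this, following the same pattern as the Apache task above:

# Hypothetical example: back up FTP when vsftpd is installed and enabled
- name: Backup FTP when vsftpd is installed and enabled
  when: item is search('vsftpd ') and ansible_facts.services['vsftpd.service'].status == 'enabled'
  ansible.builtin.include_tasks:
    file: ftp-backup.yml
  loop: "{{ rpm_query.stdout_lines }}"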
The apache-backup.yml file is actually fairly simple:
# Task file for backing up apache

# Backup apache config files
- name: Create an archive of the config files
  community.general.archive:
    path: /etc/httpd/con*
    dest: "/tmp/{{ inventory_hostname }}-httpd.tgz"

- name: Copy apache config files to ansible server
  ansible.builtin.fetch:
    src: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    flat: true # Changes default fetch so it will save directly in destination

- name: Copy config archive to backup server from local ansible server
  ansible.builtin.copy:
    src: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    dest: "{{ backup_location }}/{{ inventory_hostname }}/{{ inventory_hostname }}-httpd.tgz"
  delegate_to: "{{ backup_storage }}"

# Backup apache data files
- name: Create an archive of the data directories
  community.general.archive:
    path: /var/www
    dest: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"

- name: Copy apache data files to ansible server
  ansible.builtin.fetch:
    src: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    flat: true # Changes default fetch so it will save directly in destination

- name: Copy data archive to backup server from local ansible server
  ansible.builtin.copy:
    src: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    dest: "{{ backup_location }}/{{ inventory_hostname }}/{{ inventory_hostname }}-httpd-data.tgz"
  delegate_to: "{{ backup_storage }}"
Taking a look at the above task file, you can see that it first creates an archive of the Apache configuration files. Really, it’s more or less a zip file.
It pulls the archive off the server, then pushes it over to a backup server.
It then repeats these actions for the data directories.
The next playbook is called provision-new-server.yml. I’ll leave you to look at it if you like, but it:
Connects to vCenter and provisions a new server (a rough sketch of this step follows the list)
Waits for the server to pull an IP address
Adds the new host to the inventory via the Ascender API
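For reference, here is a minimal sketch of what the vCenter clone step might look like; the credentials, datacenter, and template names are placeholders, not the exact contents of my playbook:

# Hypothetical sketch: clone a new VM from a template via vCenter
- name: Clone a new VM from a template
  community.vmware.vmware_guest:
    hostname: "{{ vcenter_hostname }}"
    username: "{{ vcenter_username }}"
    password: "{{ vcenter_password }}"
    validate_certs: false
    datacenter: DC1
    name: "new-{{ inventory_hostname }}"
    template: rocky9-template
    state: poweredon
    wait_for_ip_address: true
  delegate_to: localhost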
Now that the old server is backed up and the new server has been provisioned, it’s time to restore some services on the new one. This is done with the restore.yml playbook:
---
- name: Playbook to restore configs on new servers
  hosts: migration-hosts
  gather_facts: false
  vars:
    # The host to store backup info to
    backup_storage: backup-storage
    # The location on the backup host to store info
    backup_location: /tmp/migration

  tasks:
    - name: Set the restore server variables
      ansible.builtin.set_fact:
        restore_server: "new-{{ inventory_hostname }}"

    # - name: Debug restore_server
    #   ansible.builtin.debug:
    #     var: restore_server

    # grab a list of the files on the backup server for this host
    - name: Find all files in hosts' backup directories
      ansible.builtin.find:
        paths: "{{ backup_location }}/{{ inventory_hostname }}"
        # recurse: yes
      delegate_to: "{{ backup_storage }}"
      register: config_files

    # - name: Debug config_files
    #   when: item.path is search(inventory_hostname + '-httpd.tgz')
    #   ansible.builtin.debug:
    #     var: config_files
    #   loop: "{{ config_files.files }}"

    # for each task type, loop through backup files and see if they exist - call restore task file
    - name: If apache is installed, call install task file
      when: item.path is search(inventory_hostname + '-httpd.tgz')
      ansible.builtin.include_tasks:
        file: apache-restore.yml
      loop: "{{ config_files.files }}"
The first task in the above sets a restore_server variable to the name of the new server. In my playbooks, I named the new server “new-{{ inventory_hostname }}”. This means it’s the name of the old server with “new-” on the front… not overly complex, but it does the trick.
The second task searches each host’s backup directory on the backup server and finds all files that have been backed up for that host.
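For context, the find module registers a list under config_files.files, and the task that follows only cares about each entry’s path key. For a hypothetical host named web01, the relevant entries would look roughly like:

config_files:
  files:
    - path: /tmp/migration/web01/web01-httpd.tgz
      # ...each entry also carries size, mtime, and other metadata
    - path: /tmp/migration/web01/web01-httpd-data.tgz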
Somewhat similar to the backup procedure, the last task in the restore procedure loops over the files found on the backup server and calls task files for the various applications/packages. In this case, I’m looking for the Apache backup files, and when found, running the apache-restore.yml task file.
Next is to examine the apache-restore.yml file:
# Task file for installing and configuring apache

# - name: Debug restore_server
#   ansible.builtin.debug:
#     var: restore_server

# Install apache
- name: Install apache
  ansible.builtin.dnf:
    name: httpd
    state: latest
  delegate_to: "{{ restore_server }}"

- name: Copy apache config files to ansible server
  ansible.builtin.fetch:
    src: "{{ backup_location }}/{{ inventory_hostname }}/{{ inventory_hostname }}-httpd.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    flat: true # Changes default fetch so it will save directly in destination
  delegate_to: "{{ backup_storage }}"

- name: Copy config archive to new server from local ansible server
  ansible.builtin.copy:
    src: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd.tgz"
  delegate_to: "{{ restore_server }}"

- name: Extract config archive
  ansible.builtin.unarchive:
    src: "/tmp/{{ inventory_hostname }}-httpd.tgz"
    dest: /etc/httpd
    remote_src: true
  delegate_to: "{{ restore_server }}"

- name: Copy apache data files to ansible server
  ansible.builtin.fetch:
    src: "{{ backup_location }}/{{ inventory_hostname }}/{{ inventory_hostname }}-httpd-data.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    flat: true # Changes default fetch so it will save directly in destination
  delegate_to: "{{ backup_storage }}"

- name: Copy data archive to new server from local ansible server
  ansible.builtin.copy:
    src: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    dest: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
  delegate_to: "{{ restore_server }}"

- name: Extract data archive
  ansible.builtin.unarchive:
    src: "/tmp/{{ inventory_hostname }}-httpd-data.tgz"
    dest: /var/www
    remote_src: true
  delegate_to: "{{ restore_server }}"

- name: Start service httpd and enable it on boot
  ansible.builtin.service:
    name: httpd
    state: started
    enabled: yes
  delegate_to: "{{ restore_server }}"
The above is quite simple. First things first, I install Apache. Next I connect to the backup server, copy the archive config files over, and extract them. I then do the same thing for the data files. Last, I start and enable the Apache service.
After this, I run the suspend-old.yml playbook to pause the old VM.
Very last, I’ll run my testing playbooks that are designed for each app.
Ascender Configuration
I’ve covered adding inventories, projects, and job templates in other blog posts.
I will show the workflow template I created to tie all of the job templates together, though:
A workflow allows me to take playbooks of all sorts and string them together with branching on success or on failure logic. It also allows me to make my playbooks flexible and reusable.
Conclusion
Migrating infrastructure is often complex and time consuming, and while we can’t get more hours or employees to complete the task, we can employ our secret weapon, automation.
CIQ is ready to help you not only stand up Ascender in your environment; we’re also experts at helping you migrate your infrastructure. We have tools to assist, and at the end you’ll have the automations for your environment ready for continued and future use!
As always, thanks for reading and I appreciate your feedback; happy migrating!
Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Jane Labowitch, better known as Princess Etch. As the royal name implies, she is an artist who uses an etch-a-sketch as her medium. In this chat, I follow her down the rabbit hole on how each etch performs differently and how it takes time to find the right one for the job. She also shares how the video you see is backed by hours upon hours of work that are completely invisible…man, I love artists. I hope you enjoy this conversation with Jane. Help us grow by sharing with someone!
Please show them some love on their socials here: https://princessetch.com/,
https://www.tiktok.com/@princessetch,
https://www.instagram.com/princessetch/,
https://www.patreon.com/m/princessetch.
If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)
Welcome to the warmup exercise for the Why Am I podcast called “the Fantasy Restaurant.” In here my guests get to pick their favorite: drink, appetizer, main, sides, and dessert…anything goes. Join us in the slow-moving and beautiful south of France, where we step inside a building and are transported to DC 40 years ago, to a bustling kitchen full of family and love. Oh, also there’s a bottomless basket of bacon LOL. I hope you enjoy this meal with Kelly. Help us grow by sharing with someone!
Please show them some love on their socials here: https://kellyedwards.co/,
https://www.instagram.com/kellyedwards_co/,
https://www.facebook.com/kellyedwardsco.
If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)
Hey everybody, I’m Greg Sowell and this is Why Am I, a podcast where I talk to interesting people and try to trace a path to where they find themselves today. My guest this go around is Kelly Edwards. She’s got an impressive resume: former exec in LA, film producer, writer, teacher…but she only touches on those things. They don’t define the person inside of Kelly, because she’s a phoenix that is constantly being reborn into a completely different person. Not only that, but she opens her eyes each day excited to see who she’ll become. I hope you enjoy this chat with Kelly. Help us grow by sharing with someone!
Please show them some love on their socials here: https://kellyedwards.co/,
https://www.instagram.com/kellyedwards_co/,
https://www.facebook.com/kellyedwardsco.
If you want to support the podcast you can do so via https://www.patreon.com/whyamipod (this gives you access to bonus content including their Fantasy Restaurant!)