Ansible Check Mode Tips

Ansible has been the bread and butter of automation here at Nordeus for a few years now. We've used it to automate everything from system configuration to custom orchestrated application deploys. One neat feature that we use heavily, and which almost every automation tool has, is check mode, or "dry run" as some like to call it. Check mode doesn't make any changes on the system; it just simulates what changes would be made. To run Ansible in check mode, we pass the --check (or -C) option, e.g.:

ansible-playbook deploy.yml --check

Here are a few tips on how to use the Ansible check mode and how to get the most out of it.

Use It With --diff

When using check mode by itself, we can only see whether a task would report a change, not what the change would actually be. To see the actual changes, we need to use it in conjunction with diff mode (--diff or -D). When diff mode is enabled, modules that support it (e.g. template) display the differences between the current and the new state. This makes it easier to determine whether the changes we want to make are correct. So whenever we run playbooks in check mode, we always run them with diff as well:

ansible-playbook db_servers.yml -CD

and it's very easy to remember: -CD.

Starting with Ansible 2.1, more and more modules will support diff mode, since it has become much easier to implement.

Check Mode Should Always Work on the Second Run

When writing a role and/or playbook, make sure it always works in check mode on the second run. This means that after you run your role/playbook for real the first time, you should run it once more in check mode to verify that it works correctly. This will allow you to check in the future whether something would change on the system before making the actual changes.

Of course, it is great if a role/playbook can run in check mode when it's executed for the first time, but in most situations it doesn't make any sense to do that. For example, the following tasks, which are part of our Nginx role, install the Nginx repository, copy and import the repository key, and finally install Nginx.

---
- name: Install Nginx repository
  template:
    src: etc/yum.repos.d/nginx.repo.j2
    dest: /etc/yum.repos.d/nginx.repo
    mode: 0644
    owner: root
    group: root

- name: Copy Nginx repository key
  copy:
    src: etc/pki/rpm-gpg/RPM-GPG-KEY-nginx
    dest: /etc/pki/rpm-gpg/RPM-GPG-KEY-nginx
    mode: 0644
    owner: root
    group: root

- name: Import Nginx GPG repository key
  rpm_key:
    key: /etc/pki/rpm-gpg/RPM-GPG-KEY-nginx
    state: present

- name: Install nginx package
  yum:
    name: nginx
    state: present

If we run this role in check mode on a fresh system, it will fail on importing the GPG key, since the key was never actually copied (because it's check mode). Even if the import didn't fail, the installation of the nginx package would fail, since it requires the Nginx repository. We could, of course, ignore these errors in check mode, but that doesn't make any sense: if the repository and the repository key are not installed, we can't import the key or install the package. So making this role work in check mode on the first run would make its check mode meaningless, not to mention that later tasks in the role modify configuration files, which also don't exist if Nginx is not installed.

Another reason not to concentrate on making roles/playbooks check mode compatible on the first run is that you will lose a lot of time thinking "will this task work in check mode or not?". This will slow you down and take your focus away from what's important, which is automating tasks with Ansible.

Use always_run: yes / check_mode: no

Another reason to verify that check mode works is that it doesn't always work out of the box. Consider the following two tasks:

- name: Fetch list of network-scripts
  shell: /bin/ls /etc/sysconfig/network-scripts/ifcfg-*
  register: network_scripts
  changed_when: no

- name: Remove DNS from network scripts
  lineinfile:
    dest: '{{ item }}'
    regexp: '^DNS[0-9]+='
    state: absent
  with_items: '{{ network_scripts.stdout_lines }}'

The first task lists all network configuration files and registers the output of the ls command in the network_scripts variable, and the second task removes lines that match the regular expression ^DNS[0-9]+= from those files. If we run these tasks normally, everything works fine, but if we run them in check mode, the first task is skipped and the second one fails because the network_scripts variable doesn't contain a list of files to iterate over. This happens because the shell and command modules don't support check mode, so they are skipped. To overcome this problem, we can add always_run: yes to the shell task, which tells Ansible to run the shell module normally even in check mode. So it all looks like this:

- name: Fetch list of network-scripts
  shell: /bin/ls /etc/sysconfig/network-scripts/ifcfg-*
  register: network_scripts
  changed_when: no
  always_run: yes

- name: Remove DNS from network scripts
  lineinfile:
    dest: '{{ item }}'
    regexp: '^DNS[0-9]+='
    state: absent
  with_items: '{{ network_scripts.stdout_lines }}'

Be very careful when using always_run: yes! If we set it on a task that makes modifications, Ansible will make those changes on the managed system even when run in check mode.

In general, if we use the command or shell modules to read information from the system that we use later in the play, we have to use always_run to make them work in check mode. If we use other Ansible modules that read system information, we usually don't need always_run, since most of them work correctly in check mode.

Starting with Ansible 2.0, we could've used the find module instead of executing the ls command with the shell module. In that case, we wouldn't need to use always_run since the find module works in check mode just as it does in normal execution.
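
As a rough sketch of that approach, the two tasks could be rewritten with find as shown below. Treat this as an illustration rather than the exact tasks from our role: it assumes Ansible 2.0 or later, and it relies on find returning a list of dictionaries in files, so the loop iterates over item.path instead of stdout_lines.

```yaml
# Sketch only: lists the ifcfg-* files with the find module instead of ls.
- name: Fetch list of network-scripts
  find:
    paths: /etc/sysconfig/network-scripts
    patterns: 'ifcfg-*'
  register: network_scripts

# find registers its matches under "files"; each entry has a "path" key.
- name: Remove DNS from network scripts
  lineinfile:
    dest: '{{ item.path }}'
    regexp: '^DNS[0-9]+='
    state: absent
  with_items: '{{ network_scripts.files }}'
```

Note that find matches only regular files by default, which is slightly stricter than the ls glob, and that changed_when: no is no longer needed because find never reports a change.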

Bear in mind that starting with Ansible 2.2, always_run: yes has been deprecated in favor of check_mode: no. This basically means "never execute this task in check mode", which translates to "always run it normally". The new option also supports check_mode: yes, which always runs the task in check mode. We found this useful for integration tests for custom Ansible modules, but that is a whole different topic.
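
With the newer syntax, the shell task from the earlier example would look roughly like this. It is a minimal sketch, assuming an Ansible version that supports the check_mode task option; only the last line differs from the always_run variant.

```yaml
# Same read-only task as before, using check_mode instead of always_run.
- name: Fetch list of network-scripts
  shell: /bin/ls /etc/sysconfig/network-scripts/ifcfg-*
  register: network_scripts
  changed_when: no
  check_mode: no   # run for real even when the play is started with --check
```

The same warning applies: only set check_mode: no on tasks that read from the system and don't modify it.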

Skipping Tasks in Check Mode

Sometimes it can be useful to skip some tasks only in check mode. This comes in handy if we have a task that will fail in check mode and there is no way to work around the failure. For example, when we want to upgrade one of our apps, we create a new file called "prepare_for_upgrade" that tells the app to prepare for the upgrade procedure. Once the app finishes all ongoing operations and is ready for upgrading, it renames that file to "prepare_for_upgrade.finished". So we do something like this:

- name: Touch file to prepare app for upgrade
  file:
    dest: '{{ app_dir }}/prepare_for_upgrade'
    state: touch

- name: Wait for the app to be ready for upgrading
  stat:
    path: '{{ app_dir }}/prepare_for_upgrade.finished'
  register: result
  until: result.stat.exists == True
  retries: 60
  delay: 1

# Continue deploy...

If we run these tasks normally, all is good. In check mode, though, we don't want to do the actual upgrade, we just want to test it, so the "prepare_for_upgrade" file is never created, and the stat task waiting for the ".finished" file never finds it and fails after 60 retries (about 60 seconds). But in check mode we still want to test the rest of the upgrade steps, so we don't want it to fail. To make this work, we could run a command such as /bin/true, register its result and, depending on whether it was skipped or not, set a fact:

- name: Run a command /bin/true
  command: /bin/true
  register: command_true

- set_fact:
    check_mode: '{{ command_true | skipped }}'

Now the check_mode variable will be true in check mode and false otherwise. This means we can use it to skip the stat task:

- name: Wait for the app to be ready for upgrading
  stat:
    path: '{{ app_dir }}/prepare_for_upgrade.finished'
  register: result
  until: result.stat.exists == True
  retries: 60
  delay: 1
  when: not check_mode

Ansible 2.1 introduced a new magic variable, ansible_check_mode, which makes the whole process a lot easier. For the previous example, we would just use:

- name: Wait for the app to be ready for upgrading
  stat:
    path: '{{ app_dir }}/prepare_for_upgrade.finished'
  register: result
  until: result.stat.exists == True
  retries: 60
  delay: 1
  when: not ansible_check_mode
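
The same variable can also be used to ignore errors only in check mode, as mentioned earlier, instead of skipping the task altogether. The following is a minimal sketch, assuming Ansible 2.1 or later; the task and the service name myapp are made up for illustration.

```yaml
# Sketch only: a failure is tolerated in check mode but still fatal in a normal run.
- name: Restart the app service
  service:
    name: myapp   # hypothetical service name
    state: restarted
  ignore_errors: '{{ ansible_check_mode }}'
```

In a normal run ansible_check_mode is false, so a failing task still stops the play. Skipping with when is usually the better fit for tasks that wait or retry, like the stat task above, since ignoring the error would still cost the full retry time.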

Conclusion

Check mode is just one of the many features Ansible provides to give us more control over our infrastructure. It doesn't always work out of the box; it usually takes some additional effort to make it work correctly. But once roles and playbooks are check mode compatible, we can easily see what changes will be made to our managed systems before they are applied.

Strahinja Kustudic

System Engineer

May 13, 2016