Ansible roles are the fundamental building blocks of maintainable automation. A well-structured role is portable across projects, testable in isolation, and transparent enough that a new team member can understand it without reading a novel of comments. A poorly structured one becomes the thing nobody wants to touch -- the role that only works on that one server, with that one inventory file, run by that one person who left the company six months ago.

This article covers the patterns that separate roles built for a single playbook from roles built for an organization. We'll walk through Molecule testing from scratch, dissect Ansible's notoriously complex variable precedence system, and examine handler strategies that prevent the subtle bugs that surface at 2 AM during a production deployment.

Anatomy of a Scalable Role

Every Ansible role follows a standard directory structure, traditionally scaffolded with ansible-galaxy init and now with ansible-creator, the recommended replacement. But understanding which directories actually matter -- and how to use them correctly -- is what separates a role that works from one that scales. According to the official Ansible documentation, roles automatically load related vars, files, tasks, handlers, and other artifacts based on a known file structure.

role directory structure
# Standard Ansible role layout
roles/nginx/
├── defaults/
│   └── main.yml          # Variables users SHOULD override
├── vars/
│   └── main.yml          # Variables users should NOT override
├── tasks/
│   ├── main.yml          # Entry point -- delegates to subtasks
│   ├── install.yml
│   ├── configure.yml
│   └── service.yml
├── handlers/
│   └── main.yml          # Event-driven tasks (restarts, reloads)
├── templates/
│   └── nginx.conf.j2     # Jinja2 templates
├── files/                # Static files to copy
├── meta/
│   ├── main.yml          # Role metadata + dependencies
│   └── argument_specs.yml # Input validation (Ansible 2.11+)
├── molecule/             # Test scenarios
│   └── default/
└── README.md

The critical distinction here is between defaults/ and vars/. Variables in defaults/main.yml sit at the absolute bottom of Ansible's precedence hierarchy -- they exist specifically to be overridden by inventory variables, group_vars, host_vars, or playbook-level declarations. Variables in vars/main.yml sit much higher in the precedence stack and are meant for internal role constants that consumers should not change. Getting this wrong is the single most common source of "why isn't my variable taking effect?" bugs in Ansible.
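For contrast, here is a sketch of what does belong in vars/main.yml -- internal constants that are part of the role's implementation, not its API. The variable names below are illustrative, not from a real role:

```yaml
# roles/nginx/vars/main.yml
# Internal constants -- precedence level 15, effectively un-overridable
# through inventory. Names are illustrative.
nginx_systemd_unit: "nginx.service"     # fixed by distro packaging
nginx_min_supported_version: "1.18"     # checked by a validation task
nginx_config_test_command: "nginx -t"   # used by handlers, never by users
```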

Pro Tip

Prefix every variable in your role with the role name. A role called nginx should define nginx_worker_processes, not worker_processes. This prevents collisions when multiple roles are composed in a single playbook -- a problem the Red Hat Communities of Practice documentation explicitly warns about.

A good defaults/main.yml reads like an API contract. It documents every knob the consumer can turn, with sensible defaults that work out of the box.

roles/nginx/defaults/main.yml
# roles/nginx/defaults/main.yml
# All variables are prefixed with the role name to prevent
# collisions in multi-role playbooks.

# Package management
nginx_package_name: "nginx"
nginx_package_state: "present"

# Service configuration
nginx_service_name: "nginx"
nginx_service_enabled: true
nginx_service_state: "started"

# Core nginx.conf tunables
nginx_worker_processes: "auto"
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_multi_accept: true

# Log paths
nginx_access_log: "/var/log/nginx/access.log"
nginx_error_log: "/var/log/nginx/error.log"

# Virtual hosts
nginx_vhosts: []

# SSL defaults
nginx_ssl_protocols: "TLSv1.2 TLSv1.3"
nginx_ssl_ciphers: "HIGH:!aNULL:!MD5"

Task Decomposition

The tasks/main.yml file should be a dispatcher, not a monolith. Split tasks into logical subtask files and include them in order. This improves readability, makes it easier to conditionally skip entire phases, and simplifies debugging when things go wrong.

roles/nginx/tasks/main.yml
# roles/nginx/tasks/main.yml
# Entry point -- orchestrates subtask includes
---
- name: Include OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family | lower }}.yml"

- name: Install nginx packages
  ansible.builtin.include_tasks: install.yml

- name: Configure nginx
  ansible.builtin.include_tasks: configure.yml

- name: Configure virtual hosts
  ansible.builtin.include_tasks: vhosts.yml
  when: nginx_vhosts | length > 0

- name: Ensure nginx service state
  ansible.builtin.include_tasks: service.yml

Notice the include_vars call at the top. This pattern loads OS-family-specific variables (package names, config paths, service names) from files like vars/debian.yml and vars/redhat.yml, making the role portable across distributions without cluttering every task with when: ansible_os_family == 'Debian' conditionals.
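A more defensive variant of this include uses the first_found lookup to fall back from distribution-specific to family-specific variable files. A sketch, assuming the listed vars files exist in your role:

```yaml
# tasks/main.yml -- OS vars with graceful fallback
- name: Include OS-specific variables
  ansible.builtin.include_vars: "{{ lookup('ansible.builtin.first_found', params) }}"
  vars:
    params:
      files:
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_distribution | lower }}.yml"
        - "{{ ansible_os_family | lower }}.yml"
      paths:
        - "{{ role_path }}/vars"
```

The first file that exists wins, so a vars/debian-12.yml can refine vars/debian.yml without any conditionals in the task list.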

Validate Inputs with argument_specs

Starting with Ansible 2.11, roles can define an argument_specs section in meta/main.yml (or a standalone meta/argument_specs.yml file) that formally documents and validates every variable the role accepts. This gives your role a machine-readable API contract: Ansible will reject playbook runs that pass invalid types, miss required variables, or supply values outside the allowed choices -- before a single task executes.

meta/argument_specs.yml
# meta/argument_specs.yml
# Validates role inputs before tasks execute
---
argument_specs:
  main:
    short_description: Install and configure nginx
    description:
      - Installs nginx, manages configuration, and configures virtual hosts.
    options:
      nginx_worker_processes:
        type: str
        default: "auto"
        description: Number of worker processes or "auto"
      nginx_worker_connections:
        type: int
        default: 1024
        description: Max simultaneous connections per worker
      nginx_ssl_protocols:
        type: str
        default: "TLSv1.2 TLSv1.3"
        description: Allowed TLS protocol versions
      nginx_vhosts:
        type: list
        elements: dict
        default: []
        description: List of virtual host configurations
      nginx_service_enabled:
        type: bool
        default: true
        description: Whether nginx starts on boot

When teams adopt argument_specs, they gain two things at once: automatic input validation at runtime and auto-generated documentation via ansible-doc -t role nginx (the role must be on your configured role path or installed within a collection for ansible-doc to discover it). For roles shared across an organization, this eliminates the "read the README to figure out what variables exist" problem and catches misconfiguration before it reaches a target host.

Variable Precedence: The 22 Layers of Pain

Ansible's variable precedence system is one of its most confusing aspects, and misunderstanding it accounts for a disproportionate share of debugging time. The official documentation lists the full precedence order from least to greatest priority. Understanding the critical layers -- and where role variables fall -- is essential for writing roles that behave predictably when composed.

Here is the precedence order from lowest (most easily overridden) to highest (overrides all others):

  1. Command line values (e.g., -u my_user) -- these are connection settings rather than variables, but the official table lists them at the very bottom
  2. Role defaults (defaults/main.yml) -- lowest-priority variables
  3. Inventory file or script group vars
  4. Inventory group_vars/all
  5. Playbook group_vars/all
  6. Inventory group_vars/*
  7. Playbook group_vars/*
  8. Inventory file or script host vars
  9. Inventory host_vars/*
  10. Playbook host_vars/*
  11. Host facts / cached set_facts
  12. Play vars
  13. Play vars_prompt
  14. Play vars_files
  15. Role vars (vars/main.yml)
  16. Block vars (only for tasks within the block)
  17. Task vars (only for the specific task)
  18. include_vars
  19. set_facts / registered vars
  20. Role (and include_role) params -- variables passed when invoking the role
  21. include params
  22. Extra vars (-e on the command line) -- highest priority

Caution

The gap between defaults/main.yml (level 2) and vars/main.yml (level 15) is enormous. Variables in vars/ override inventory variables, group_vars, host_vars, and play vars. If you put user-configurable values in vars/main.yml, consumers cannot override them through normal inventory mechanisms -- they'd need include_vars, set_fact, or --extra-vars to win.

The practical rule is straightforward: put everything the user should be able to change in defaults/main.yml. Put platform constants and internal implementation details in vars/main.yml. The Ansible documentation itself advises this approach, noting that variables in the defaults folder are designed to be easy to override while those in the vars directory are meant for values that should remain consistent. One additional nuance: variables created with set_fact using the cacheable: true option have high precedence during the current play, but when loaded from the fact cache in a subsequent play, they revert to the same precedence level as host facts -- a subtle distinction that can cause confusion in multi-play playbooks.
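To make the rule concrete, here is a hypothetical trace of one variable through three layers -- each snippet below represents a separate file:

```yaml
# roles/nginx/defaults/main.yml (precedence 2) -- the role's suggestion
nginx_worker_connections: 1024

# inventory group_vars/webservers.yml (precedence 6) -- wins over
# defaults, so hosts in this group get 4096
nginx_worker_connections: 4096

# roles/nginx/vars/main.yml (precedence 15) -- if the role also defined
# the variable here, it would silently beat the inventory value:
# nginx_worker_connections: 512    # don't do this for user-facing knobs
```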

The include_vars Trap

One of the subtlest precedence issues involves include_vars. When you use include_vars inside a role (for example, to load OS-specific variables), the loaded variables take precedence level 18 -- above role vars, block vars, and task vars. However, set_fact and registered variables sit one level higher at 19, so they can override include_vars values. The real danger is that include_vars silently overrides inventory variables, group_vars, host_vars, and play vars -- meaning a user who set a value in their inventory may find it unexpectedly ignored.

the include_vars trap
# vars/debian.yml -- loaded via include_vars in tasks/main.yml
# WARNING: these will override play vars and role vars!
nginx_package_name: "nginx-full"    # overrides any play-level definition
nginx_conf_dir: "/etc/nginx"

# SAFE: only define platform-specific implementation details here,
# never user-facing defaults. Use unique variable names that
# don't collide with defaults/main.yml.
__nginx_os_package: "nginx-full"    # double-underscore = internal
__nginx_conf_path: "/etc/nginx"

The convention of double-underscore prefixing internal variables (like __nginx_os_package) was popularized by Jeff Geerling's roles on Ansible Galaxy and is now recognized as a community best practice -- the Red Hat Automation Good Practices guide explicitly cites his roles as prior art for this convention. This naming pattern signals that a variable is an internal implementation detail, not part of the role's public API, and prevents accidental collisions with user-defined variables. Note that Ansible itself treats underscore-prefixed variables identically to any other variable -- this is purely a community convention, not a language feature.

Debugging Precedence Issues

When a variable isn't behaving as expected, Ansible's verbose mode is your first tool. Running your playbook with increasing levels of -v flags reveals where each variable is coming from.

debugging variable precedence
# Level 1: task-level output
$ ansible-playbook site.yml -v

# Level 3: shows variable origins and connection details
$ ansible-playbook site.yml -vvv

# Nuclear option: print a variable and its type mid-play
$ ansible -m debug -a "var=hostvars[inventory_hostname]['nginx_worker_processes']" webservers

# Force a specific value to override everything
$ ansible-playbook site.yml --extra-vars "nginx_worker_processes=4"

Instead of worrying about variable precedence, we encourage you to think about how easily or how often you want to override a variable when deciding where to set it. -- Ansible Official Documentation

Testing Roles with Molecule

Molecule is the official testing framework for Ansible content, maintained as a Red Hat-backed project. It provisions ephemeral infrastructure (typically Docker containers), applies your role, verifies the result, checks for idempotency, and tears everything down. According to the Molecule project documentation, it is designed to support testing with multiple instances, operating systems, distributions, virtualization providers, and test scenarios.

The framework runs a well-defined test sequence. When you execute molecule test, the default sequence is:

  1. dependency -- install Galaxy requirements
  2. cleanup -- run cleanup playbook (if defined)
  3. destroy -- ensure no leftover infrastructure from previous runs
  4. syntax -- run ansible-playbook --syntax-check
  5. create -- provision test instances via the configured driver
  6. prepare -- run a preparatory playbook on fresh instances
  7. converge -- run the role against the instances
  8. idempotence -- run the role again, fail if anything reports changed
  9. side_effect -- optional playbook for external interactions
  10. verify -- run state-checking tests
  11. cleanup -- final cleanup
  12. destroy -- tear down all infrastructure

This sequence is fully customizable via the test_sequence key in molecule.yml. Teams often define shorter sequences for development (skipping idempotence and side_effect) and reserve the full sequence for CI pipelines.
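A shortened development sequence might look like this in molecule.yml. The scenario and test_sequence keys are standard Molecule configuration; the chosen steps are one reasonable selection:

```yaml
# molecule/default/molecule.yml (fragment)
scenario:
  test_sequence:
    - destroy     # start from a clean slate
    - syntax
    - create
    - converge
    - verify
    - destroy
```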

Setting Up Molecule

Install Molecule alongside the Docker driver plugin. Podman is also supported if you're working in environments where Docker's daemon model isn't acceptable.

terminal
# Install Molecule with Docker support
$ pip install molecule "molecule-plugins[docker]"   # quotes keep zsh from globbing the brackets

# Verify installation
$ molecule --version

# Install linting tools
$ pip install ansible-lint yamllint

# Initialize Molecule in an existing role
$ cd roles/nginx
$ molecule init scenario -d docker

This generates the molecule/default/ directory containing three critical files: molecule.yml (the configuration), converge.yml (the playbook that applies your role), and verify.yml (the test assertions).

Multi-Platform molecule.yml

Testing on a single OS is a recipe for surprises. A properly configured molecule.yml tests across the distributions your role claims to support. Jeff Geerling maintains a set of Docker images with Ansible pre-installed and systemd enabled (the geerlingguy/docker-*-ansible images), which are the de facto standard for Molecule testing.

molecule/default/molecule.yml
# molecule/default/molecule.yml
---
dependency:
  name: galaxy
  options:
    requirements-file: requirements.yml

driver:
  name: docker

platforms:
  - name: nginx-ubuntu2204
    image: geerlingguy/docker-ubuntu2204-ansible:latest
    pre_build_image: true
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    command: /usr/sbin/init

  - name: nginx-debian12
    image: geerlingguy/docker-debian12-ansible:latest
    pre_build_image: true
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    command: /usr/sbin/init

  - name: nginx-rocky9
    image: geerlingguy/docker-rockylinux9-ansible:latest
    pre_build_image: true
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    command: /usr/sbin/init

provisioner:
  name: ansible
  playbooks:
    converge: converge.yml
    verify: verify.yml

verifier:
  name: ansible

Note

The privileged: true and /sys/fs/cgroup volume mount are required for systemd to function inside Docker containers. Without them, any task that manages services via systemctl will fail. If you're testing roles that don't interact with systemd, you can omit these and use lighter-weight images.

Writing the Converge Playbook

The converge playbook is what Molecule runs against your test instances. Keep it minimal -- it should invoke the role and nothing else, unless you need to simulate a realistic integration context.

molecule/default/converge.yml
# molecule/default/converge.yml
---
- name: Converge
  hosts: all
  become: true

  vars:
    nginx_vhosts:
      - server_name: "test.local"
        root: "/var/www/test"
        listen: "80"

  roles:
    - role: "{{ lookup('env', 'MOLECULE_PROJECT_DIRECTORY') | basename }}"

Writing Verify Tests

The verify playbook runs after converge and asserts that the system is in the desired state. Molecule uses Ansible itself as the default verifier since version 3, replacing the earlier TestInfra default. The ansible.builtin.assert module is your primary tool here.

molecule/default/verify.yml
# molecule/default/verify.yml
---
- name: Verify nginx role
  hosts: all
  become: true
  gather_facts: true

  tasks:
    - name: Gather package facts
      ansible.builtin.package_facts:
        manager: auto

    - name: Assert nginx is installed
      ansible.builtin.assert:
        that:
          - "'nginx' in ansible_facts.packages or 'nginx-full' in ansible_facts.packages"
        fail_msg: "nginx package is not installed"

    - name: Assert nginx service is running
      ansible.builtin.service_facts:

    - name: Verify nginx service state
      ansible.builtin.assert:
        that:
          - "ansible_facts.services['nginx.service'].state == 'running'"
          - "ansible_facts.services['nginx.service'].status == 'enabled'"
        fail_msg: "nginx service is not running or not enabled"

    - name: Verify nginx configuration syntax
      ansible.builtin.command: nginx -t
      changed_when: false

    - name: Check nginx is listening on port 80
      ansible.builtin.wait_for:
        port: 80
        timeout: 10

    - name: Verify virtual host config exists
      ansible.builtin.stat:
        path: "/etc/nginx/sites-enabled/test.local.conf"
      register: vhost_config
      when: ansible_os_family == 'Debian'

    - name: Assert vhost config is present
      ansible.builtin.assert:
        that: vhost_config.stat.exists
      when: ansible_os_family == 'Debian'

The Idempotence Check

The idempotence phase is where Molecule runs your converge playbook a second time and fails if any task reports changed. This is arguably the single most valuable test Molecule performs. An idempotent role guarantees that running it twice produces the same state -- meaning your role won't introduce drift or trigger unnecessary service restarts on every execution.

Warning

Tasks using ansible.builtin.command or ansible.builtin.shell always report changed by default. You must add changed_when: false (for read-only commands) or a meaningful changed_when expression to prevent false positives in the idempotence check. For tasks that are inherently non-idempotent (such as one-time provisioning steps), add tags: ["molecule-notest"] to exclude them from Molecule runs entirely, or tags: ["molecule-idempotence-notest"] to skip them only during the idempotence phase.
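The common cases look like this in practice (the task names and the some-tool command are illustrative):

```yaml
# Read-only command: never report changed
- name: Check nginx configuration
  ansible.builtin.command: nginx -t
  changed_when: false

# Command with a real effect: derive changed from its output
- name: Initialize application cache
  ansible.builtin.command: some-tool init /var/cache/app
  register: init_result
  changed_when: "'created' in init_result.stdout"

# Genuinely one-shot step: exclude it from Molecule runs
- name: Run one-time provisioning
  ansible.builtin.command: /usr/local/bin/provision-once.sh
  tags: ["molecule-notest"]
```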

Multiple Test Scenarios

A single "default" scenario is often insufficient. Molecule supports multiple named scenarios to test different configurations, edge cases, or integration contexts.

terminal
# Create an additional scenario for SSL testing
$ molecule init scenario --scenario-name with_ssl -d docker

# Run a specific scenario
$ molecule test -s with_ssl

# Run all scenarios
$ molecule test --all

# Development workflow: converge without full teardown
$ molecule converge           # apply the role
$ molecule verify             # run tests only
$ molecule login -h nginx-ubuntu2204  # open a shell in an instance
$ molecule idempotence        # check idempotence only
$ molecule destroy            # clean up when done

The development workflow of converge, verify, login, iterate is far faster than running molecule test each time, which destroys and recreates everything. Use molecule test for CI pipelines and final validation, and the individual subcommands for rapid development.

Handler Strategies That Won't Bite You

Handlers are one of Ansible's more elegant concepts: tasks that only run when notified, and that only run once regardless of how many tasks notify them. According to the official Ansible documentation, handlers are efficient because a handler only executes once even if multiple tasks trigger it -- preventing, for example, Apache from being bounced multiple times during a single playbook run.

But handlers have several behaviors that catch people off guard in production.

Handlers Run at the End of the Play

By default, all notified handlers run after the last task in the play completes. This means if task 3 notifies a handler to restart nginx, and task 15 tries to hit the nginx endpoint, the restart hasn't happened yet. The service is still running the old configuration.

The fix is meta: flush_handlers, which forces all pending handlers to execute immediately at that point in the play.

roles/nginx/tasks/configure.yml
# roles/nginx/tasks/configure.yml
---
- name: Template nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: "{{ nginx_conf_dir }}/nginx.conf"
    owner: root
    group: root
    mode: '0644'
    validate: "nginx -t -c %s"
  notify: reload nginx

- name: Flush handlers to apply config now
  ansible.builtin.meta: flush_handlers

- name: Verify nginx is responding
  ansible.builtin.uri:
    url: http://localhost
    status_code: 200
  retries: 3
  delay: 2

Pro Tip

Add a meta: flush_handlers at the end of every role's tasks/main.yml. This ensures that handlers triggered by your role execute before the next role in the play begins, preventing cross-role ordering issues where role B depends on a service restart that role A notified but hasn't flushed yet.
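In practice this is a two-line addition at the tail of the role's entry point:

```yaml
# roles/nginx/tasks/main.yml (final task)
- name: Flush nginx handlers before subsequent roles run
  ansible.builtin.meta: flush_handlers
```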

The listen Keyword: Decoupled Handler Groups

Ansible 2.2 introduced the listen keyword, which allows multiple handlers to subscribe to a single topic. This is significantly more flexible than notifying handlers by name, and it decouples tasks from specific handler implementations. The official documentation notes that this feature is particularly useful when sharing handlers among playbooks and roles.

roles/nginx/handlers/main.yml
# roles/nginx/handlers/main.yml
---
- name: validate nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  listen: "reload nginx"

- name: reload nginx service
  ansible.builtin.systemd:
    name: "{{ nginx_service_name }}"
    state: reloaded
  listen: "reload nginx"

- name: restart nginx service
  ansible.builtin.systemd:
    name: "{{ nginx_service_name }}"
    state: restarted
  listen: "restart nginx"

When a task notifies reload nginx, both the validation handler and the reload handler execute -- in the order they're defined, not the order they were notified. The validation runs first, and if it fails, the reload never fires. This is a critical safety pattern: always validate configuration before restarting or reloading a service.

Handler Name Collisions Across Roles

Here's a subtle trap: handlers from roles are inserted into the global scope of the play, not scoped to their role. This means if you have two roles -- say nginx and apache -- and both define a handler named restart webserver, only the last one loaded will actually run. The Ansible documentation explicitly warns about this, recommending that you use the form role_name : handler_name when notifying handlers to ensure you trigger the correct one.

avoiding handler collisions
# BAD: generic handler names will collide across roles
notify: restart webserver

# GOOD: role-prefixed handler names
notify: restart nginx

# ALSO GOOD: fully qualified role:handler syntax
notify: nginx : restart nginx service

Prefer reload Over restart

When a service supports graceful reload (SIGHUP or an equivalent mechanism), always prefer state: reloaded over state: restarted. A reload picks up configuration changes without dropping active connections. A restart terminates all connections and starts the process fresh. For web servers, database proxies, and load balancers in production, the difference between a reload and a restart is the difference between zero-downtime deployment and a page of alerts.

CI/CD Integration

Molecule tests are only useful if they run automatically. Integrating Molecule into your CI pipeline ensures that every pull request against a role is validated before merge. GitHub Actions is the simplest path for open-source roles. For teams using Ansible Automation Platform, ansible-navigator provides an alternative execution model that wraps playbook runs in execution environments (container images with bundled collections and dependencies), which can also be integrated into CI pipelines for more production-representative testing.

.github/workflows/molecule.yml
# .github/workflows/molecule.yml
---
name: Molecule Test
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  molecule:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        scenario:
          - default
          - with_ssl
      fail-fast: false

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install molecule molecule-plugins[docker] ansible-lint yamllint

      - name: Run Molecule
        run: molecule test -s ${{ matrix.scenario }}
        env:
          MOLECULE_DISTRO: ubuntu2204

For GitLab CI, the approach is similar. The Sysbee engineering team documented their approach using service containers to run Molecule within GitLab CI, noting that systemd integration within Docker containers was the primary challenge they had to solve for reliable testing.

.gitlab-ci.yml
# .gitlab-ci.yml
---
stages:
  - test

molecule_test:
  stage: test
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_CERT_PATH: "/certs/client"
    DOCKER_TLS_VERIFY: "1"
  before_script:
    - apk add --no-cache python3 py3-pip gcc musl-dev python3-dev libffi-dev
    - pip3 install --break-system-packages molecule "molecule-plugins[docker]" ansible-lint
  script:
    - molecule test
  rules:
    - changes:
        - roles/nginx/**

Patterns That Work at Scale

When you're managing a handful of roles, you can get away with ad-hoc conventions. When you're managing dozens -- across multiple teams, products, and environments -- you need enforced patterns.

Pin Everything

Use a requirements.yml file with explicit version pins for all role dependencies. Unpinned dependencies are the Ansible equivalent of npm install with no lockfile -- your playbook works today and breaks tomorrow because an upstream author pushed a breaking change.

requirements.yml
# requirements.yml -- pin all the things
# Check Ansible Galaxy for current versions before using
---
roles:
  - name: geerlingguy.docker
    version: "7.4.1"
  - name: geerlingguy.certbot
    version: "5.2.0"

collections:
  - name: ansible.posix
    version: "1.6.0"
  - name: community.general
    version: "9.0.0"

Use ansible-lint in CI

The ansible-lint tool catches deprecated module usage, style violations, and common mistakes. Run it alongside Molecule in your CI pipeline. It is an official Ansible-maintained project and has become a standard part of the Ansible development toolchain.

$ ansible-lint roles/nginx/
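A repository-level .ansible-lint file keeps the lint configuration versioned alongside the role. The keys below are standard ansible-lint options; the chosen values are one reasonable starting point:

```yaml
# .ansible-lint
profile: production      # strictest built-in rule profile
exclude_paths:
  - .cache/
  - molecule/
skip_list:
  - yaml[line-length]    # example of deliberately relaxing one rule
```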

Use FQCNs Everywhere

Always use Fully Qualified Collection Names for modules: ansible.builtin.template instead of template, ansible.builtin.service instead of service. This eliminates ambiguity when multiple collections provide modules with the same short name, makes your roles forward-compatible with future Ansible versions, and is enforced by ansible-lint by default.
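The difference is purely syntactic but pays off in ambiguous environments:

```yaml
# BAD: short name -- resolved through the collections search path,
# ambiguous if another installed collection ships a 'template' module
- name: Deploy nginx.conf
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

# GOOD: fully qualified -- always resolves to the same builtin module
- name: Deploy nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
```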

Loose Coupling Through Role Dependencies

You can declare role dependencies in meta/main.yml, but use this sparingly. Hard dependencies on external roles reduce flexibility and create version management headaches. The Red Hat Automation Good Practices guide warns that roles with hard dependencies on external roles have limited flexibility and increased risk that changes to the dependency will result in unexpected behavior or failures. Prefer soft dependency patterns where the consuming playbook explicitly includes both roles in the correct order.
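A soft-dependency composition keeps the ordering explicit in the playbook instead of hiding it in meta/main.yml (the role pairing here is illustrative):

```yaml
# site.yml -- the playbook, not the role, owns the ordering
- name: Configure web tier
  hosts: webservers
  become: true
  roles:
    - role: geerlingguy.certbot   # certificates first
    - role: nginx                 # then the web server that uses them
```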

Consider Collections for Distribution

The Ansible ecosystem has been steadily shifting toward collections as the primary distribution and packaging mechanism. While standalone Galaxy roles still work, collections bundle roles alongside plugins, modules, and documentation into a single versioned, namespaced package. If your organization is distributing roles across teams or publishing them externally, packaging them inside a collection gives you a unified versioning strategy, namespace isolation, and better dependency management. The ansible-creator tool can scaffold both standalone roles and collection structures (it supersedes ansible-galaxy init as the recommended scaffolding tool, though the older command still works), and Molecule supports testing roles within collections natively.

Note

You don't have to choose one or the other. Many teams keep standalone roles during development and package them into a collection for distribution. The role structure itself doesn't change -- collections are a packaging layer, not a rewrite.

Wrapping Up

Scalable Ansible roles aren't about clever YAML tricks -- they're about discipline. Put user-configurable variables in defaults/, internal constants in vars/, and prefix everything with the role name. Define argument_specs so your role validates its inputs and documents its API automatically. Test with Molecule across every platform you claim to support, and run those tests in CI on every pull request. Use meta: flush_handlers to prevent cross-role timing issues, and always validate configuration before restarting a service.

The investment in proper role structure pays compound interest. A role that's tested, documented, and predictable gets reused. A role that's a fragile snowflake gets copied, forked, and eventually abandoned. Build for the former.

Use roles for everything. Even small projects benefit from role structure. It forces you to organize variables, templates, and handlers logically. -- DevToolbox, "Ansible: The Complete Guide for 2026"