Ansible VLAN deployment with MikroTik


Yet Another Ansible Post

As the year comes to a close, I can’t help but reflect on the progress I’ve made with Ansible in the past few weeks. Looking back, there’s a lot to be satisfied with. After automating my IPAM system and the startup/shutdown process for my lab, I decided to tackle a long-standing annoyance: deploying VLANs for new lab environments.

Goals

The main goal is to have a single file that describes all the required VLANs. This file should also allow me to delete or reuse VLANs as needed. Each VLAN must be deployed across three switches and configured as a tagged VLAN on specific ports.

Additionally, there are the peculiarities of MikroTik hardware to consider. When creating new VLANs on my Top-of-Rack (ToR) switch, MikroTik recommends disabling the L3 Hardware Offloading feature beforehand. In the past, I’ve encountered strange issues when this wasn’t done prior to creating new networks. Therefore, the automation should also handle disabling and re-enabling this feature as part of the process.
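
For reference, the manual RouterOS equivalent looks roughly like this; I'm assuming the switch chip is called switch1 here, which is what the playbook further down targets, and the print command is only there to check the current state first:

/interface ethernet switch print
/interface ethernet switch set [find name=switch1] l3-hw-offloading=no
# ... create or change the bridge VLANs ...
/interface ethernet switch set [find name=switch1] l3-hw-offloading=yes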

Future Goals

In the future, I plan to expand this setup further, integrating it with Ansible Tower or ArgoCD to enable management through pipelines.

A pipeline is essentially an automated workflow that takes a defined input—such as a configuration file or a code repository—and processes it through several steps to achieve a desired outcome. For example, in this context, a pipeline could validate my VLAN configuration, deploy it to the target switches, and handle any post-deployment tasks automatically.

My locally hosted Gitea instance will serve as the Source of Truth, housing the configuration files and acting as the central repository for all changes. This ensures consistency, version control, and a clear audit trail for every modification.
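
To make that idea a little more concrete, a pipeline for this project could be as simple as a small CI workflow that checks out the repository, installs Ansible, validates the playbook, and then runs it. The sketch below uses the Gitea Actions format purely as an illustration; the file path, trigger, runner label, and steps are assumptions and do not exist in my setup yet.

# .gitea/workflows/vlans.yml (hypothetical example)
name: Deploy VLANs
on:
  push:
    branches: [ main ]

jobs:
  vlans:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible and the required collections
        run: |
          pip install ansible-core ansible-pylibssh
          ansible-galaxy collection install community.network ansible.netcommon
      - name: Validate the playbook
        run: ansible-playbook --syntax-check vlan_esx.yml
      - name: Deploy the VLANs
        run: ansible-playbook vlan_esx.yml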

What I’ve Done

For this implementation, I approached things a bit differently compared to my last two Ansible projects—after all, learning is part of the process!

Project Structure

I structured the project as follows:

  • inventory.ini: Stores the credentials for the three switches.
  • ansible.cfg: Manages the inventory file and SSH settings.
  • Playbook: A YAML file that runs without additional parameters.
  • Directories:
    • group_vars/: Contains global variables shared across all devices.
      • all.yml: Includes all VLANs that should exist, along with descriptions.
    • host_vars/: Contains per-device variables.
      • Each switch has its own YAML file defining interfaces and bridges.

Here’s the project structure visualized:

mikrotik/
├── ansible.cfg          # Configuration file for project Ansible settings
├── inventory.ini        # Inventory file with switch credentials
├── vlan_esx.yml         # Main playbook
├── group_vars/
│   └── all.yml          # Global VLAN definitions and descriptions
└── host_vars/
    ├── 192.168.0.1.yml      # Variables for Switch 1 (interfaces, bridges)
    ├── 192.168.0.5.yml      # Variables for Switch 2 (interfaces, bridges)
    └── 192.168.0.7.yml      # Variables for Switch 3 (interfaces, bridges)

Advantages of This Structure

One major advantage of this structure is its scalability. Adding another MikroTik switch is straightforward: I only need to create a new host_var file for the switch and update the inventory.ini file with its credentials.
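
As an illustration, a hypothetical fourth switch at 192.168.0.9 (a made-up address, with made-up interface names) would only need a new host_vars/192.168.0.9.yml along these lines, plus one more line in inventory.ini with the same ansible_* settings as the existing hosts:

interfaces:
  - 00_bonding_CoreRouter
  - 01_ether1_ESX10_1
  - 01_ether2_ESX10_2
bridge: bridge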

Additionally, since the playbook runs without requiring extra parameters, it simplifies the GitOps approach I plan to implement later. This means the entire process becomes more streamlined and easily automatable through pipelines, reducing complexity and potential for errors.

Project Files

Here are the files I created as part of this project:

ansible.cfg

This file configures Ansible with the necessary inventory and SSH settings.

[defaults]
host_key_checking = False
transport = ssh
inventory = inventory.ini

[ssh_connection]
ssh_type = libssh
timeout = 60

inventory.ini

[mikrotik]
192.168.0.1 ansible_user=admin ansible_password="xxx" ansible_connection=network_cli ansible_network_os=community.network.routeros
192.168.0.5 ansible_user=admin ansible_password="xxx" ansible_connection=network_cli ansible_network_os=community.network.routeros 
192.168.0.7 ansible_user=admin ansible_password="xxx" ansible_connection=network_cli ansible_network_os=community.network.routeros

vlan_esx.yml

---
- name: Manage VLANs with delete option
  hosts: mikrotik
  gather_facts: no
  collections:
    - community.network

  tasks:
    - name: Disable L3 HW Offloading
      community.network.routeros_command:
        commands:
          - "/interface ethernet switch set [find name=switch1] l3-hw-offloading=no"
      when: ansible_host == '192.168.0.1'
      register: hw_offload_result
    
    - name: Delete VLANs marked for deletion
      community.network.routeros_command:
        commands:
          - "/interface bridge vlan remove [find vlan-ids={{ item.id }}]"
      with_items: "{{ vlans }}"
      when: item.delete | bool

    - name: Configure VLANs on the bridge
      community.network.routeros_command:
        commands:
          - "/interface bridge vlan remove [find vlan-ids={{ item.id }}]"
          - "/interface bridge vlan add bridge={{ bridge }} vlan-ids={{ item.id }} tagged={{ interfaces | join(',') }}"
      with_items: "{{ vlans }}"
      when: not item.delete | bool

    - name: Set description for each VLAN
      community.network.routeros_command:
        commands:
          - "/interface bridge vlan comment [find vlan-ids={{ item.id }}] comment=\"{{ item.description }}\""
      with_items: "{{ vlans }}"
      when: not item.delete | bool

    - name: Enable L3 HW Offloading
      community.network.routeros_command:
        commands:
          - "/interface ethernet switch set [find name=switch1] l3-hw-offloading=yes"
      when: ansible_host == '192.168.0.1'
      register: hw_offload_result
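
Since ansible.cfg already points at the inventory and all variables live in group_vars and host_vars, running the playbook from the project directory needs no extra parameters:

ansible-playbook vlan_esx.yml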

group_vars/all.yml

vlans:
  - id: 4
    description: "vMotion"
    delete: false
  - id: 12
    description: "ESXi MGMT"
    delete: false
  - id: 14
    description: "NSXB Host Tep"
    delete: false
  - id: 15
    description: "NSXB Edge Tep"
    delete: false
  - id: 20
    description: "K3s"
    delete: false
  - id: 31
    description: "NSXB Uplink1"
    delete: false
  - id: 41
    description: "NSXB Uplink2"
    delete: false
  - id: 50
    description: "RTEP NSX Federation"
    delete: false
  - id: 69
    description: "vSAN"
    delete: false
  - id: 200
    description: "VCF VM MGMT"
    delete: false
  - id: 201
    description: "VCF MGMT"
    delete: false
  - id: 202
    description: "VCF vSAN"
    delete: false
  - id: 203
    description: "VCF vSAN"
    delete: false
  - id: 204
    description: "VCF HostTEP"
    delete: false
  - id: 205
    description: "VCF EdgeTEP"
    delete: false
  - id: 206
    description: " "
    delete: true
  - id: 207
    description: " "
    delete: true
  - id: 208
    description: " "
    delete: true
  - id: 209
    description: " "
    delete: true
  - id: 211
    description: " "
    delete: true
  - id: 212
    description: " "
    delete: true

host_vars/192.168.0.5.yml

Each switch has its own file defining its specific interfaces and bridge. Here is the example for 192.168.0.5.yml:

interfaces:
  - 10_bonding_SWA02
  - 00_bonding_CoreRouter
  - 01_ether1_ESX01_1
  - 01_ether2_ESX01_2
  - 02_qsfpplus1-1_ESX02_1
  - 03_qsfpplus1-2_ESX03_1
  - 04_qsfpplus1-3_ESX04_1
  - 05_qsfpplus1-4_ESX05_1
  - 07_ether3_ESX07_1
  - 07_ether4_ESX07_2
  - 08_ether5_ESX08_1
  - 08_ether6_ESX08_2 
  - 09_ether7_ESX09_1
  - 09_ether8_ESX09_2
bridge: bridge

What Does the Playbook Do?

This playbook is designed to manage VLANs on MikroTik switches, including the ability to delete or configure VLANs. It uses the community.network collection and skips fact-gathering since it only executes specific commands on the devices.

Step-by-Step Breakdown

  1. Disable L3 Hardware Offloading
    The playbook begins by disabling the L3 Hardware Offloading feature on the ToR switch (the switch chip named switch1). This step is necessary because MikroTik recommends turning off this feature before making VLAN configuration changes. The command is executed only if the host IP is 192.168.0.1, and the result is stored in the variable hw_offload_result.

  2. Delete VLANs Marked for Deletion
    VLANs that are marked for deletion in the vlans variable (delete: true) are removed. The playbook iterates through the list of VLANs and executes the removal command for each VLAN marked as deleted.

  3. Configure VLANs
    For VLANs that are not marked for deletion, the playbook configures them as follows:

    • Removes any existing VLAN with the same ID to avoid conflicts.
    • Adds the VLAN to the specified bridge and assigns it to the interfaces defined as tagged. This ensures a clean and consistent configuration. A rendered example follows after this list.

  4. Set VLAN Descriptions
    The playbook adds a description to each VLAN that is not marked for deletion. It uses the comment function in MikroTik and sets the description based on the variables provided.

  5. Enable L3 Hardware Offloading
    Finally, the playbook re-enables the L3 Hardware Offloading feature on the ToR switch, but only if the host IP is 192.168.0.1. The result of this step is also stored in the hw_offload_result variable.
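
To make the templating in step 3 concrete, this is roughly what the commands render to for VLAN 12 on the 192.168.0.5 switch, using the host_vars shown earlier (the tagged list is abbreviated here; in reality it contains all fourteen interfaces from that file):

/interface bridge vlan remove [find vlan-ids=12]
/interface bridge vlan add bridge=bridge vlan-ids=12 tagged=10_bonding_SWA02,00_bonding_CoreRouter,01_ether1_ESX01_1,...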

Summary

This playbook automates the entire VLAN management process:

  • Disables the L3 offloading feature when required.
  • Deletes VLANs marked for removal.
  • Configures new VLANs, ensuring no conflicts.
  • Sets descriptions for the VLANs.
  • Re-enables the L3 offloading feature after the configuration.

The structure ensures reliability and consistency, handling edge cases such as existing VLANs and hardware offloading quirks automatically.

What’s Left to Do

There are several improvements I plan to make to this playbook in the future:

  1. Clean Up Variables
    Currently, there are some unused variables in the playbook that I need to clean up to keep the codebase tidy and maintainable.

  2. Enhanced Logic for VLAN Checks
    I want to extend the logic to verify whether a VLAN already matches the desired target configuration. This would prevent unnecessary deletion and re-creation of VLANs, reducing downtime and ensuring smoother operation. A rough sketch of such a check follows after this list.

  3. Improved Error Handling
    Better error handling is a priority to ensure the playbook gracefully recovers from unexpected issues, such as failed commands or unreachable devices.

  4. Pipeline Integration
    The ultimate goal is to integrate the playbook into a pipeline, enabling automated execution through tools like Ansible Tower or ArgoCD. This would streamline the entire process and align it with a GitOps approach.

  5. Distributed Switch Integration
    It’s also conceivable to extend the functionality by adding new VLANs directly to the distributed switch in my vSphere environment. However, this would be handled in a separate playbook to maintain modularity. The pipeline would then orchestrate both playbooks to ensure a seamless configuration process.
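
To sketch what the check from point 2 could look like: one option (untested, and it only checks that a VLAN exists, not that its tagged ports already match) would be to read the bridge VLAN table once and then only add VLANs that are actually missing:

    # Sketch only: skip VLANs that already appear in the terse print output
    - name: Read the current bridge VLAN table
      community.network.routeros_command:
        commands:
          - "/interface bridge vlan print terse"
      register: current_vlans

    - name: Add only VLANs that are not present yet
      community.network.routeros_command:
        commands:
          - "/interface bridge vlan add bridge={{ bridge }} vlan-ids={{ item.id }} tagged={{ interfaces | join(',') }}"
      with_items: "{{ vlans }}"
      when:
        - not item.delete | bool
        - current_vlans.stdout[0] is not search('vlan-ids=' ~ item.id ~ '([^0-9]|$)')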

By addressing these points, the project will become more robust, scalable, and aligned with modern automation practices.