
Ansible Inventory for Slurm Worker Node Reboot

Running a Simple Ansible Reboot Playbook Against Slurm Nodes from Outside the Head Node

When managing servers over SSH with automation tools like Ansible, targeting specific nodes in your inventory is essential. Here’s how to reboot a Slurm worker node with Ansible from a machine other than the head node, without relying on head-node privileges:

  1. Define the target node inventory (if not already defined): Ensure you have an ansible_inventory file that lists your nodes, including a group for the Slurm workers if applicable. For simplicity, here is a single-host definition for the node to be rebooted; the reboot itself is triggered from outside the head node:

    [slurm_worker]
    node1 ansible_host=<ip>
    
  2. Create the Ansible playbook (or use an ad hoc command): Below is a minimal play that targets the slurm_worker group (here just node1), connects as the team user with privilege escalation, and executes the reboot:

    - name: Slurm Worker Node Reboot Task
      hosts: slurm_worker
      gather_facts: false
      remote_user: team
      become: true

      tasks:
        - name: Initiate system restart (with timeout)
          ansible.builtin.reboot:
            reboot_timeout: 300  # Wait up to 300 seconds (five minutes) for the node to come back.
    

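As a hedged variation on the play above, the sketch below adds a few optional parameters of the `ansible.builtin.reboot` module and reports the result. The specific values (the broadcast message, the 10-second delay) and the `reboot_result` variable name are illustrative assumptions, not settings taken from this setup.

```yaml
- name: Slurm worker reboot with explicit verification (sketch)
  hosts: slurm_worker
  gather_facts: false
  remote_user: team
  become: true

  tasks:
    - name: Reboot and wait for the node to answer again
      ansible.builtin.reboot:
        msg: "Reboot initiated by Ansible"  # Message shown to logged-in users (assumed wording).
        reboot_timeout: 300                 # Seconds to wait for the node to return.
        post_reboot_delay: 10               # Assumed extra pause before the first check.
        test_command: whoami                # Must succeed for the reboot to count as complete.
      register: reboot_result

    - name: Report whether the reboot happened and how long it took
      ansible.builtin.debug:
        msg: "rebooted={{ reboot_result.rebooted }}, elapsed={{ reboot_result.elapsed }}s"
```

Registering the result makes it easy to see in the play output whether the node actually came back within the timeout.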
To run this playbook, use any machine outside the head node that meets the prerequisites listed below, and invoke Ansible as follows:

    ansible-playbook -i ansible_inventory path/to/your_reboot_script.yml --private-key=path/to/ssh-key

If the SSH key is protected by a passphrase, supply it when prompted.
   
If you'd prefer a simpler, direct ad hoc command instead of the full playbook:
```sh
# Reboot node1 and wait up to 300 seconds (five minutes) for it to come back.
ansible node1 -i ansible_inventory -u team -b -m ansible.builtin.reboot -a "reboot_timeout=300"
```

Remember, these commands presume that the machine you run them from has Ansible installed (the reboot module ships with ansible-core, so no extra collection is needed) and SSH access to the target node(s) as the team user, and that this user has sudo rights sufficient for privilege escalation (become) and for shutting the system down. To trigger a reboot from any suitable host, also make sure your inventory file is accessible there.
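The prerequisites above can be checked before any reboot is issued. The following is a minimal pre-flight sketch, assuming the same slurm_worker group and team user as in the inventory and play; the `become_check` variable name and the failure message are hypothetical, and the check assumes become escalates to root (the Ansible default).

```yaml
- name: Pre-flight check before rebooting Slurm workers (sketch)
  hosts: slurm_worker
  gather_facts: false
  remote_user: team

  tasks:
    - name: Confirm SSH login and a usable Python on the target
      ansible.builtin.ping:

    - name: Confirm privilege escalation works
      ansible.builtin.command: whoami
      become: true
      changed_when: false  # Read-only check, never report a change.
      register: become_check

    - name: Fail early if become did not yield root
      ansible.builtin.assert:
        that:
          - become_check.stdout == "root"
        fail_msg: "User 'team' cannot escalate to root on {{ inventory_hostname }}"
```

Running this with the same `-i ansible_inventory` and `--private-key` options as the reboot playbook should surface SSH or sudo problems without touching the node's state.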

By following these guidelines for running reboots from outside the head node, you keep flexibility and control over the administration of Slurm worker nodes with Ansible, even in complex cluster environments.

