July 19, 2020

Migration Setup

If you are migrating to Amazon Web Services (AWS) and want to use newer instance types (specifically t3a), you need to make sure that a number of kernel modules load automatically. I’ve been migrating some Oracle Linux 7 hosts to AWS using CloudEndure, and on the first trial run I couldn’t work out why the instances appeared to boot OK in the AWS EC2 console yet I couldn’t connect to them. It later became apparent that, for some reason, even booting wasn’t consistent.

There were 5 key items that needed to be completed as part of the migration:

  • enable and ensure the NVMe driver loads at boot
  • enable and ensure the ENA driver loads at boot
  • make sure that the network interface is consistent
  • allow SSM Session Manager to operate
  • enable basic OS metric collection
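Before making any changes, it can be worth checking where a host already stands on the first three items. This is a hypothetical pre-flight sketch (the function name is mine, not part of any AWS tooling); it assumes modinfo is available and that the grub defaults live at /etc/default/grub:

```shell
#!/bin/bash
# Hypothetical pre-flight check for the migration items above.

check_module() {
  # modinfo exits non-zero when the module isn't available for this kernel
  if modinfo "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: MISSING"
  fi
}

check_module nvme
check_module ena

# -s keeps grep quiet if the file doesn't exist
if grep -qs 'net.ifnames=0' /etc/default/grub; then
  echo "grub: ifnames disabled"
else
  echo "grub: needs net.ifnames=0"
fi
```

Running this before and after the changes below gives a quick way to confirm nothing was missed on a given host.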

NVMe

NVMe-based EBS volumes were introduced with the Nitro System at re:Invent 2017. Information on which operating systems and driver versions are set up by default can be found in the EBS volumes and NVMe section of the EC2 documentation.

At the time of writing the following systems and versions were listed as supported:

  • Amazon Linux 2
  • Amazon Linux AMI 2018.03
  • Ubuntu 14.04 (with linux-aws kernel) or later
  • Red Hat Enterprise Linux 7.4 or later
  • SUSE Linux Enterprise Server 12 SP2 or later
  • CentOS 7.4.1708 or later
  • FreeBSD 11.1 or later
  • Debian GNU/Linux 9 or later

While Oracle Linux 7 isn’t explicitly listed, it is a derivative of the Red Hat family of distributions (similar to CentOS). The difference is that the Oracle Linux systems I was dealing with were using the UEK (Unbreakable Enterprise Kernel).

The kernel version the servers are running is 4.1.12-124.39.5.el7uek.x86_64.

Running modinfo nvme results in the following:

filename:       /lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme.ko
version:        1.0
license:        GPL
author:         Matthew Wilcox <willy@linux.intel.com>
srcversion:     E89CDBD5763CB8D94DD823D
alias:          pci:v0000106Bd00002001sv*sd*bc*sc*i*
alias:          pci:v*d*sv*sd*bc01sc08i02*
alias:          pci:v0000144Dd0000A822sv*sd*bc*sc*i*
alias:          pci:v0000144Dd0000A821sv*sd*bc*sc*i*
alias:          pci:v00001C5Fd00000540sv*sd*bc*sc*i*
alias:          pci:v00001C58d00000003sv*sd*bc*sc*i*
alias:          pci:v00008086d00005845sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A54sv*sd*bc*sc*i*
alias:          pci:v00008086d00000A53sv*sd*bc*sc*i*
alias:          pci:v00008086d00000953sv*sd*bc*sc*i*
depends:        nvme-core
retpoline:      Y
intree:         Y
vermagic:       4.1.12-124.39.5.el7uek.x86_64 SMP mod_unload modversions
signat:         X509
signer:         Oracle CA Server
sig_key:        FA:68:C6:0B:61:05:75:81:01:2A:AF:53:95:37:95:B4:8E:9F:8B:ED
sig_hashalgo:   sha512
parm:           use_threaded_interrupts:int
parm:           use_cmb_sqes:use controller's memory buffer for I/O SQes (bool)
parm:           nvme_io_queues:set the number of nvme io queues (uint)

This means we have a driver we can use. Let’s set it up.

This is pretty straightforward: you need to add the driver to the dracut configuration and rebuild the initrd image.

echo 'add_drivers+=" nvme "' | sudo tee /etc/dracut.conf.d/aws.conf

Now rebuild the initrd image: sudo dracut -f -v

You can easily check that this has worked by confirming that the new image includes the module:

sudo lsinitrd | grep nvme
drwxr-xr-x   3 root     root            0 Jul 19 16:46 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme
drwxr-xr-x   2 root     root            0 Jul 19 16:46 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host
-rw-r--r--   1 root     root        90462 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme-core.ko
-rw-r--r--   1 root     root        59022 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme.ko

ENA

The ENA (Elastic Network Adapter) was introduced in June 2016 with initial support for X1 instances.

ENA is now supported across a number of instance types and Operating Systems. You can find the latest information in the EC2 ENA documentation.

The process is very similar to enabling NVMe.

First we make sure the module exists and check its version, then we tell dracut to install it, rebuild the initrd, and check the result: modinfo ena

filename:       /lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko
version:        1.1.2
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
srcversion:     1CCD9807B601A1966B96ADD
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
retpoline:      Y
intree:         Y
vermagic:       4.1.12-124.39.5.el7uek.x86_64 SMP mod_unload modversions
signat:         X509
signer:         Oracle CA Server
sig_key:        FA:68:C6:0B:61:05:75:81:01:2A:AF:53:95:37:95:B4:8E:9F:8B:ED
sig_hashalgo:   sha512
parm:           debug:Debug level (0=none,...,16=all) (int)

Be careful here: a plain tee would overwrite the NVMe entry we created earlier, so put both drivers in the one file:

echo 'add_drivers+=" nvme ena "' | sudo tee /etc/dracut.conf.d/aws.conf

sudo dracut -f -v

sudo lsinitrd | grep ena
drwxr-xr-x   2 root     root            0 Jul 19 17:03 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena
-rw-r--r--   1 root     root       137278 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko

Consistent Network Interface Naming

In recent kernels, network interfaces have started to be named based on firmware, topology, and location information (so-called predictable interface names). To ensure that interfaces keep the good old ethN names, you need to tell the kernel at boot:

net.ifnames=0

To make sure this is done as part of grub setup, you need to update the grub configuration and then rebuild the grub.cfg file.

sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="net.ifnames=0 /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

You can check that grub is set up correctly by inspecting the grub.cfg file:

sudo grep ifnames /boot/grub2/grub.cfg
linux16 /vmlinuz-4.1.12-124.39.5.el7uek.x86_64 root=/dev/mapper/ol-root ro net.ifnames=0 crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.lvm.lv=ol/swap rd.lvm.lv=ol/root vconsole.keymap=us   numa=off transparent_hugepage=never

Yours may look a little different, but every kernel line should include this parameter.
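If a host has several kernels installed it is easy to miss one, so a small check can help; this sketch (the function name is mine) succeeds only when every kernel line in the given grub.cfg carries the flag:

```shell
#!/bin/bash
# Hypothetical check: exit 0 only if every linux/linux16/linuxefi line
# in the given grub.cfg contains net.ifnames=0.
ifnames_ok() {
  local cfg="$1"
  # The inner grep -qv finds a kernel line WITHOUT the flag;
  # inverting that gives success when no such line exists.
  ! grep -E '^[[:space:]]*linux(16|efi)?[[:space:]]' "$cfg" \
      | grep -qv 'net\.ifnames=0'
}

# Example usage (path as used in this post):
# ifnames_ok /boot/grub2/grub.cfg && echo "all kernel lines OK"
```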

Install and setup SSM Session Manager

This one is pretty easy, and we can set it up prior to the migration to AWS.

sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
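Depending on your image, the agent may not be started automatically after the RPM installs, so it is worth making sure the systemd unit (the package ships it as amazon-ssm-agent) is enabled for boot:

```shell
# Make sure the SSM agent starts now and on every boot
sudo systemctl enable amazon-ssm-agent
sudo systemctl start amazon-ssm-agent
```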

Install and setup CloudWatch Agent

We can install the CloudWatch agent now, but we have to set up the agent after the migration. We can do this automatically with some help from CloudEndure.

CloudEndure has this concept of Post Launch Scripts that we can use to configure the agent after the instance is launched in AWS.

First let’s install the agent:

sudo yum install -y https://s3.amazonaws.com/amazoncloudwatch-agent/centos/amd64/latest/amazon-cloudwatch-agent.rpm

Now let’s create a file with a basic metric configuration: /etc/basic-cloudwatch.json:

{
    "agent": {
        "metrics_collection_interval": 300,
        "run_as_user": "root"
    },
    "metrics": {
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
            "disk": {

                "drop_device": true,
                "ignore_file_system_types": [
                    "overlay",
                    "sysfs",
                    "devtmpfs",
                    "tmpfs",
                    "devtmpfs",
                    "nfs4"
                ],
                "measurement": [
                    "used_percent",
                    "inodes_free"
                ],
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ]
            },
            "swap": {
                "measurement": [
                    "swap_used_percent"
                ]
            }
        }
    }
}
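A malformed configuration file will only surface as a fetch-config failure after the instance launches in AWS, so it can be worth validating the JSON while the host is still easy to fix. A minimal sketch using python3’s stdlib json.tool (the helper name is mine):

```shell
#!/bin/bash
# Hypothetical helper: exit 0 if the file parses as JSON, non-zero otherwise.
validate_cw_config() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Example usage (path as used in this post):
# validate_cw_config /etc/basic-cloudwatch.json && echo "config ok"
```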

Now we create a post_launch script, /boot/post_launch/cwsetup.sh:

#!/bin/bash
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/etc/basic-cloudwatch.json

Putting it all together with Ansible

I needed to deploy this to a number of machines, so I created this Ansible playbook.

---
- hosts: all
  gather_facts: yes
  become: yes
  tasks:
    - name: setup dracut for nvme and ena
      copy:
        dest: /etc/dracut.conf.d/aws.conf
        content: 'add_drivers+=" ena nvme "'
    - name: build a new initrd
      command: dracut -f -v
    - name: read in grub default
      shell: cat /etc/default/grub
      register: grubdefault
    - name: make eth the default
      command:
        cmd: sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="net.ifnames=0 /' /etc/default/grub
        warn: no
      when: grubdefault.stdout.find('net.ifnames') == -1
    - name: build a new grub config
      command: grub2-mkconfig -o /boot/grub2/grub.cfg
    - name: install ssm agent
      yum:
        name: https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
        state: present
    - name: install cloudwatch agent
      yum:
        name: https://s3.amazonaws.com/amazoncloudwatch-agent/centos/amd64/latest/amazon-cloudwatch-agent.rpm
        state: present
    - name: copy basic cloudwatch config
      copy:
        dest: /etc/basic-cloudwatch.json
        src: basic-cloudwatch.json
    - name: create post_launch directory
      file:
        path: /boot/post_launch
        state: directory
    - name: create a script to setup cloudwatch agent
      copy:
        dest: /boot/post_launch/cwsetup.sh
        content: |
          #!/bin/bash
          /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/etc/basic-cloudwatch.json
        mode: '0755'

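With the playbook saved alongside basic-cloudwatch.json, running it against the fleet looks something like this (the playbook and inventory file names are my own assumptions):

```shell
# Dry-run first to see what would change, then apply for real
ansible-playbook -i inventory.ini aws-prep.yml --check
ansible-playbook -i inventory.ini aws-prep.yml
```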
© Greg Cockburn
