If you are migrating to Amazon Web Services (AWS) and want to use newer instance types (specifically t3a), you need to make sure that a number of kernel modules are loaded automatically at boot. I’ve been migrating some Oracle Linux 7 hosts to AWS using CloudEndure, and on the first trial run I couldn’t work out why they appeared to boot OK in the AWS EC2 console while I couldn’t connect to them. It later became apparent that, for some reason, even booting wasn’t consistent.
There were 5 key items that needed to be completed as part of the migration:
- enable and ensure the NVMe driver loads at boot
- enable and ensure the ENA driver loads at boot
- make sure that the network interface is consistent
- allow SSM Session Manager to operate
- enable basic OS metric collection
NVMe
NVMe-based EBS volumes were introduced with the Nitro System at re:Invent 2017. Information on which operating systems and versions ship with the driver by default can be found in the EBS volumes and NVMe section of the EC2 documentation.
At the time of writing the following systems and versions were listed as supported:
- Amazon Linux 2
- Amazon Linux AMI 2018.03
- Ubuntu 14.04 (with linux-aws kernel) or later
- Red Hat Enterprise Linux 7.4 or later
- SUSE Linux Enterprise Server 12 SP2 or later
- CentOS 7.4.1708 or later
- FreeBSD 11.1 or later
- Debian GNU/Linux 9 or later
While Oracle Linux 7 isn’t explicitly listed, it is a derivative of the Red Hat family of distributions (similar to CentOS). The difference is that the Oracle Linux systems I was dealing with were using the UEK (Unbreakable Enterprise Kernel).
The version the servers are running is 4.1.12-124.39.5.el7uek.x86_64.
Running modinfo nvme results in the following:
filename: /lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme.ko
version: 1.0
license: GPL
author: Matthew Wilcox <willy@linux.intel.com>
srcversion: E89CDBD5763CB8D94DD823D
alias: pci:v0000106Bd00002001sv*sd*bc*sc*i*
alias: pci:v*d*sv*sd*bc01sc08i02*
alias: pci:v0000144Dd0000A822sv*sd*bc*sc*i*
alias: pci:v0000144Dd0000A821sv*sd*bc*sc*i*
alias: pci:v00001C5Fd00000540sv*sd*bc*sc*i*
alias: pci:v00001C58d00000003sv*sd*bc*sc*i*
alias: pci:v00008086d00005845sv*sd*bc*sc*i*
alias: pci:v00008086d00000A54sv*sd*bc*sc*i*
alias: pci:v00008086d00000A53sv*sd*bc*sc*i*
alias: pci:v00008086d00000953sv*sd*bc*sc*i*
depends: nvme-core
retpoline: Y
intree: Y
vermagic: 4.1.12-124.39.5.el7uek.x86_64 SMP mod_unload modversions
signat: X509
signer: Oracle CA Server
sig_key: FA:68:C6:0B:61:05:75:81:01:2A:AF:53:95:37:95:B4:8E:9F:8B:ED
sig_hashalgo: sha512
parm: use_threaded_interrupts:int
parm: use_cmb_sqes:use controller's memory buffer for I/O SQes (bool)
parm: nvme_io_queues:set the number of nvme io queues (uint)
This means we have a driver that we can use. Let’s set it up.
This is pretty straightforward: you need to add the driver to dracut and rebuild the initrd image.
echo 'add_drivers+=" nvme "' | sudo tee /etc/dracut.conf.d/aws.conf
Now rebuild the initrd image: sudo dracut -f -v
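One thing to watch: run without arguments, dracut only rebuilds the image for the kernel that is currently running. If a host has several kernels installed, the others keep their old initrd. Below is a minimal sketch of one way to cover them all. rebuild_commands is a hypothetical helper name, and the /boot/initramfs-<version>.img naming is an assumption based on the Red Hat family default, so check it against your own /boot first. It only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# rebuild_commands [MODULES_DIR]: print one dracut command per kernel version
# found under MODULES_DIR (defaults to /lib/modules). Nothing is executed here.
# Assumption: images are named /boot/initramfs-<version>.img.
rebuild_commands() {
    moddir="${1:-/lib/modules}"
    for dir in "$moddir"/*/; do
        # Skip the unexpanded pattern when the directory has no subdirectories.
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        echo "dracut -f /boot/initramfs-${kver}.img ${kver}"
    done
}

# Review the output first, then run it as root, for example:
#   rebuild_commands | sudo sh -x
```

Printing rather than executing keeps the helper safe to try on a production host before the migration window.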
You can easily check that this has worked by confirming that the new image includes the module:
sudo lsinitrd | grep nvme
drwxr-xr-x 3 root root 0 Jul 19 16:46 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme
drwxr-xr-x 2 root root 0 Jul 19 16:46 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host
-rw-r--r-- 1 root root 90462 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme-core.ko
-rw-r--r-- 1 root root 59022 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/nvme/host/nvme.ko
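If you want something scriptable rather than eyeballing the grep output, the check can be wrapped up along these lines. It is only a sketch: missing_modules is a hypothetical helper (not part of dracut) that reads an lsinitrd listing on standard input and prints any module name you pass that has no .ko entry in the listing.

```shell
#!/bin/sh
# missing_modules NAME...: read an lsinitrd listing on stdin and print every
# NAME that has no matching <NAME>.ko file in it (one name per line).
missing_modules() {
    listing=$(cat)
    for mod in "$@"; do
        # Match ".../<mod>.ko" at end of line, optionally compressed (.ko.xz, .ko.gz).
        if ! printf '%s\n' "$listing" | grep -Eq "/${mod}\.ko(\.[a-z0-9]+)?\$"; then
            echo "$mod"
        fi
    done
}

# Usage on a real host (root is needed to read the image):
#   sudo lsinitrd | missing_modules nvme nvme-core
```

No output means every module you asked about was found in the image.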
ENA
The ENA (Elastic Network Adapter) was introduced in June 2016 with initial support for X1 instances.
ENA is now supported across a number of instance types and operating systems. You can find the latest information in the EC2 ENA documentation.
The process is very similar to enabling NVMe.
First we confirm that the module exists and note its version, then we tell dracut to include it, rebuild the initrd and check:
modinfo ena
filename: /lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko
version: 1.1.2
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
srcversion: 1CCD9807B601A1966B96ADD
alias: pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias: pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
retpoline: Y
intree: Y
vermagic: 4.1.12-124.39.5.el7uek.x86_64 SMP mod_unload modversions
signat: X509
signer: Oracle CA Server
sig_key: FA:68:C6:0B:61:05:75:81:01:2A:AF:53:95:37:95:B4:8E:9F:8B:ED
sig_hashalgo: sha512
parm: debug:Debug level (0=none,...,16=all) (int)
# tee -a appends, so the nvme entry written to aws.conf earlier is kept
echo 'add_drivers+=" ena "' | sudo tee -a /etc/dracut.conf.d/aws.conf
sudo dracut -f -v
sudo lsinitrd | grep ena
drwxr-xr-x 2 root root 0 Jul 19 17:03 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena
-rw-r--r-- 1 root root 137278 May 27 05:59 usr/lib/modules/4.1.12-124.39.5.el7uek.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko
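Because both drivers live in the same aws.conf file, it is easy to clobber one entry while adding the other. Below is a small sketch of an idempotent way to add a driver. add_dracut_driver is a hypothetical helper, and it relies on dracut reading its conf files as shell, so several add_drivers+= lines in one file accumulate rather than replace each other.

```shell
#!/bin/sh
# add_dracut_driver CONF MODULE: append an add_drivers line for MODULE to CONF
# unless MODULE is already listed, so repeated runs never drop earlier entries.
add_dracut_driver() {
    conf="$1"
    mod="$2"
    touch "$conf"
    # Look for the module as a whitespace-delimited word inside an add_drivers line.
    if ! grep -Eq "^add_drivers\+=\".*[[:space:]]${mod}[[:space:]].*\"" "$conf"; then
        printf 'add_drivers+=" %s "\n' "$mod" >> "$conf"
    fi
}

# Usage from a root shell:
#   add_dracut_driver /etc/dracut.conf.d/aws.conf nvme
#   add_dracut_driver /etc/dracut.conf.d/aws.conf ena
```

Remember to rebuild the initrd afterwards; editing the conf file on its own changes nothing in the existing image.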
Consistent Network Interface Naming
On recent distributions, interfaces are given names derived from firmware and hardware topology information. To make sure that interfaces keep the good old eth names (eth0, eth1 and so on), you need to pass this parameter on the kernel command line at boot:
net.ifnames=0
To make sure this is done as part of the grub setup, you need to update the grub defaults and then rebuild the grub.cfg file.
sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="net.ifnames=0 /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
You can check that grub is set up correctly by inspecting the grub.cfg file:
sudo grep ifnames /boot/grub2/grub.cfg
linux16 /vmlinuz-4.1.12-124.39.5.el7uek.x86_64 root=/dev/mapper/ol-root ro net.ifnames=0 crashkernel=auto vconsole.font=latarcyrheb-sun16 rd.lvm.lv=ol/swap rd.lvm.lv=ol/root vconsole.keymap=us numa=off transparent_hugepage=never
Yours may look a little different, but every kernel line should have this configuration item.
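Note that the sed command above prepends the parameter every time it runs, so running it twice leaves two copies on the command line. Below is a sketch of a guarded version plus a check for kernel lines that are missing the parameter. ensure_kernel_arg and kernel_lines_missing_arg are hypothetical helper names, and the sketch assumes GNU sed (for -i) and an argument that contains no / characters.

```shell
#!/bin/sh
# ensure_kernel_arg FILE ARG: prepend ARG to GRUB_CMDLINE_LINUX in FILE,
# but only if it is not already there. Does nothing if the variable is absent.
ensure_kernel_arg() {
    file="$1"
    arg="$2"
    if ! grep -Eq "^GRUB_CMDLINE_LINUX=\".*${arg}" "$file"; then
        sed -i "s/^GRUB_CMDLINE_LINUX=\"/GRUB_CMDLINE_LINUX=\"${arg} /" "$file"
    fi
}

# kernel_lines_missing_arg CFG ARG: print every kernel line (linux, linux16,
# linuxefi) in a generated grub.cfg that does not contain ARG.
kernel_lines_missing_arg() {
    grep -E '^[[:space:]]*linux(16|efi)?[[:space:]]' "$1" | grep -vF -- "$2" || true
}

# Usage from a root shell:
#   ensure_kernel_arg /etc/default/grub net.ifnames=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   kernel_lines_missing_arg /boot/grub2/grub.cfg net.ifnames=0
```

An empty result from the second helper means every boot entry carries the parameter.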
Install and set up SSM Session Manager
This one is pretty easy, and we can set this up prior to the migration to AWS.
yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
Install and set up CloudWatch Agent
We can install the CloudWatch Agent now, but we have to set up the agent after the migration. We can do this automatically with some help from CloudEndure.
CloudEndure has this concept of Post Launch Scripts that we can use to configure the agent after the instance is launched in AWS.
First let’s install the agent:
yum install -y https://s3.amazonaws.com/amazoncloudwatch-agent/centos/amd64/latest/amazon-cloudwatch-agent.rpm
Now let’s create a file, /etc/basic-cloudwatch.json, with a basic metric configuration:
{
  "agent": {
    "metrics_collection_interval": 300,
    "run_as_user": "root"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "drop_device": true,
        "ignore_file_system_types": [
          "overlay",
          "sysfs",
          "devtmpfs",
          "tmpfs",
          "nfs4"
        ],
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ]
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ]
      }
    }
  }
}
Now we configure a post_launch script /boot/post_launch/cwsetup.sh:
#!/bin/bash
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/etc/basic-cloudwatch.json
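Since the post launch script runs with nobody watching, it helps if it fails loudly when something is missing. Below is a slightly more defensive sketch of the same script. apply_cw_config is a hypothetical wrapper name, and the flags it passes are just the ones used in the script above.

```shell
#!/bin/sh
# apply_cw_config CTL CONFIG: run the agent's fetch-config action only when the
# control script is executable and the config file is readable.
# Returns non-zero (with a message on stderr) otherwise.
apply_cw_config() {
    ctl="$1"
    config="$2"
    if [ ! -x "$ctl" ]; then
        echo "agent control script not found: $ctl" >&2
        return 1
    fi
    if [ ! -r "$config" ]; then
        echo "config file not readable: $config" >&2
        return 1
    fi
    "$ctl" -a fetch-config -s -m ec2 -c "file:$config"
}

# On the migrated instance the call would be:
#   apply_cw_config /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl /etc/basic-cloudwatch.json
```

Keeping the paths as arguments also makes it easy to try the wrapper against a stand-in script before the real agent is installed.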
Putting it all together with Ansible
I needed to deploy this to a number of machines, so I created this Ansible playbook.
---
- hosts: all
  gather_facts: yes
  become: yes
  tasks:
    - name: setup dracut for nvme and ena
      copy:
        dest: /etc/dracut.conf.d/aws.conf
        content: 'add_drivers+=" ena nvme "'
    - name: build a new initrd
      command: dracut -f -v
    - name: read in grub default
      shell: cat /etc/default/grub
      register: grubdefault
    - name: make eth the default
      command:
        cmd: sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="net.ifnames=0 /' /etc/default/grub
        # warn was removed in ansible-core 2.14; drop this line on newer versions
        warn: no
      when: grubdefault.stdout.find('net.ifnames') == -1
    - name: build a new grub config
      command: grub2-mkconfig -o /boot/grub2/grub.cfg
    - name: install ssm agent
      yum:
        name: https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
        state: present
    - name: install cloudwatch agent
      yum:
        name: https://s3.amazonaws.com/amazoncloudwatch-agent/centos/amd64/latest/amazon-cloudwatch-agent.rpm
        state: present
    - name: copy basic cloudwatch config
      copy:
        dest: /etc/basic-cloudwatch.json
        src: basic-cloudwatch.json
    - name: create post_launch directory
      file:
        path: /boot/post_launch
        state: directory
    - name: create a script to setup cloudwatch agent
      copy:
        dest: /boot/post_launch/cwsetup.sh
        content: |
          #!/bin/bash
          /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -m ec2 -c file:/etc/basic-cloudwatch.json
        mode: "0755"