
How to monitor an application running on EC2 and reboot the instance when its health check fails, using a Python Lambda function

Use case: We have an application running on several EC2 instances, for example 5 instances with the same version of the application. On these instances, developers run tests with Chromeless containers. Under heavy load we sometimes get an "out of memory" error. The developers would like to check the application and restart it when it returns 500 errors.

We decided to build a Lambda function in Python 3 that dynamically discovers the instances, runs a health check against each one, and reboots any instance whose application (started by a shell command) is not running.

If the applications run in a private subnet, the Lambda function should be created in the same subnet so that it can reach the instances directly. We can discover the instances using boto3 and instance tags.
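As a minimal sketch of the tag-based discovery, the two helpers below build the filter list passed to `describe_instances` and pull the private IPs of running instances out of its response. The helper names `tag_filters` and `running_private_ips` and the sample IDs and IPs are made up for illustration; only the response shape follows boto3.

```python
def tag_filters(tags):
    """Turn {'Group': 'Manage'} into the Filters list that describe_instances expects."""
    return [{'Name': 'tag:' + key, 'Values': [value]} for key, value in tags.items()]


def running_private_ips(response):
    """Map instance ID -> private IP for every running instance in the response."""
    result = {}
    for reservation in response.get('Reservations', []):
        for instance in reservation.get('Instances', []):
            if instance['State']['Name'] == 'running' and 'PrivateIpAddress' in instance:
                result[instance['InstanceId']] = instance['PrivateIpAddress']
    return result


# Hand-written sample in the shape of a describe_instances response (made-up values).
sample = {
    'Reservations': [{
        'Instances': [
            {'InstanceId': 'i-0aaa', 'State': {'Name': 'running'}, 'PrivateIpAddress': '10.0.1.10'},
            {'InstanceId': 'i-0bbb', 'State': {'Name': 'stopped'}, 'PrivateIpAddress': '10.0.1.11'},
        ]
    }]
}

print(tag_filters({'Group': 'Manage'}))
print(running_private_ips(sample))
```

Keeping these helpers free of AWS calls means they can be checked locally before the function is deployed.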

Also be aware that you need to modify each instance and add user data to it. The user data should contain a script that starts the application automatically after a restart.

Note: By default, the user data script runs only when the instance is first launched, not on a reboot or stop/start. To run it on every boot, use the following MIME multi-part user data, as described in
https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/

 
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
#Your script here
/bin/echo "Hello World" >> /tmp/testfile.txt
--//--
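If you would rather generate this user data than paste it by hand, the following is a rough sketch using Python's standard `email` package. The function name `build_user_data` is hypothetical, and the headers Python emits may differ slightly in order from the hand-written version above, so treat it as a starting point rather than a drop-in replacement.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# cloud-config part: tells cloud-init to run user scripts on every boot.
CLOUD_CONFIG = "#cloud-config\ncloud_final_modules:\n- [scripts-user, always]\n"


def build_user_data(script):
    """Wrap a shell script in multi-part user data with the same two parts as above."""
    message = MIMEMultipart("mixed", boundary="//")

    config_part = MIMEText(CLOUD_CONFIG, "cloud-config", "us-ascii")
    config_part.add_header("Content-Disposition", "attachment", filename="cloud-config.txt")
    message.attach(config_part)

    script_part = MIMEText(script, "x-shellscript", "us-ascii")
    script_part.add_header("Content-Disposition", "attachment", filename="userdata.txt")
    message.attach(script_part)

    return message.as_string()


user_data = build_user_data('#!/bin/bash\n/bin/echo "Hello World" >> /tmp/testfile.txt\n')
print(user_data)
```

Note that the user data of an existing instance can only be changed while the instance is stopped.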

Lambda code:
This code should be in Lambda (Python 3.7)


import urllib.request

import boto3

# The copy of requests vendored in botocore is deprecated, so the health check
# uses urllib from the standard library instead.
ec2 = boto3.client('ec2', 'us-west-1')


def lambda_handler(event, context):
    returnInstanceIp()


def generateData():
    # Select only the instances tagged Group=Manage.
    filters = [{
        'Name': 'tag:Group',
        'Values': ['Manage']
    }]
    response = ec2.describe_instances(Filters=filters)
    return response


def makeHealthCheck(ipAddress):
    # Returns True when the application answers, False on an error code,
    # a timeout, or a connection failure.
    print(ipAddress)
    try:
        # urlopen raises HTTPError for 4XX and 5XX responses.
        with urllib.request.urlopen('http://' + ipAddress + ':8080/', timeout=2.001) as r:
            print(r.status)
            return True
    except Exception as e:
        print(e)
        return False


def returnInstanceIp(data=None):
    # Query EC2 on every invocation. A default of generateData() would be
    # evaluated only once, when the module is loaded.
    if data is None:
        data = generateData()
    for reservation in data["Reservations"]:
        for instance in reservation["Instances"]:
            if instance['State']['Name'] == 'running':
                instanceId = instance['InstanceId']
                healthCheck = makeHealthCheck(instance['PrivateIpAddress'])
                print(instanceId)
                print(healthCheck)
                if not healthCheck:
                    print("Restart Instance " + instanceId)
                    ec2.reboot_instances(InstanceIds=[instanceId])
                else:
                    print("Good instance")

Basically, this Lambda function checks each instance tagged Group: Manage by making an HTTP request to it. If the response code is 200, the function skips the instance. If the code is 4XX or 5XX, or the request fails or times out, it reboots the instance.
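The decision rule described above can be written down on its own, which makes it easy to test without touching AWS. This is only a sketch: the names `should_restart` and `plan_reboots` are hypothetical, and treating every non-200 code (not just 4XX and 5XX) as unhealthy is an assumption on my part.

```python
def should_restart(status_code):
    """Return True when the instance should be rebooted.

    status_code is the HTTP status of the health check, or None when the
    request failed or timed out. Assumption: only 200 counts as healthy.
    """
    return status_code != 200


def plan_reboots(statuses):
    """Given {instance_id: status_code_or_None}, return the IDs to reboot, sorted."""
    return sorted(iid for iid, code in statuses.items() if should_restart(code))


# Made-up health check results for three instances.
statuses = {'i-0aaa': 200, 'i-0bbb': 500, 'i-0ccc': None}
print(plan_reboots(statuses))
```

The resulting list could then be passed as `InstanceIds` to `ec2.reboot_instances`, which accepts several IDs in one call.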

Off topic:

You can check out the quizzes here: https://www.devopscat.tech/quizzes/