AWS Batch enables developers, scientists, and engineers to quickly and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. This article shows how to work with AWS Batch in Python using the Boto3 library by implementing a job that imports records into a DynamoDB table from a file uploaded to an S3 bucket.
Table of contents
- What is AWS Batch?
- Features
- Job states
- Job scheduler
- Prerequisites
- Docker container
- DynamoDB table
- S3 bucket
- AWS Batch job’s IAM role
- Managing AWS Batch using Boto3
- Create AWS Batch compute environment
- Create AWS Batch job queue
- Register AWS Batch job definition
- Submit AWS Batch job for execution
- Summary
- Related articles
If you’re new to the Boto3 library, we encourage you to check out the Introduction to Boto3 library article.
What is AWS Batch?
AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as AWS Fargate, Amazon EC2, and Spot Instances.
AWS Batch organizes its work into four components:
- Jobs – the unit of work submitted to AWS Batch, whether it be implemented as a shell script, executable, or Docker container image.
- Job Definition – describes how your work is executed, including the CPU and memory requirements and IAM role that provides access to other AWS services.
- Job Queues – listing of work to be completed by your Jobs. You can leverage multiple queues with different priority levels.
- Compute Environment – the compute resources that run your Jobs. Environments can be managed by AWS or by you, and you can choose the instance types your Jobs run on or let AWS select the right instance type for you.
Features
- EC2 Instances will run only for the needed time, taking advantage of per-second billing. You can also lower your costs by using spot instances.
- It’s possible to configure how many retries you’d like for any job.
- It offers queues to which you submit your jobs. Each queue can be assigned a priority, so you can control which jobs run first. You can also have queues that use better resources to speed up processing.
- It supports Docker containers so that you can focus only on your code.
Job states
Here’s a list of AWS Batch job states (you can also query a job’s state programmatically, as shown after the list):
- SUBMITTED: Accepted into the queue but not yet evaluated for execution
- PENDING: Your job depends on other jobs that have not yet completed
- RUNNABLE: Your job has been evaluated by the scheduler and is ready to run
- STARTING: Your job is in the process of being scheduled to a compute resource
- RUNNING: Your job is currently running
- SUCCEEDED: Your job has finished with exit code 0
- FAILED: Your job finished with a non-zero exit code, was canceled or terminated
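If you prefer not to open the console, you can check a job’s current state with the describe_jobs call of the Batch client. This is a minimal sketch; the job ID is a placeholder and comes from the submit_job response shown later in this article.

import boto3

client = boto3.client('batch')

# 'your-job-id' is a placeholder - use the 'jobId' value returned by submit_job()
response = client.describe_jobs(jobs=['your-job-id'])

for job in response['jobs']:
    print(job['jobName'], job['status'])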
Job scheduler
The AWS Batch scheduler evaluates when, where, and how to run jobs submitted to a job queue. Jobs run roughly in the order they are submitted, as long as all dependencies on other jobs have been met.
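To illustrate job dependencies, here’s a hedged sketch of submitting a job that only starts after another job has succeeded. The job names are made up, and the job definition and queue referenced here are the ones created later in this article.

import boto3

client = boto3.client('batch')

# Submit the first job (hypothetical name; the job definition and queue are created later in this article)
first = client.submit_job(
    jobDefinition='dynamodb_import_job_definition',
    jobName='first_job',
    jobQueue='dynamodb_import_queue',
)

# The second job stays PENDING until the first job completes successfully
second = client.submit_job(
    jobDefinition='dynamodb_import_job_definition',
    jobName='second_job',
    jobQueue='dynamodb_import_queue',
    dependsOn=[{'jobId': first['jobId']}],
)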
Prerequisites
Let’s create the Docker container, the IAM role for AWS Batch job execution, the DynamoDB table, and the S3 bucket.
Docker container
You can skip this section and use an already existing Docker image from Docker Hub: luckytuvshee/importuser:latest.
First, we need to create a Docker image, which is responsible for the computing task we’ll run as an AWS Batch job.
Here’s a working folder structure:
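The layout is minimal: the Dockerfile and the importUser.py script live side by side in the build context (the directory name below is arbitrary):

aws-batch-import/
├── Dockerfile
└── importUser.py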
The content of the Dockerfile:

FROM amazonlinux:latest
RUN yum -y install which unzip python3 pip3
RUN pip3 install boto3
ADD importUser.py /usr/local/bin/importUser.py
WORKDIR /tmp
USER nobody
ENTRYPOINT ["/usr/local/bin/importUser.py"]
Now, let’s create the importUser.py Python script, which imports data from a CSV file uploaded to the S3 bucket into the DynamoDB table:
#!/usr/bin/python3
import os
import csv
from datetime import datetime, timezone

import boto3

s3_resource = boto3.resource('s3')

print('os environ:', os.environ)

# The table name, bucket name, and object key are passed in as environment variables
table_name = os.environ['table_name']
bucket_name = os.environ['bucket_name']
key = os.environ['key']

table = boto3.resource('dynamodb').Table(table_name)

# Download the CSV file from S3 and split it into lines
csv_file = s3_resource.Object(bucket_name, key)
items = csv_file.get()['Body'].read().decode('utf-8').splitlines()

reader = csv.reader(items)
header = next(reader)

current_date = datetime.now(timezone.utc).isoformat()[:-6] + 'Z'

# Import every CSV row into the DynamoDB table
for row in reader:
    table.put_item(
        Item={
            'id': row[header.index('id')],
            'number': row[header.index('number')],
            'createdAt': current_date,
        }
    )

print('records imported successfully')
Additional information:
- Working with Files in Python
- Working with S3 in Python using the Boto3
- Working with DynamoDB in Python using the Boto3
Let’s build a Docker image:
docker build -f Dockerfile -t luckytuvshee/importuser .
As soon as the image has been built, you can push it to the Docker registry:
docker push luckytuvshee/importuser
Additional information:
- The Most Useful [Docker] Commands Everybody Should Know About [Examples]
DynamoDB table
Let’s create a DynamoDB table that stores records imported by the AWS Batch job.
Additional information:
- Working with DynamoDB in Python using the Boto3
import boto3

dynamodb = boto3.resource('dynamodb')

response = dynamodb.create_table(
    TableName='batch-test-table',
    KeySchema=[
        {
            'AttributeName': 'id',
            'KeyType': 'HASH'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'id',
            'AttributeType': 'S'
        },
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 1,
        'WriteCapacityUnits': 1
    }
)

print(response)
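The create_table call returns before the table is ready. Optionally (this waiter is not part of the original example), you can block until the table becomes ACTIVE:

import boto3

dynamodb = boto3.client('dynamodb')

# Wait until DynamoDB reports the table as existing and ACTIVE before using it
dynamodb.get_waiter('table_exists').wait(TableName='batch-test-table')
print('table is ready')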

S3 bucket
Now, we need to create an S3 bucket, which will store uploaded CSV files. The AWS Batch job will process these files.
Additional information:
- Working with S3 in Python using the Boto3
import boto3

s3 = boto3.resource('s3')

response = s3.create_bucket(
    Bucket='batch-test-bucket-ap-1',
    CreateBucketConfiguration={
        'LocationConstraint': 'ap-northeast-1'
    }
)

print(response)

CSV file example
Here’s an example of the CSV file data, which we’ll upload to the S3 bucket:
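The exact values don’t matter; what matters is that the file has the id and number columns the import script reads. A made-up example:

id,number
1,101
2,102
3,103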

We’ll name this file sample-zip.csv. Let’s put it into the S3 bucket:
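You can upload the file through the AWS console or with a couple of lines of Boto3 (a minimal sketch; the local path assumes sample-zip.csv sits in the current directory):

import boto3

s3 = boto3.resource('s3')

# Upload the local sample-zip.csv into the bucket created above
s3.Bucket('batch-test-bucket-ap-1').upload_file('sample-zip.csv', 'sample-zip.csv')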

AWS Batch job’s IAM role
Now, let’s create the IAM role that allows the Docker container to run the Python Boto3 script.
This role requires access to the DynamoDB, S3, and CloudWatch services. For simplicity, we’ll use the AmazonDynamoDBFullAccess, AmazonS3FullAccess, and CloudWatchFullAccess managed policies, but we strongly encourage you to create a custom policy that grants only the necessary permissions.
Additional information:
- Working with IAM in Python using the Boto3
import boto3
import json

client = boto3.client('iam')

assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

response = client.create_role(
    RoleName='dynamodbImportRole',
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)

client.attach_role_policy(
    RoleName=response['Role']['RoleName'],
    PolicyArn='arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess'
)

client.attach_role_policy(
    RoleName=response['Role']['RoleName'],
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

client.attach_role_policy(
    RoleName=response['Role']['RoleName'],
    PolicyArn='arn:aws:iam::aws:policy/CloudWatchFullAccess'
)

print(response)

Managing AWS Batch using Boto3
This section covers how to manage the AWS Batch service and how to create and run an AWS Batch job.
Create AWS Batch compute environment
To create a compute environment for AWS Batch, you need to use the create_compute_environment() method of the AWS Batch Boto3 client.
AWS Batch job queues are mapped to one or more compute environments:
- MANAGED – managed compute environments launch Amazon ECS container instances into the VPC and subnets you specify when creating the compute environment. Amazon ECS container instances need external network access to communicate with the Amazon ECS service endpoint.
- UNMANAGED – in an unmanaged compute environment, you manage your own compute resources. You must verify that the AMI you use for your compute resources meets the Amazon ECS container instance AMI specification.
You can also set the instance type to optimal, which means AWS evaluates the job’s requirements (CPU-bound, memory-bound, or a combination of both) and selects an appropriate instance type for running it.
import boto3

client = boto3.client('batch')

response = client.create_compute_environment(
    computeEnvironmentName='dynamodb_import_environment',
    type='MANAGED',
    state='ENABLED',
    computeResources={
        'type': 'EC2',
        'allocationStrategy': 'BEST_FIT',
        'minvCpus': 0,
        'maxvCpus': 256,
        'subnets': [
            'subnet-0be50d51',
            'subnet-3fd16f77',
            'subnet-0092132b',
        ],
        'instanceRole': 'ecsInstanceRole',
        'securityGroupIds': [
            'sg-851667c7',
        ],
        'instanceTypes': [
            'optimal',
        ]
    }
)

print(response)
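Creating the compute environment takes a little while. Before attaching a job queue to it, you can poll until it reports a VALID status (an optional sketch, not part of the original walkthrough; add your own timeout handling):

import time

import boto3

client = boto3.client('batch')

# Poll the compute environment status until AWS Batch reports it as VALID
while True:
    environment = client.describe_compute_environments(
        computeEnvironments=['dynamodb_import_environment']
    )['computeEnvironments'][0]
    if environment['status'] == 'VALID':
        print('compute environment is ready')
        break
    time.sleep(10)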

Create AWS Batch job queue
To create a job queue for AWS Batch, you need to use the create_job_queue() method of the AWS Batch Boto3 client.
Jobs are submitted to a job queue, where they reside until they can be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours.
When you’re creating a queue, you have to define the queue state (ENABLED or DISABLED). You can create multiple queues with different priorities.
import boto3

client = boto3.client('batch')

response = client.create_job_queue(
    jobQueueName='dynamodb_import_queue',
    state='ENABLED',
    priority=1,
    computeEnvironmentOrder=[
        {
            'order': 100,
            'computeEnvironment': 'dynamodb_import_environment'
        },
    ],
)

print(response)

Register AWS Batch job definition
To register a job definition in AWS Batch, you need to use the register_job_definition() method of the AWS Batch Boto3 client.
AWS Batch job definitions specify how batch jobs need to be run.
Here are some of the attributes that you can specify in a job definition:
- IAM role associated with the job
- vCPU and memory requirements
- Container properties
- Environment variables
- Retry strategy
import boto3

iam = boto3.client('iam')
client = boto3.client('batch')

dynamodbImportRole = iam.get_role(RoleName='dynamodbImportRole')

response = client.register_job_definition(
    jobDefinitionName='dynamodb_import_job_definition',
    type='container',
    containerProperties={
        'image': 'luckytuvshee/importuser:latest',
        'memory': 256,
        'vcpus': 16,
        'jobRoleArn': dynamodbImportRole['Role']['Arn'],
        'executionRoleArn': dynamodbImportRole['Role']['Arn'],
        'environment': [
            {
                'name': 'AWS_DEFAULT_REGION',
                'value': 'ap-northeast-1',
            }
        ]
    },
)

print(response)

Submit AWS Batch job for execution
Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2 or AWS Fargate.
Containerized jobs can reference a container image, command, and parameters.
With the containerOverrides parameter, you can override some of the parameters defined in the job definition when you submit the job. This lets you build a general-purpose container and pass extra override configuration at job submission time.
You can also specify the retryStrategy, which defines how many times the job should be retried before it is marked as failed.
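For example (this fragment is not part of the original walkthrough, and the job name is made up), a retry strategy can be passed directly to submit_job:

import boto3

client = boto3.client('batch')

# Retry the job up to 3 times before AWS Batch marks it as FAILED
response = client.submit_job(
    jobDefinition='dynamodb_import_job_definition',
    jobName='dynamodb_import_job_with_retries',
    jobQueue='dynamodb_import_queue',
    retryStrategy={'attempts': 3},
)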
import boto3

client = boto3.client('batch')

response = client.submit_job(
    jobDefinition='dynamodb_import_job_definition',
    jobName='dynamodb_import_job1',
    jobQueue='dynamodb_import_queue',
    containerOverrides={
        'environment': [
            {
                'name': 'table_name',
                'value': 'batch-test-table',
            },
            {
                'name': 'bucket_name',
                'value': 'batch-test-bucket-ap-1',
            },
            {
                'name': 'key',
                'value': 'sample-zip.csv',
            }
        ]
    },
)

print(response)

You can check the AWS Batch job status in the AWS console:

As soon as the AWS Batch job finishes its execution, you may check the imported data in the DynamoDB table.
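If you’d rather verify the result from Python than from the console, a quick scan of the demo table will do (fine here because the table is tiny; avoid full scans on large tables):

import boto3

table = boto3.resource('dynamodb').Table('batch-test-table')

# Print every imported record from the demo table
for item in table.scan()['Items']:
    print(item)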
Summary
This article covered the fundamentals of AWS Batch and how to use Python and the Boto3 library to manage AWS Batch jobs. We created a demo job that imports a CSV file from an S3 bucket into a DynamoDB table.
If you’d like to learn more about the Boto3 library, especially in combination with AWS Lambda, we encourage you to check out one of the top-rated Udemy courses on the topic – AWS Automation with Boto3 of Python and Lambda Functions.
Related articles
- What is Serverless computing
- Working with SQS in Python using Boto3
- Container Management and Orchestration on AWS
- Cloud CRON – Scheduled Lambda Functions
- AWS Step Functions – How to manage long-running tasks
Tuvshinsanaa Tuul
Hi, I’m Tuvshinsanaa Tuul from Mongolia. I hold a Bachelor’s degree in Information Systems and work as a Software Engineer experienced with JavaScript, AWS, Python, and PHP/Laravel.