Playwright on Elastic Beanstalk
2022-12-30
Est. 3m read

I’ve spent a lot of time with web scrapers over the years. BeautifulSoup was my first love. And then it was Puppeteer. But the modern approach is to use Playwright. It’s an incredible tool for all kinds of browser automation and I recommend starting with it.
I appreciate how intuitive the API is and it typically just works for all of my JavaScript needs.
python-3.7 on Amazon Linux 2
Today, I tried setting up Playwright in AWS Elastic Beanstalk. While using the
python-3.7 platform I was unable to execute playwright install via SSH:
    $ playwright install
    ERROR: cannot install on amzn distribution - only Ubuntu is supported

Lucky for me, I had come across a similar issue at work. We wanted playwright
to be set up on our self-hosted runners for GitHub Actions without having to
run playwright install each time a new runner is set up.
The solution was to use a Docker image with playwright pre-installed. We used mcr.microsoft.com/playwright:v1.27.0-focal as the container for each of the steps. See a full example here.
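If you want to check up front whether a host is one the installer will accept, here is a minimal sketch in Python. The `amzn` in the error message matches the `ID` field that Amazon Linux reports in /etc/os-release; I'm assuming that file is what the installer inspects. The helper names `parse_os_release` and `is_ubuntu` are my own, not part of Playwright.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


def is_ubuntu(fields: dict) -> bool:
    """Return True when the ID field identifies Ubuntu."""
    return fields.get("ID") == "ubuntu"


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        fields = parse_os_release(os_release.read_text())
        print(f"distribution ID: {fields.get('ID')}, Ubuntu: {is_ubuntu(fields)}")
    else:
        print("no /etc/os-release found; cannot tell which distribution this is")
```

Run over SSH on the python-3.7 platform, I'd expect this to report `amzn` as the ID, which lines up with the error above.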
Ubuntu Dockerfile
A simple solution is to recreate the EB environment using the docker platform.
We’ll no longer use the python-3.7 platform. That also means the Procfile is no
longer needed.
    $ eb init --platform docker --region us-east-1 my-app-name
    $ eb create \
        --platform docker \
        --elb-type application \
        --region us-east-1 \
        -k my-app-keys

From there, I created a Dockerfile with the following contents. You may
need to customize this container for your project. I’m using Python + FastAPI.
- Note: You may need to replace main:app with your own entrypoint.
    FROM ubuntu:20.04

    ENV DEBIAN_FRONTEND=noninteractive
    RUN apt-get update && apt-get install -y python3.9 python3.9-dev python3-pip
    RUN pip install gunicorn

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # install playwright
    RUN playwright install --with-deps

    COPY . .

    EXPOSE 8000

    # the -b 0.0.0.0:8000 is required for the load balancer
    # to communicate with the container
    CMD ["gunicorn", "main:app", "-b", "0.0.0.0:8000", "--worker-class", "uvicorn.workers.UvicornWorker", "--workers", "1"]

That’s all you really need to get playwright configured in your EB environment.
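Before deploying, it can be worth confirming that the container really does accept connections on port 8000, since that is the port the load balancer talks to. Below is a rough sketch using only the Python standard library. The function name `wait_for_port` and the timeout values are my own choices, and it assumes you've started the image locally with port 8000 published.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly, then retry.
            time.sleep(0.5)
    return False
```

For example, after building the image and running it with `docker run -p 8000:8000` (plus whatever image tag you chose), `wait_for_port("127.0.0.1", 8000)` should return True once gunicorn is up. This only checks that the port is open, not that the app responds correctly.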
Deploying
Updating your app with your local files is as you’d expect…
    $ eb deploy

Closing Notes
- Docker images are cleaned up after each deployment.
- You’ll probably want to update the SSL certificate in the Listeners
section of your load balancer after it’s created.
- To use SSL, update the newly created security group to allow inbound/outbound HTTPS traffic.
- Use the eb ssh command to connect to your EB environment via SSH (assuming you specified -k in eb create).
- Use the eb logs command to view the logs for your EB environment.