Creating a Website on AWS

How This Website Works

This website uses six AWS services to operate (plus one optional service for version control and a content repository). I'll discuss each one briefly, although over time I will probably write more, since these articles are not static but fluid brain dumps (at least until all the info is dumped).

What About The Content

I keep a local clone of the CodeCommit repository on my computer. I write new content by copying the template and writing an article, then committing it to the repository. I also keep a small script, `publish.sh`, in the repository; when I run it, it synchronizes the folder to the S3 bucket and invalidates the CloudFront cache. In fact, because I synchronize the entire folder, you can download my publish.sh script from my site. It's very short:

#! /bin/zsh
# sync the files to s3
aws s3 sync --exclude '.git/*' . s3://tolidano.com
# invalidate the cloudfront distribution
aws cloudfront create-invalidation --distribution-id E3D7R1KZ2V941B --paths '/*'

There isn't anything sensitive here - you knowing the name of the bucket holding my website does not expose me in any way, nor does the publication of my CloudFront distribution ID.

How Much This Website Costs

A key thing people will care about is cost. The TL;DR: it's less than $1/mo for my site, and in theory I could host dozens of sites (on a single domain) and it would still be less than $1. This is because of a single key factor: the Free Tier. AWS has an always-free tier where, each month, you get a small allocation of dozens of their services at no charge, forever. For example, CloudFront includes 1 TB of data transfer out and 10 million requests per month, and Lambda includes 1 million requests and 400,000 GB-seconds of compute per month.

So in total, the only cost is the Route 53 hosted zone and DNS lookups, currently running me $0.52 per month (a hosted zone is $0.50, plus $0.40 per million standard queries). Now, granted, I use my account for plenty of other things (that I'll hopefully write about), so my actual costs are somewhat higher (usually between $20 and $30 a month), but if I just wanted to run some sites cheaply, this would be it.

Hypothetically...

One thing that is very annoying and "basic" about this model is that the header and footer are part of the article template, so if I ever want to change anything (like swapping CSS frameworks), I have to do a find-and-replace in every single file. While I have half a dozen articles, that's easy, but at 100 it might get rough (although replace-all in VS Code works very well, as does sed/gsed). I thought about how I might fix this, and the art of the possible.

First, I thought "maybe I can access the content in the response with CloudFront Functions, just before it gets back to the viewer, in a viewer-response association?" This turns out to be false: CloudFront Functions cannot access the body on the way in or out. There are no "server-side includes" here because there is no server processing that I can access in any meaningful way, so the Apache/Nginx solutions are all out the window.

There is, however, a feature called Lambda@Edge, which binds Lambda and CloudFront together, allowing full Lambda functions to run at the edge. They still have limitations versus regular Lambda, but they let you assign IAM policies and access content. What they still do not offer is the ability to modify the response body - but you can completely replace it. So I could trap responses from the origin and either keep the header and footer in the Lambda code or keep them as separate cacheable objects in the bucket along with the content.

But Lambda@Edge has no free tier, so I will pay for each request: $0.60 per million requests plus $0.18 per GB-hour of compute, billed in 1 ms increments, which I suspect would cover about 250,000 requests. So do I spend another $0.65 - $0.70 a month for the versatility of updating the header/footer in a single place? I think I'll at least try it out and see.

OK, so it worked! Now I run an edge lambda that does exactly what's described above. It took 8 versions, and it also has to handle 404s now, but overall it does a great job. The one "special" piece is that the title of the page has to be kept with the article, so the first line of each file going to S3 is the title. There's also one restriction: it only handles URIs that contain "html".
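For illustration, a hypothetical article file in the bucket might look like this (the first line is the title; everything after it is the body):

My Hypothetical Article
<p>This is the article body, which the edge lambda wraps in the header and footer.</p>

Here's the code for the lambda: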

import io

import boto3
import botocore.exceptions

# The client is created once per container and reused across invocations.
session = boto3.Session()
s3 = session.client('s3', region_name="us-east-1")
bucket_name = "tolidano.com"

top = """
<html>

    <head>
        <title>%TITLE%</title>
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <meta charset="UTF-8">
        <link rel="stylesheet" href="https://unpkg.com/simpledotcss/simple.min.css">
        <link rel="stylesheet" href="/site.css" />
        <link rel="icon" type="image/png" href="/favicon.ico" />
        <link rel="apple-touch-icon" href="/favicon.png" />
        <script src="https://unpkg.com/htmx.org@1.8.4"
            integrity="sha384-wg5Y/JwF7VxGk4zLsJEcAojRtlVp1FKKdGy1qN+OMtdq72WRvX/EdRdqg/LOhYeV"
            crossorigin="anonymous"></script>
    </head>

    <body>
        <header>
            <nav>
                <a href="/">Home</a>
            </nav>
            <h1>%TITLE%</h1>
        </header>
        <main>
            <div id="d">
        
"""
bottom = """

        </div>
    </main>
</body>

</html>
"""

not_found = "Not Found. Go <a href='/'>Home</a>"

# Static response headers sent with every generated page.
headers = {
    "cache-control": [
        {
            "key": "Cache-Control",
            "value": "max-age=100"
        }
    ],
    "content-type": [
        {
            "key": "Content-Type",
            "value": "text/html"
        }
    ]
}

def lambda_handler(event, context):
    request = event["Records"][0]["cf"]["request"]

    # Pass anything that is not an HTML page straight through.
    if "html" not in request["uri"]:
        return request

    bytes_buffer = io.BytesIO()
    try:
        # The URI starts with "/", so strip it to get the S3 object key.
        s3.download_fileobj(Bucket=bucket_name, Key=request["uri"][1:], Fileobj=bytes_buffer)
        file_data = bytes_buffer.getvalue().decode()
        # The first line of the file is the title; the rest is the article body.
        lines = file_data.split("\n")
        title = lines[0]
        body = "\n".join(lines[1:])

        # Generate the full page at the edge instead of serving the raw object.
        return {
            "status": 200,
            "statusDescription": "OK",
            "headers": headers,
            "body": top.replace("%TITLE%", title) + body + bottom
        }
    except botocore.exceptions.ClientError:
        # A missing object (or any other S3 client error) becomes a 404 page.
        return {
            "status": 404,
            "statusDescription": "Not Found",
            "headers": headers,
            "body": not_found
        }
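
For reference, here is a trimmed sketch of the event shape the handler reads. A real CloudFront event carries many more fields (method, headers, clientIp, and so on) under the request record, and the URI shown is hypothetical:

# Trimmed, hypothetical event - only the fields the handler touches.
example_event = {
    "Records": [
        {
            "cf": {
                "request": {
                    "uri": "/some-article.html"
                }
            }
        }
    ]
}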

Actually, I did a few more things... First, you can see the created and updated dates at the bottom of each article - I modified the bottom variable to include a footer:

        <footer>
            <div>%DATES%</div>
            <div><a href="#top">Back to Top</a></div>
        </footer>

Then I added a snippet of code to get those dates from S3 (because the bucket has versioning turned on):

        # key is the S3 object key (the URI minus its leading slash) and body is
        # the article content from the handler above; versions come back newest first.
        versions = s3.list_object_versions(Bucket=bucket_name, Prefix=key)["Versions"]
        newest, oldest = versions[0]["LastModified"], versions[-1]["LastModified"]
        dates = f"Created: {oldest} / Updated: {newest}"
        page = top.replace("%TITLE%", title) + body + bottom.replace("%DATES%", dates)
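
Since LastModified is a timezone-aware datetime, the footer renders something like this (hypothetical timestamps): Created: 2023-01-05 17:00:21+00:00 / Updated: 2023-02-11 09:14:03+00:00.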

So now you have a created and updated date on all the articles automatically.

Next, I found it would suffer from cold starts (because I'm not very popular). So I wrote a little lambda that is triggered by an EventBridge rate expression - rate(15 minutes) - which should be enough to keep the edge lambda warm for some segment of users, since idle instances are typically reclaimed after 20 minutes or so. The code has no dependencies (I use urllib directly, no requests), so it's fast and simple. A minimal sketch of such a warmer appears below.
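
This is a sketch under assumptions: the URL (and fetching index.html specifically) is my illustration, not a requirement - any URI containing "html" served through the distribution would exercise the edge lambda:

import urllib.request

# Assumed URL - any HTML page routed through the CloudFront distribution works.
SITE_URL = "https://tolidano.com/index.html"

def lambda_handler(event, context):
    # One GET through CloudFront keeps an instance of the edge function warm
    # at whichever edge location serves this request.
    with urllib.request.urlopen(SITE_URL) as response:
        return {"status": response.status}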

I got tired of running the publish script every time I pushed to the repo. I'm already forced to be logged in (via aws sso login) for CodeCommit to work anyway, but it was still an extra step I knew could be automated away with any reasonable CI/CD setup. So I set up a CodeBuild project and dropped a buildspec.yml file into the root of my repo, then set up a CodePipeline with a single build step (no "deploy" step is necessary, because the "build" step does everything I need). The buildspec looks just like my publish script, but in YAML:

version: 0.2

phases:
  install:
    commands:
      - aws s3 sync --exclude '.git/*' . s3://tolidano.com
  post_build:
    commands:
      - aws cloudfront create-invalidation --distribution-id E3D7R1KZ2V941B --paths '/*'