Kellen Parker

Automatically Generating and Hosting Websites with AI and AWS

Introduction

Something that has been fascinating to me recently is the dead internet theory, which suggests that soon the majority of the internet will be run by bots—or even worse, that it already is. I want to explore this theory today by testing the feasibility of generating and hosting one or many websites automatically, with as little human input as possible. This script will consist of two major parts: using AI to generate many sites, and then hosting these generated sites. While these sites will initially be 'blogs,' they could easily be transformed into any other type of content-driven site as well.

Generating a Site

There are two major avenues I can envision when thinking about automatically generating websites. The first approach would be generating static sites. Using static files would offer a few key benefits. First, and most importantly, it significantly simplifies the hosting process compared to the subsequent approach. It also simplifies the script itself and still allows for a good amount of site-by-site customization if desired. Lastly, it would present lower costs, less maintenance, and fewer security risks compared to the API-driven approach.

The other approach would be the API-driven approach. Websites would need to be hosted with a backend component that can fetch data dynamically. This approach would be ideal if the goal was to generate sites that require up-to-date information, or if there was a need to easily edit and/or add content at any time. Hosting the sites and the API would be more complex than hosting static sites, but it would be well worth it if the scalability and malleability were prioritized.

Without any specific need for that scalability and malleability in this project, I decided on the first approach.

Static Site Generation

There are a few goals I had in mind that dictated the approach I used to generate these static sites. The first goal was to ensure a level of uniqueness between them. Although the theme itself will be the same (adding a variety of themes wouldn't necessarily be difficult), each generated site should have a different color scheme and feel. The next goal, a more obvious one, was that each site should have its own content. To make this goal a bit more challenging, I wanted to generate content using as few tokens as possible, which translates to lower costs.

When you think of the simplest static site, you likely imagine a basic index.html file with some CSS. Any extended functionality beyond this comes in the form of added HTML, CSS, and JS files. This approach would certainly be simple and effective, but the goals outlined above become increasingly tricky with such a basic solution. For example, site-by-site coloring would be straightforward using CSS variables, but content generation would require parsing and converting generated text into proper HTML. Furthermore, any additional content would involve editing already-created files (such as an 'All Blog Posts' page), which could make this approach quite complex in the end.

The better approach I found was to use a Next.js project as a framework from which all static sites would be exported, given a set folder of content. This content folder would include all blog posts, CTAs, colors, and metadata that were intended for use. The content folders could also be git tracked and appended at a later date, and the site could be re-exported to easily add content. This is a good baseline template that I built my framework out of. I then made use of the wonderful Next.js static export feature after the content has been generated.
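A minimal sketch of how that content folder might be written and the site exported. The layout here (a site.json for metadata and colors, plus one JSON file per post) and the build invocation are my illustrative assumptions, not the project's exact structure:

```python
import json
import subprocess
from pathlib import Path

def write_content_folder(base_dir, site):
    """Write generated content to the folder the Next.js framework reads
    from. The layout (site.json + posts/*.json) is illustrative."""
    base = Path(base_dir)
    (base / "posts").mkdir(parents=True, exist_ok=True)

    # Site-wide metadata: title, description, and the per-site color scheme
    (base / "site.json").write_text(json.dumps({
        "title": site["title"],
        "description": site["description"],
        "colors": site["colors"],
    }, indent=2))

    # One JSON file per generated blog post
    for post in site["posts"]:
        (base / "posts" / f"{post['slug']}.json").write_text(
            json.dumps(post, indent=2)
        )

def export_static_site(project_dir):
    """Run the Next.js static export (assumes `output: 'export'` is set
    in next.config.js, so `next build` emits the `out/` directory)."""
    subprocess.run(["npx", "next", "build"], cwd=project_dir, check=True)
```

Because the content lives in plain JSON files, re-exporting after appending new posts is just another call to the same build step.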

Generating Content

I wanted all of the content on these sites to have an authentic feel. To achieve this, all 'blog posts' will have an author, post date, sections, and even off-site links. Everything from the paragraphs to the authors is generated using the OpenAI API, and more specifically their GPT-3.5 Turbo model, as it is the most cost-effective option.

The prompting itself is nothing special, though I did have to wrestle with the model when trying to get it to adhere to the JSON format that I needed. Below is the prompt I used to generate an about page for the site.

```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You are a content author. You will generate an about page for a website. The about page should include a two-paragraph overview of the site given some site information. You will output this information as a JSON object with only one key being 'overview'.",
        },
        {
            "role": "user",
            "content": f"Generate an about page for a site with the title {title}, the theme {theme}, and the description {description}",
        },
    ],
)
```
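Even with response_format set, it's worth decoding the reply defensively; models occasionally wrap JSON in Markdown fences anyway. A small helper (my own addition, not part of the original script) handles that:

```python
import json

def parse_json_reply(raw):
    """Decode a JSON object from a chat completion's message content,
    stripping any Markdown code fences the model may have added."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (and optional "json" tag) and the
        # closing fence before decoding.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)

# Usage with the response above:
# about = parse_json_reply(response.choices[0].message.content)["overview"]
```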

Hosting Dozens of Sites at a Time

There are numerous ways to host static sites. However, the number of options decreases significantly when trying to host sites solely through scripting. Even so, there are too many approaches to compare and contrast here, so I will only go over my approach and why I chose it.

I opted to use AWS for hosting these static sites, and more specifically, an S3 bucket combined with CloudFront distributions. The process is straightforward: upload the static site folder to a designated S3 bucket, and then create a CloudFront distribution that points to that folder, making the sites accessible. This setup not only aligns with the static site direction I chose previously, but also leverages AWS's inherent benefits—scalability, reliability, and availability.
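The upload step can be sketched with Boto3. This is an approximation of my helper, not the exact implementation; note that each site lives under a per-domain prefix so one bucket can hold many sites, and that Content-Type must be set explicitly or CloudFront will serve everything as binary:

```python
import mimetypes
from pathlib import Path

def content_type_for(filename):
    """Guess the Content-Type for an exported file; S3 won't infer one,
    and CloudFront serves whatever the object metadata says."""
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

def add_site_to_bucket(domain, local_site_directory, bucket_name):
    """Upload every file in the Next.js `out/` folder under a
    per-domain prefix. A sketch of what my helper does."""
    import boto3  # deferred so the pure helper above imports anywhere

    s3 = boto3.client("s3")
    root = Path(local_site_directory)
    for path in root.rglob("*"):
        if path.is_file():
            key = f"{domain}/{path.relative_to(root).as_posix()}"
            s3.upload_file(
                str(path), bucket_name, key,
                ExtraArgs={"ContentType": content_type_for(path.name)},
            )
    return f"arn:aws:s3:::{bucket_name}"
```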

Domains and Certificates

While the approach described above meets the goal of hosting the sites, it has a few shortcomings. The first is that the domain will be a randomly generated CloudFront domain, which is slightly unsightly. The second is that a custom domain would be served over HTTP rather than HTTPS unless we provision a certificate for it.

The SSL certificate issue can easily be solved by using AWS Certificate Manager to create certificates for the custom domain. Despite the inherent simplicity, it does divide our script into two parts. The division arises from the need to wait for the certificate to be issued before it can be used in the CloudFront distribution's initial configuration.
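Requesting the certificate and pulling out the CNAME record ACM wants us to publish can be sketched as below; the helper names are mine, and the validation record may take a few seconds to appear after the request, so callers should retry:

```python
def validation_record_from(describe_response):
    """Extract the DNS validation CNAME from an ACM
    `describe_certificate` response."""
    options = describe_response["Certificate"]["DomainValidationOptions"]
    record = options[0]["ResourceRecord"]
    return record["Name"], record["Type"], record["Value"]

def create_acm_certificate(domain):
    """Request a DNS-validated certificate and return its ARN. Note
    that CloudFront requires the certificate to live in us-east-1
    regardless of where everything else is hosted."""
    import boto3

    acm = boto3.client("acm", region_name="us-east-1")
    response = acm.request_certificate(
        DomainName=domain,
        ValidationMethod="DNS",
    )
    return response["CertificateArn"]
```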

For the custom domain, we need a registrar with an API that allows us to set DNS records. We will use this endpoint twice: the first time will be for ACM validation, and later to point the DNS to our CloudFront distribution.
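I used Namecheap, whose XML API exposes a `namecheap.domains.dns.setHosts` command with somewhat unusual numbered parameters. A sketch of building that request (check the exact field names and credential handling against Namecheap's API docs before use):

```python
def build_sethosts_params(domain, records, api_user, api_key, client_ip):
    """Build query parameters for Namecheap's
    `namecheap.domains.dns.setHosts` command. Note the API replaces ALL
    host records at once, so `records` must include every record you
    want to keep, not just the new one."""
    sld, tld = domain.split(".", 1)
    params = {
        "ApiUser": api_user,
        "ApiKey": api_key,
        "UserName": api_user,
        "ClientIp": client_ip,
        "Command": "namecheap.domains.dns.setHosts",
        "SLD": sld,
        "TLD": tld,
    }
    # Records are passed as numbered fields: HostName1, RecordType1, ...
    for i, (host, record_type, address) in enumerate(records, start=1):
        params[f"HostName{i}"] = host
        params[f"RecordType{i}"] = record_type
        params[f"Address{i}"] = address
    return params
```

The same builder serves both uses of the endpoint: first with the ACM validation CNAME, later with a CNAME pointing at the CloudFront distribution.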

Putting it all Together

As mentioned earlier, the goal was to fully automate the whole process of generating and hosting many sites. I felt that the natural choice for this script would be Python, as it has great OpenAI API integration and Boto3 for AWS. The script takes a few inputs for each site: domain, site title, scale, site description, and theme. The domain and site title inputs are self-explanatory, while the site description is used to help generate content and find images. The scale is an integer that determines how many posts are generated, and the theme is used to link similar sites to each other.

As previously noted, the script was better off divided into two parts after adding certification as a feature. This division also provided a great reason to introduce a database to track the sites. Not only is the database extremely useful for tracking the state of the sites between the two halves of the script, but it also makes it possible to link similar sites to each other, and could support an admin dashboard in the future.
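A minimal version of that tracking table in SQLite. I'm sketching a plausible schema from the fields the two scripts pass around; the real project's columns may differ:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    domain TEXT PRIMARY KEY,
    title TEXT,
    scale INTEGER,
    theme TEXT,
    description TEXT,
    status TEXT,            -- CREATED -> DEPLOYED
    s3_bucket_name TEXT,
    s3_destination_path TEXT,
    s3_bucket_arn TEXT,
    cloudfront_arn TEXT,
    acm_arn TEXT
)
"""

def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn

def create_site_entry(conn, **fields):
    cols = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(f"INSERT INTO sites ({cols}) VALUES ({placeholders})",
                 tuple(fields.values()))
    conn.commit()

def update_site_status(conn, domain, status):
    conn.execute("UPDATE sites SET status = ? WHERE domain = ?",
                 (status, domain))
    conn.commit()

def load_sites_with_status_created(conn):
    # Script 2 picks up everything script 1 left half-finished
    return [dict(r) for r in
            conn.execute("SELECT * FROM sites WHERE status = 'CREATED'")]
```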

Script 1: Creation

The first of the two scripts covers everything up until DNS validation. The initial step of this script involves creating the content folder for the site and all of its content. This step is where all of the AI 'magic' happens. Next, the site is built and exported as a static site. These files are then uploaded to the S3 bucket. Subsequently, we create an ACM certificate for our domain, add the DNS validation records to our domain, and wait for it to be issued. The final step of this first script is to add our site information to the database and give it a CREATED status.

```python
# create.py
import argparse

def main():
    ### ARGUMENTS
    # Create the parser
    parser = argparse.ArgumentParser(
        description="Deploy a site with specific domain and affiliate link."
    )

    # Add arguments
    parser.add_argument("domain", type=str, help="The domain name for the site")
    parser.add_argument("title", type=str, help="The title of the site")
    parser.add_argument("scale", type=int, help="The scale of the site")
    parser.add_argument("theme", type=str, help="The theme to be used")
    parser.add_argument("description", type=str, help="The description of the site")

    args = parser.parse_args()
    domain = args.domain
    title = args.title
    scale = args.scale
    theme = args.theme
    description = args.description
    print(
        f"Domain: {domain}, Title: {title}, Scale: {scale}, Theme: {theme}, Description: {description}"
    )

    ### CREATION
    # Create a content folder and export the static site
    create_site(domain, title, scale, theme, description)

    ### HOSTING
    # Create ACM certificate
    certificate_arn = create_acm_certificate(domain)
    if not certificate_arn:
        print("Failed to create ACM certificate")
        return

    # Upload site to S3
    bucket_arn = add_site_to_bucket(domain, local_site_directory, bucket_name)
    if not bucket_arn:
        print("Failed to create S3 bucket")
        return

    ### DATABASE
    # Add site to database
    create_site_entry(
        domain,
        title,
        scale,
        theme,
        description,
        "CREATED",
        bucket_name,
        domain,  # Using the domain as the S3 destination path
        bucket_arn,
        "NULL",  # CloudFront ARN
        certificate_arn,
    )
```

Script 2: Hosting

The second script first checks to ensure that the certificate has been issued for the domain. Once confirmed, it proceeds to create the CloudFront distribution, incorporating the ACM certificate into the distribution's configuration. After creating the distribution, the S3 bucket policy must be amended to allow access to the static site files. Finally, the domain's DNS records can be updated to only include the distribution's records, and the site's database entry can have its status upgraded to DEPLOYED.

This second script can also be made into a background task that checks for sites with a certification status of CREATED and a valid ACM certificate, and finishes the deployment for any it finds.
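The distribution's configuration is the most involved piece of that step. A condensed sketch of building it is below; `oac_id` is a pre-created Origin Access Control, the cache policy ID is AWS's documented managed "CachingOptimized" constant, and the exact fields should be checked against the Boto3 docs:

```python
import time

def build_distribution_config(bucket_name, destination_path, acm_arn,
                              domain, oac_id):
    """Build the config dict for cloudfront.create_distribution().
    A sketch, not the project's exact helper."""
    origin_id = f"s3-{bucket_name}-{destination_path}"
    return {
        "CallerReference": f"{domain}-{int(time.time())}",  # must be unique
        "Comment": f"Distribution for {domain}",
        "Enabled": True,
        "DefaultRootObject": "index.html",
        "Aliases": {"Quantity": 1, "Items": [domain]},
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": origin_id,
                "DomainName": f"{bucket_name}.s3.amazonaws.com",
                # Serve this site's subfolder of the shared bucket
                "OriginPath": f"/{destination_path}",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
                "OriginAccessControlId": oac_id,
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": origin_id,
            "ViewerProtocolPolicy": "redirect-to-https",
            # Managed "CachingOptimized" cache policy
            "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
        },
        "ViewerCertificate": {
            "ACMCertificateArn": acm_arn,
            "SSLSupportMethod": "sni-only",
            "MinimumProtocolVersion": "TLSv1.2_2021",
        },
    }
```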

```python
# deploy.py
def main():
    sites = load_sites_with_status_created()

    for site in sites:
        domain = site["domain"]
        acm_arn = site["acm_arn"]
        s3_bucket_name = site["s3_bucket_name"]
        s3_destination_path = site["s3_destination_path"]

        # For each site, check if ACM certificate is verified
        if is_acm_certificate_verified(domain, acm_arn):
            print(f"Certificate for {domain} is verified.")
        else:
            print(f"Certificate for {domain} is not verified.")
            continue

        # Create CloudFront distribution for the new S3 bucket directory
        distribution_arn = create_cloudfront_distribution_for_s3_directory(
            s3_bucket_name, s3_destination_path, acm_arn, domain
        )
        if not distribution_arn:
            print("Failed to create CloudFront distribution")
            continue

        # Update the S3 bucket policy to allow the new CloudFront distribution
        update_s3_policy_for_cloudfront(s3_bucket_name, distribution_arn)

        # Get our distribution's domain name
        distribution_domain_name = get_distribution_domain_name(distribution_arn)

        # Append CloudFront DNS records to Namecheap
        append_cloudfront_dns_records_to_namecheap(domain, distribution_domain_name)

        # Update the site's status in the DB
        update_site_status(domain, "DEPLOYED")

        # Update the site's CloudFront ARN in the DB
        update_cloudfront_arn(domain, distribution_arn)
```
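The bucket-policy amendment follows the standard Origin Access Control pattern: allow the CloudFront service principal to read objects, scoped to this one distribution. A sketch of building and applying that policy (the real helper would merge with any existing statements rather than overwrite them):

```python
import json

def build_bucket_policy(bucket_name, distribution_arn):
    """Bucket policy granting the CloudFront service principal read
    access, conditioned on the specific distribution's ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontServicePrincipal",
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            "Condition": {
                "StringEquals": {"AWS:SourceArn": distribution_arn}
            },
        }],
    }

def update_s3_policy_for_cloudfront(bucket_name, distribution_arn):
    """Apply the policy. A sketch: overwriting the whole policy only
    works if this distribution's statement is the only one needed."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(build_bucket_policy(bucket_name, distribution_arn)),
    )
```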

Examples

Site 1: Adventuring Europe

Site 2: Wandering Pacific

Conclusion

Automatically creating and hosting sites with a script is not only possible, but also cost-effective and relatively efficient. The site generation process can be expanded to create more robust and distinctive sites, but a base level of each is already there. Efficiency will also improve as the models themselves become more time-efficient, as they are currently the limiting factor in run time.

I am still undecided on whether or not to make this repo public. For now it will remain private, but please feel free to email me at [email protected] if you would like access!