Automatically Generating and Hosting Websites with AI and AWS
June 4, 2024
Artificial Intelligence
Introduction
Something that has been fascinating to me recently is the dead internet theory, which suggests that soon the majority of the internet will be run by bots—or even worse, that it already is. I want to explore this theory today by testing the feasibility of generating and hosting one or many websites automatically, with as little human input as possible. This script will consist of two major parts: using AI to generate many sites, and then hosting these generated sites. While these sites will initially be 'blogs,' they could easily be transformed into any other type of content-driven site as well.
Generating a Site
There are two major avenues of success that I can envision when thinking about automatically generating websites. The first approach would be generating static sites. Using static files offers a few key benefits. First, and most importantly, it significantly simplifies the hosting process compared to the second approach. It also simplifies the script itself and still allows for a good amount of site-by-site customization if desired. Lastly, it presents much lower costs, maintenance, and security risks than the API-driven approach.
The other approach would be the API-driven approach. Websites would need to be hosted with a backend component that can fetch data dynamically. This approach would be ideal if the goal was to generate sites that require up-to-date information, or if there was a need to easily edit and/or add content at any time. Hosting the sites and the API would be more complex than hosting static sites, but it would be well worth it if the scalability and malleability were prioritized.
Without any specific need for that scalability and malleability for this project, I decided on the first approach.
Static Site Generation
There are a few goals I had in mind that dictated the approach I used to generate these static sites. The first goal was to ensure a level of uniqueness between them. Although the theme itself will be the same (though adding a variety of themes wouldn't necessarily be difficult), each generated site should have a different color scheme and feel. The next goal, a more obvious one, was that each site should have its own content. To make this goal a bit more challenging, I wanted to generate content using as few tokens as possible, which translates to lower costs.
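To make the color-scheme goal concrete, here is a minimal sketch of how a per-site palette could be derived deterministically from the domain name. The function name and palette keys are my own illustration, not part of the original script:

```python
import colorsys
import random

def generate_palette(seed: str) -> dict:
    # Hypothetical helper: derive a deterministic, site-specific color
    # scheme from the domain so every generated site looks distinct.
    rng = random.Random(seed)
    base_hue = rng.random()  # base hue in [0, 1)

    def to_hex(h: float, s: float, v: float) -> str:
        r, g, b = colorsys.hsv_to_rgb(h % 1.0, s, v)
        return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))

    return {
        "primary": to_hex(base_hue, 0.65, 0.85),
        "accent": to_hex(base_hue + 0.5, 0.55, 0.90),  # complementary hue
        "background": to_hex(base_hue, 0.08, 0.98),
    }
```

Seeding the RNG with the domain means re-running the script for the same site reproduces the same look, while every new domain gets its own feel for free.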
When you think of the simplest possible static site, you likely imagine a basic index.html file with some CSS. Any extended functionality beyond this comes in the form of added HTML, CSS, and JS files. This approach would certainly be simple and effective, but the goals outlined above become increasingly tricky with such a basic solution. For example, site-by-site coloring would be straightforward using CSS variables, but content generation would require parsing, converting, or generating text into proper HTML. Furthermore, any additional content would involve editing already-created files (such as an 'All Blog Posts' page), which could make this approach quite complex in the end.
The better approach I found was to use a Next.js project as a framework from which all static sites would be exported, given a set folder of content. This content folder includes all blog posts, CTAs, colors, and metadata intended for use. The content folders can also be git-tracked and appended to at a later date, and the site re-exported to easily add content. This is a good baseline template that I built my framework out of. I then made use of the wonderful Next.js static export feature once the content had been generated.
Generating Content
I wanted all of the content on these sites to have an authentic feel. To achieve this, all 'blog posts' will have an author, post date, sections, and even off-site links. Everything from the paragraphs to the authors is generated using the OpenAI API, and more specifically their GPT-3.5 Turbo model, as it is the most cost-effective option.
The prompting itself is nothing special, though I did have to wrestle with the model when trying to get it to adhere to the JSON format that I needed. Below is the prompt I used to generate an about page for the site.
```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You are a content author. You will generate an about page for a website. The about page should include a two-paragraph overview of the site given some site information. You will output this information as a JSON object with only one key being 'overview'.",
        },
        {
            "role": "user",
            "content": f"Generate an about page for a site with the title {title}, the theme {theme}, and the description {description}",
        },
    ],
)
```
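One thing worth noting: even with response_format set to json_object, the API returns the JSON as a string, so it still has to be parsed and sanity-checked. A small hypothetical helper:

```python
import json

def parse_overview(raw_content: str) -> str:
    # The model returns a JSON *string*; it still has to be parsed, and
    # validating the schema here lets a retry be triggered if the model
    # drifted from the requested format.
    data = json.loads(raw_content)
    if "overview" not in data:
        raise ValueError(f"expected an 'overview' key, got keys: {list(data)}")
    return data["overview"]

# Usage, given the `response` object from the call above:
# overview = parse_overview(response.choices[0].message.content)
```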
Hosting Dozens of Sites at a Time
There are numerous ways to host static sites. However, the number of options decreases significantly when trying to host sites solely through scripting. Even so, there are too many approaches to compare and contrast here, so I will only go over my approach and why I chose it.
I opted to use AWS for hosting these static sites, and more specifically, an S3 bucket combined with CloudFront distributions. The process is straightforward: upload the static site folder to a designated S3 bucket, and then create a CloudFront distribution that points to that folder, making the sites accessible. This setup not only aligns with the static site direction I chose previously, but also leverages AWS's inherent benefits—scalability, reliability, and availability.
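As a rough sketch of the upload step (the function name mirrors the add_site_to_bucket call used later in the script; the per-domain prefix and content-type handling are my assumptions):

```python
import mimetypes
from pathlib import Path

def content_type_for(filename: str) -> str:
    # Without an explicit Content-Type, S3 defaults to binary/octet-stream
    # and browsers download pages instead of rendering them.
    return mimetypes.guess_type(filename)[0] or "binary/octet-stream"

def add_site_to_bucket(domain: str, local_dir: str, bucket: str) -> None:
    # Sketch of the upload step: every exported file goes under a
    # per-domain prefix so one bucket can hold all of the sites.
    import boto3  # imported lazily; requires AWS credentials to run
    s3 = boto3.client("s3")
    for path in Path(local_dir).rglob("*"):
        if path.is_file():
            key = f"{domain}/{path.relative_to(local_dir)}"
            s3.upload_file(str(path), bucket, key,
                           ExtraArgs={"ContentType": content_type_for(path.name)})
```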
Domains and Certificates
While the approach described above meets the goal of hosting the sites, it has a few shortcomings. The first is that the domain will be a randomly generated CloudFront domain, which is slightly unsightly. The second is that a custom domain without a certificate would be limited to HTTP instead of HTTPS, which can be improved.
The SSL certificate issue can easily be solved by using AWS Certificate Manager to create certificates for the custom domain. Despite the inherent simplicity, it does divide our script into two parts. The division arises from the need to wait for the certificate to be issued before it can be used in the CloudFront distribution's initial configuration.
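A sketch of the certificate half, assuming Boto3. The registrar_host_for helper is hypothetical, and the exact host format it produces depends on the registrar's API:

```python
import time

def create_acm_certificate(domain: str) -> str:
    # CloudFront only accepts certificates from the us-east-1 region,
    # no matter where the rest of the stack lives.
    import boto3  # requires AWS credentials to run
    acm = boto3.client("acm", region_name="us-east-1")
    resp = acm.request_certificate(DomainName=domain, ValidationMethod="DNS")
    return resp["CertificateArn"]

def get_validation_record(certificate_arn: str) -> dict:
    # The CNAME record ACM wants published can take a few seconds to
    # appear on the certificate after the initial request.
    import boto3
    acm = boto3.client("acm", region_name="us-east-1")
    while True:
        cert = acm.describe_certificate(CertificateArn=certificate_arn)["Certificate"]
        options = cert["DomainValidationOptions"]
        if options and "ResourceRecord" in options[0]:
            return options[0]["ResourceRecord"]  # dict with Name / Type / Value
        time.sleep(5)

def registrar_host_for(record_name: str, domain: str) -> str:
    # Hypothetical helper: ACM returns a fully-qualified record name like
    # '_abc123.example.com.', while registrar APIs typically want only
    # the host portion ('_abc123').
    return record_name.rstrip(".").removesuffix("." + domain)
```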
For the custom domain, we need a registrar with an API that allows us to set DNS records. We will use this endpoint twice: the first time will be for ACM validation, and later to point the DNS to our CloudFront distribution.
Putting it all Together
As mentioned earlier, the goal was to fully automate the whole process of generating and hosting many sites. The natural choice for this script was Python, as it has great OpenAI API integration and Boto3 for AWS. The script takes a few inputs for each site: domain, site title, scale, site description, and theme. The domain and site title are self-explanatory, while the site description is used to help generate content and find images. The scale is an integer that determines how many posts are generated, and the theme is used to link similar sites to each other.
As previously noted, the script was better off divided into two parts once certification was added as a feature. This division also provided a great reason to introduce a database to track the sites. Not only is the database extremely useful for tracking the state of each site between the two halves of the script, it also makes it possible to link similar sites to each other and could even back an admin dashboard in the future.
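The database itself is unspecified in this post, so here is a minimal SQLite sketch whose columns mirror the arguments passed to create_site_entry in the script; the schema details are my assumptions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    domain              TEXT PRIMARY KEY,
    title               TEXT,
    scale               INTEGER,
    theme               TEXT,
    description         TEXT,
    status              TEXT CHECK (status IN ('CREATED', 'DEPLOYED')),
    s3_bucket_name      TEXT,
    s3_destination_path TEXT,
    s3_bucket_arn       TEXT,
    cloudfront_arn      TEXT,
    acm_arn             TEXT
)
"""

def create_site_entry(conn, *values):
    # Insert one row per site; `values` must match the 11 columns above.
    conn.execute("INSERT INTO sites VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", values)

def load_sites_with_status_created(conn):
    # The second script polls this to find sites awaiting deployment.
    conn.row_factory = sqlite3.Row
    return conn.execute("SELECT * FROM sites WHERE status = 'CREATED'").fetchall()
```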
Script 1: Creation
The first of the two scripts covers everything up until DNS validation. The initial step involves creating the content folder for the site and all of its content; this is where all of the AI 'magic' happens. Next, the site is built and exported as a static site, and the files are uploaded to the S3 bucket. Subsequently, we create an ACM certificate for our domain, add the DNS validation records at our registrar, and await issuance. The final step of this first script is to add our site information to the database with a CREATED status.
```python
# create.py
import argparse

# create_site, create_acm_certificate, add_site_to_bucket, and
# create_site_entry are helpers defined elsewhere in the project;
# local_site_directory and bucket_name are module-level config values.

def main():
    ### ARGUMENTS
    # Create the parser
    parser = argparse.ArgumentParser(
        description="Deploy a site with specific domain and affiliate link."
    )

    # Add arguments
    parser.add_argument("domain", type=str, help="The domain name for the site")
    parser.add_argument("title", type=str, help="The title of the site")
    parser.add_argument("scale", type=int, help="The scale of the site")
    parser.add_argument("theme", type=str, help="The theme to be used")
    parser.add_argument("description", type=str, help="The description of the site")

    args = parser.parse_args()
    domain = args.domain
    title = args.title
    scale = args.scale
    theme = args.theme
    description = args.description
    print(
        f"Domain: {domain}, Title: {title}, Scale: {scale}, Theme: {theme}, Description: {description}"
    )

    ### CREATION
    # Create a content folder and export the static site
    create_site(domain, title, scale, theme, description)

    ### HOSTING
    # Create ACM certificate
    certificate_arn = create_acm_certificate(domain)
    if not certificate_arn:
        print("Failed to create ACM certificate")
        return

    # Upload site to S3
    bucket_arn = add_site_to_bucket(domain, local_site_directory, bucket_name)
    if not bucket_arn:
        print("Failed to upload site to S3 bucket")
        return

    ### DATABASE
    # Add site to database
    create_site_entry(
        domain,
        title,
        scale,
        theme,
        description,
        "CREATED",
        bucket_name,
        domain,  # Using the domain as the S3 destination path
        bucket_arn,
        "NULL",  # CloudFront ARN placeholder until deployment
        certificate_arn,
    )
```
Script 2: Hosting
The second script first checks to ensure that the certificate has been issued for the domain. Once confirmed, it proceeds to create the CloudFront distribution, incorporating the ACM certificate into the distribution's configuration. After creating the distribution, the S3 bucket policy must be amended to allow access to the static site files. Finally, the domain's DNS records can be updated to point at the distribution, and the site's database entry can have its status upgraded to DEPLOYED.
This second script can also be made into a background task that checks for sites with a status of CREATED and a valid ACM certificate, and finishes the deployment for any it finds.
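The certificate check can be split into a pure predicate plus a thin Boto3 wrapper, which keeps the status logic testable. This is a sketch, not the script's exact code:

```python
def certificate_ready(certificate: dict) -> bool:
    # Pure check on the describe_certificate payload: ACM reports
    # 'PENDING_VALIDATION' until the validation CNAME propagates,
    # then flips the status to 'ISSUED'.
    return certificate.get("Status") == "ISSUED"

def is_acm_certificate_verified(domain: str, acm_arn: str) -> bool:
    import boto3  # requires AWS credentials to run
    acm = boto3.client("acm", region_name="us-east-1")
    cert = acm.describe_certificate(CertificateArn=acm_arn)["Certificate"]
    return certificate_ready(cert)
```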
```python
# deploy.py
def main():
    sites = load_sites_with_status_created()

    for site in sites:
        domain = site["domain"]
        acm_arn = site["acm_arn"]
        s3_bucket_name = site["s3_bucket_name"]
        s3_destination_path = site["s3_destination_path"]

        # For each site, check if the ACM certificate is verified
        if is_acm_certificate_verified(domain, acm_arn):
            print(f"Certificate for {domain} is verified.")
        else:
            print(f"Certificate for {domain} is not verified.")
            continue

        # Create a CloudFront distribution for the new S3 bucket directory
        distribution_arn = create_cloudfront_distribution_for_s3_directory(
            s3_bucket_name, s3_destination_path, acm_arn, domain
        )
        if not distribution_arn:
            print("Failed to create CloudFront distribution")
            continue

        # Update the S3 bucket policy to allow the new CloudFront distribution
        update_s3_policy_for_cloudfront(s3_bucket_name, distribution_arn)

        # Get our distribution's domain name
        distribution_domain_name = get_distribution_domain_name(distribution_arn)

        # Append CloudFront DNS records to Namecheap
        append_cloudfront_dns_records_to_namecheap(domain, distribution_domain_name)

        # Update the site's status in the DB
        update_site_status(domain, "DEPLOYED")

        # Update the site's CloudFront ARN in the DB
        update_cloudfront_arn(domain, distribution_arn)
```
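For the bucket-policy step, the usual pattern is an Origin Access Control policy that scopes s3:GetObject to the one distribution. Here is a sketch of the policy document that a function like update_s3_policy_for_cloudfront might apply; the ARNs are illustrative:

```python
def cloudfront_read_policy(bucket_name: str, distribution_arn: str) -> dict:
    # Builds a bucket policy for the Origin Access Control pattern:
    # only requests signed by this specific CloudFront distribution
    # may read the site's objects. A sketch, not the script's exact code.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipal",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
            }
        ],
    }
```

Because the policy's condition keys off the distribution ARN, each new site's distribution can be appended as another statement without opening the bucket to the public.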
Examples
Site 1: Adventuring Europe
- Domain: adventuringeurope.online
- Title: Adventuring Europe
- Scale: 5 Posts
- Theme: Travel
- Description: Showcase of French Cuisine
- Result: https://adventuringeurope.online/
- Total Run Time: 58 seconds (2020 M1 Macbook Pro)
Site 2: Wandering Pacific
- Domain: wanderingpacific.online
- Title: Wandering the Pacific
- Scale: 5 Posts
- Theme: Travel
- Description: Showcase of Pacific Tourism
- Result: https://wanderingpacific.online/
- Total Run Time: 62 seconds (2020 M1 Macbook Pro)
Conclusion
Automatically creating and hosting sites with a script is not only possible, but also cost-effective and relatively efficient. The site generation process can be expanded to create more robust and distinctive sites, but a base level of each is already there. Efficiency will also improve as the models themselves become more time-efficient, as they are currently the limiting factor in run time.
I am still undecided on whether to make this repo public. For now it will remain private, but please feel free to email me at [email protected] if you would like access!