Adding a Dynamic Blogroll to a Static Site With AWS Lambda

Recently I created a small Clojure client library for the Neocities API called neocities-clj. I originally created it just to make uploading to Neocities, where I'm hosting TWSiO, easier. However, while thinking about other uses for the library, I realized that being able to automate changes to a site on Neocities, which is a static site host, could give Neocities sites some “pseudo-dynamic” features. To test this idea out, I decided to add a blogroll to TWSiO that's updated by a script running in AWS Lambda. You can find that script at https://github.com/TWSiO/TWSiO-Blogroll.

What's a Blogroll?

A blogroll is a place on a blog or personal site where you can link to other sites you follow. It's a good way to help other people find like-minded writers, as well as to help those writers be found.

Inspired by blogrolls I’ve seen in places like Blogger that feature the latest posts from the followed blogs, I decided to show a short list of the three latest posts from the sites I follow on my home page, as well as the latest post from each site on a page where I describe each site I follow. The Lambda function and script I created update those posts on TWSiO daily.

How the Blogroll Functionality was Created

To start with, I used my scripting language of choice, Clojure, to write a script that Lambda can run.

(Note: I know the script could be cleaner and simpler, but this was just a script for a quick proof of concept).

Parsing the Followed Site's Feeds

The script first fetches the RSS or Atom feed of each of the sites in the blogroll. The Atom parsing code is below (the RSS code is mostly the same):

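(defn parse-atom-entry
  "Extracts the title, link, and updated date from one Atom <entry> node."
  [entry]
  (let [;; The xpath library returns an entry's child nodes as a delay
        subtags @(:children entry)
        ;; First child node with the given tag name, e.g. :title
        subtag-get (fn [tag-name nodes]
                     (first (filter #(= tag-name (:tag %)) nodes)))
        subtag-text #(:text (subtag-get %1 %2))
        dt-format java.time.format.DateTimeFormatter/ISO_OFFSET_DATE_TIME
        ;; Parses e.g. 2023-02-01T09:30:00+02:00; LocalDateTime keeps the
        ;; wall-clock time and discards the offset
        date (java.time.LocalDateTime/parse
               (subtag-text :updated subtags)
               dt-format)]
    {:date date
     :link (:href (:attrs (subtag-get :link subtags)))
     :title (subtag-text :title subtags)}))

(defn parse-atom
  "Parses an Atom XML string into a list of entry maps, newest first."
  [atom-string]
  (let [entries (xpath/$x "/feed/entry" atom-string)
        date-titles (map parse-atom-entry entries)]
    ;; A reversed comparator sorts in descending (newest-first) order
    (sort-by :date #(compare %2 %1) date-titles)))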

parse-atom uses a Clojure XPath library to find all of the entries in the Atom XML string, maps parse-atom-entry over the entries to pull out the relevant data, and then sorts the results newest first. parse-atom-entry extracts the post title, URL, and date (taken from the entry's updated element) from the parsed XML data.
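For comparison, here is a minimal sketch of the same parsing using only clojure.xml from Clojure core instead of an XPath library. It's an alternative rather than the code in the repository: the function names are hypothetical, it assumes entries are direct children of the <feed> root, and it parses dates as OffsetDateTime so that posts from sites in different time zones are ordered by the actual instant rather than by wall-clock time.

```clojure
(ns blogroll-sketch.atom
  (:require [clojure.xml :as xml])
  (:import (java.io ByteArrayInputStream)
           (java.time OffsetDateTime)))

(defn- child
  "Returns the first direct child element of node with the given tag, or nil."
  [node tag]
  (first (filter #(= tag (:tag %)) (:content node))))

(defn- child-text
  "Returns the text inside the first child element with the given tag."
  [node tag]
  (first (:content (child node tag))))

(defn parse-atom-entry-sketch
  "Extracts the title, link, and updated timestamp from an <entry> element."
  [entry]
  {:date  (OffsetDateTime/parse (child-text entry :updated))
   :link  (get-in (child entry :link) [:attrs :href])
   :title (child-text entry :title)})

(defn parse-atom-sketch
  "Parses an Atom XML string into entry maps, newest first."
  [atom-string]
  (let [root    (xml/parse (ByteArrayInputStream. (.getBytes ^String atom-string "UTF-8")))
        entries (filter #(= :entry (:tag %)) (:content root))]
    ;; OffsetDateTime compares by instant, so mixed offsets sort correctly
    (sort-by :date #(compare %2 %1) (map parse-atom-entry-sketch entries))))
```

With LocalDateTime, an entry updated at 10:00+05:00 would sort after one updated at 08:00Z even though it happened three hours earlier; comparing OffsetDateTime values avoids that.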

Creating the Updated Pages

The script then fetches the site’s current home page and “Sites I Follow” page over HTTP.

Here’s the code that modifies the home page (the code to modify the “Sites I Follow” page is mostly the same).

(defn latest-feeds-html-reducer
  "Appends one <li> for a [site-id post] pair to the HTML accumulated so far."
  [aggr [site-id post]]
  (let [date (:date post)
        iso-format java.time.format.DateTimeFormatter/ISO_LOCAL_DATE
        ;; site-names maps each site-id keyword to the site's :name and :url
        site-info (site-id site-names)
        li (str
             "<li>"
             "<h3><a href=\"" (:url site-info) "\">" (:name site-info) "</a></h3>"
             "<time datetime=\"" (.toString date) "\">" (.format date iso-format) "</time>"
             "<h4><a href=\"" (:link post) "\">" (:title post) "</a></h4>"
             "</li>")]
    (str aggr li)))

(defn latest-feeds-html
  "Concatenates the <li> items for all posts into one HTML string."
  [posts]
  (reduce latest-feeds-html-reducer "" posts))

(defn home-feeds
  "Returns the home page HTML with #blogroll-feed filled with the newest three posts."
  [page-html feeds]
  (let [home-template (html/html-resource (java.io.StringReader. page-html))
        newest-three (take 3 (combined-feeds feeds))
        feeds-html (latest-feeds-html newest-three)
        ;; Replace the contents of the element with id="blogroll-feed"
        transformed (html/at home-template
                             [:#blogroll-feed] (html/html-content feeds-html))]
    (apply str (html/emit* transformed))))

It looks like there’s a lot going on here, but it’s mostly just getting data into the right structure (which, again, could probably be simplified a bit). home-feeds takes the HTML string of the home page and the parsed feed data, takes the newest three posts, turns them into an HTML fragment, and then uses the Enlive library to replace the contents of the element with the blogroll-feed ID on my home page. latest-feeds-html builds that fragment by reducing the list of posts with latest-feeds-html-reducer, which turns each post into an HTML list item and appends it to the string built so far.
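One thing to watch for when building HTML by string concatenation is that feed titles go into the page unescaped, so a title containing & or < can produce malformed HTML. Below is a minimal sketch of the list-item step with escaping. It assumes a hypothetical flattened post map (site name and URL already merged in) rather than the script's [site-id post] pairs and site-names lookup, and the function names are illustrative.

```clojure
(ns blogroll-sketch.html
  (:require [clojure.string :as str])
  (:import (java.time LocalDateTime)
           (java.time.format DateTimeFormatter)))

(defn escape-html
  "Escapes characters that are special in HTML text and attribute values."
  [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&#39;"}))

(defn post->li
  "Renders one post as an <li>; every feed-supplied string is escaped."
  [{:keys [site-name site-url title link date]}]
  (str "<li>"
       "<h3><a href=\"" (escape-html site-url) "\">" (escape-html site-name) "</a></h3>"
       ;; date is a LocalDateTime: full value in the attribute, date only as text
       "<time datetime=\"" date "\">" (.format date DateTimeFormatter/ISO_LOCAL_DATE) "</time>"
       "<h4><a href=\"" (escape-html link) "\">" (escape-html title) "</a></h4>"
       "</li>"))

(defn posts->html
  "Concatenates the <li> items for all posts, in order."
  [posts]
  (apply str (map post->li posts)))
```

Mapping and then concatenating with apply str does the same job as the reduce over a string accumulator, without the intermediate strings.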

Updating the Host

Once it has the new versions of the home and “Sites I Follow” pages, it uploads them with functions like upload-home:

(defn upload-home
  "Uploads the new home page HTML to Neocities as /index.html."
  [html api-key]
  ;; Lambda's filesystem is read-only except for /tmp
  (spit "/tmp/home.html" html)
  ;; Map of remote path on Neocities to the local file to upload
  (neo/upload
    {"/index.html" "/tmp/home.html"}
    :api-key api-key))

upload-home writes the HTML to a temporary file1 and uploads it to Neocities using neocities-clj. To authenticate with Neocities for the upload, the function also needs the API key.
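Lambda can reuse the same execution environment, including its /tmp directory, across invocations, so a fixed path like /tmp/home.html works but leaves the page behind. A minimal sketch of an alternative, using a hypothetical with-temp-html helper that writes to a fresh file in java.io.tmpdir (which defaults to /tmp on Linux) and always deletes it afterward:

```clojure
(ns blogroll-sketch.tmp
  (:import (java.io File)))

(defn with-temp-html
  "Hypothetical helper: writes html to a new temporary .html file, calls
  (f absolute-path), and deletes the file afterward. Returns f's result."
  [html f]
  (let [file (File/createTempFile "page-" ".html")]
    (try
      (spit file html)
      (f (.getAbsolutePath file))
      (finally
        ;; Runs even if f throws, so no file is left behind
        (.delete file)))))
```

Assuming neo/upload takes the same arguments as in upload-home, a call would look like (with-temp-html html #(neo/upload {"/index.html" %} :api-key api-key)).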

Getting the API Key

Since this runs in Lambda, I stored the API key in AWS Secrets Manager, which is a more secure place for API keys, passwords, and similar secrets than hard-coding them into the script itself.

(def secret-id "neocities")

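(defn get-api-key
  "Fetches the Neocities API key from Secrets Manager through the AWS
  Parameters and Secrets Lambda Extension, which listens on localhost."
  []
  (let [;; The extension authenticates requests with the function's session token
        aws-token (System/getenv "AWS_SESSION_TOKEN")
        headers {"X-Aws-Parameters-Secrets-Token" aws-token}
        secrets-path (str "/secretsmanager/get?secretId=" secret-id)
        ;; Use the configured port if set, otherwise the extension's default, 2773
        port (or (System/getenv "PARAMETERS_SECRETS_EXTENSION_HTTP_PORT") 2773)
        secrets-endpoint (str "http://localhost:" port secrets-path)
        body (:body (http/get secrets-endpoint {:headers headers}))
        ;; true keywordizes the JSON keys, so the secret is under :SecretString
        parsed (json/parse-string body true)]
    (:SecretString parsed)))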

To get the API key from Secrets Manager, the function makes an HTTP request to a localhost endpoint served by the AWS Parameters and Secrets Lambda Extension (added to the function as a layer), which responds with JSON containing the secret. Most of get-api-key is just constructing that URL and the authentication header, then extracting the secret from the response.
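Separating building the request from sending it makes the URL logic easy to check outside Lambda. A minimal sketch, with a hypothetical secrets-request function that takes the environment as a map and also URL-encodes the secret ID, which matters if the ID contains characters such as + that would otherwise be misread in a query string:

```clojure
(ns blogroll-sketch.secrets
  (:import (java.net URLEncoder)))

(defn secrets-request
  "Hypothetical helper: builds the URL and headers for fetching secret-id from
  the AWS Parameters and Secrets Lambda Extension. env is a map of environment
  variables, e.g. (into {} (System/getenv))."
  [env secret-id]
  (let [;; The extension listens on 2773 unless this variable overrides it
        port (get env "PARAMETERS_SECRETS_EXTENSION_HTTP_PORT" "2773")]
    {:url     (str "http://localhost:" port
                   "/secretsmanager/get?secretId="
                   (URLEncoder/encode ^String secret-id "UTF-8"))
     ;; The extension authenticates callers with the function's session token
     :headers {"X-Aws-Parameters-Secrets-Token" (get env "AWS_SESSION_TOKEN")}}))
```

The result can then be sent with the same clj-http call as in get-api-key: (http/get (:url req) {:headers (:headers req)}).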

Adapting the Script for a Java Runtime

Lambda doesn’t have a runtime specifically for Clojure. Although you could use a custom runtime to run Clojure2, the Leiningen build tool can also package Clojure code as a JAR, which we can run on the latest Java runtime. One concession we have to make to run on the Java runtime, though, is that it expects the handler to be a method on a Java class (here, a static method) rather than a Clojure function. That means we have to use Clojure’s Java interop to define a class with :gen-class:

(ns blogroll-script.core
  (:gen-class
    :methods [^:static [handler [Object] String]]
    )
  (:require ...))

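The :methods section declares a static method, handler, that takes a Java Object and returns a String. By default, the generated class forwards calls to handler to a Clojure function named -handler in the same namespace.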
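A minimal sketch of the Clojure side of that declaration, assuming a hypothetical update-blogroll! function that stands in for the real fetch, rebuild, and upload steps. The important detail is that -handler has to return a String to match the declared return type, whatever the last real step returns:

```clojure
(ns blogroll-sketch.core
  ;; :gen-class only takes effect when the namespace is AOT-compiled
  ;; (e.g. when building the JAR); it is ignored when loaded normally.
  (:gen-class
    :methods [^:static [handler [Object] String]]))

(defn- update-blogroll!
  "Hypothetical stand-in for the real work. Like an upload call, it returns
  a map, which must not become the handler's return value."
  []
  {:result "success"})

(defn -handler
  "Backs the generated static method handler. The event argument is unused."
  [_event]
  (update-blogroll!)
  ;; Declared return type is String, so return a string explicitly
  "Finished")
```

With gen-class's default naming, the class for a namespace like blogroll-script.core is blogroll_script.core, so the Lambda handler setting would be blogroll_script.core::handler.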

Creating the Lambda

After that, creating the Lambda itself was easy. I just had to go through the setup wizard to set things like the handler method, add a layer to access the API key secret, and upload the JAR. Once that was done, a manual test run of the Lambda updated the feeds on my site.

However, I still needed an event to trigger the script once a day. I used Amazon EventBridge to create a rule that sends an event to the Lambda function every day on the cron expression 0 0 * * ? * (midnight UTC), and voilà, I have a pseudo-dynamically updating feature on my “static” website.

Troubleshooting

Of course, the actual path to creating the script and Lambda function wasn't as smooth as that. There were some things I had to troubleshoot along the way.

Errors in Java Runtime

One issue with running Clojure code as a JAR on a Java-based runtime is that the error messages were rather cryptic. I could roughly work out how some of them mapped from the generated Java code back to my Clojure code, but the stack traces weren't very helpful since they mostly referenced Clojure internals, and the line numbers didn't match up with the original Clojure code. I mostly made my best guess at how to interpret them, and it all worked out in the end, so I suppose my guesses weren't too far off.
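One option for getting more readable errors is to catch exceptions at the top of the handler and log Clojure's own summary of them before rethrowing; anything printed to standard output or standard error ends up in the function's CloudWatch Logs. A minimal sketch with a hypothetical log-errors wrapper:

```clojure
(ns blogroll-sketch.errors)

(defn log-errors
  "Hypothetical wrapper: calls f with args; on failure, prints the exception
  message and the :cause/:via summary from Throwable->map to stderr, then rethrows."
  [f]
  (fn [& args]
    (try
      (apply f args)
      (catch Throwable t
        (binding [*out* *err*]
          (println "Handler failed:" (ex-message t))
          ;; :via lists each exception in the cause chain with its type and message
          (prn (select-keys (Throwable->map t) [:cause :via])))
        (throw t)))))
```

Rethrowing keeps the invocation marked as failed in Lambda while still leaving a Clojure-level description of the error in the logs.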

Probable Enlive Bug

There was an issue with parsing TWSiO's HTML with Enlive where it would convert

<a ...><h3>...</h3></a>

into

<a ...></a><h3>...</h3>

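even when I just parsed and re-serialized it, where you would expect the output to match the input. I think it's probably a bug in Enlive, but I need to investigate a bit more and make sure I have a good reproducible case before submitting it as an actual bug. One possible cause is that Enlive parses HTML with TagSoup by default, which follows the HTML 4 rule that an inline element like <a> can't contain a block-level element like <h3>, whereas HTML5 allows it.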

Mysterious Type Error

Near the end, I could tell pretty much everything was working; however, the Lambda function was returning an error saying it couldn't convert a hash map to a string somewhere. Because of the cryptic error messages I couldn't tell exactly where the problem was, and when I traced it down to the final step of the script I couldn't find anything wrong. It wasn't until I was about to add another gen-class method for some testing that I noticed how I had declared the handler method in the existing :gen-class:

(:gen-class
    :methods [^:static [handler [Object] String]]
    )

This declares that the handler static method returns a String, but -handler was returning the result of upload-blogroll, which in turn is the result from neocities-clj: a hash map. I fixed it by making "Finished" the last expression in -handler, so that's what gets returned. This goes to show how interoperating between a dynamically typed language and a statically typed language can be a bit awkward sometimes.

Overall I'm pretty happy with how it went. It didn't take too long, and it was a successful proof of concept for adding "pseudo-dynamic" features to TWSiO. Now that I've tried it out with a simple feature, maybe I'll try to think up some more "pseudo-dynamic" features to add to the site.

Footnotes

  1. upload-home has to write the page to a temporary file to include it in the POST request. I couldn’t find a good way to include the page as a string with the Clojure HTTP client library I used.
  2. I saw a few blog posts about running Clojure with a custom GraalVM runtime, which would probably be a hair more efficient and less expensive due to its lower startup overhead. However, this Lambda function only runs once a day, for less than 20 seconds, so I figured it wasn't worth the extra hassle. In other scenarios I could certainly see it being worth it, though.