I'm Learning About: Cloudfront

April 2019

When I first put up this website, alexkudlick.com, I was just hand-writing html and hosting it on s3. I like s3 hosting for static websites because it's basically the simplest possible solution, with minimal fuss. My deploy process was just uploading files to s3, and there were no servers to manage or databases to administer.

I've gradually been making this website more and more sophisticated. I still host it on s3, but I don't hand-write the html any more. Now I use next.js to build the html from React components. I love next.js because it lets me compose a vocabulary of components to describe the content, rather than hand-writing the markup. For instance, the top of the file that creates this blog entry looks like:

export default () => (
  <BlogEntry
    title="I'm Learning About: Cloudfront"
    description="Things I'm learning from putting cloudfront in front of this website."
    publishDate="04-25-2019"
  >
  <p>
    When I first put up this website, <a href="alexkudlick.com">alexkudlick.com</a>, I was
...

When I want to write a blog entry, all I do is use a <BlogEntry> component, which automatically includes things like a <Layout> and a <Title>, and handles publishing to my rss feed. It's the compositional nature of React that I find so useful; I can create richer and richer components by combining ones I've already built.
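To give a sense of the shape, here's a simplified sketch of what a component like that might look like - this isn't my actual implementation, just the idea:

// A simplified sketch, not the real component. <Layout> and <Title>
// are the existing components mentioned above; <PublishDate> and the
// prop names are just illustrative.
const BlogEntry = ({ title, description, publishDate, children }) => (
  <Layout description={description}>
    <Title>{title}</Title>
    <PublishDate date={publishDate} />
    {children}
  </Layout>
);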

There were some interesting learning experiences in making this change. I now have a build step that locally compiles my React javascript down to html and then pushes it up to s3. I wanted to push only the files that had changed, because I like a slim and lean deploy process that only takes seconds. But there was a problem combining next.js with the aws command line tools - next.js overwrites all the html files on every build, even if their content hasn't changed, and aws s3 sync decides which files have "changed" by their timestamps.

So I had the opportunity to write my first npm module, the cleverly named aws-s3-sync-by-hash, which does an s3 sync but determines changes by the hash of each file. If you're hosting content on s3, maybe it will be useful to you too!
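The core idea is simple enough. Here's a minimal sketch of it (not the module's actual API) using the aws sdk - for objects uploaded in a single part, the ETag s3 reports is just the md5 of the contents:

// Minimal sketch of hash-based sync, not the module's real API.
// Caveat: s3 ETags only equal the md5 for non-multipart uploads,
// which is fine for small html files.
const fs = require('fs');
const crypto = require('crypto');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

const md5 = (path) =>
  crypto.createHash('md5').update(fs.readFileSync(path)).digest('hex');

async function uploadIfChanged(bucket, key, path) {
  let remoteETag;
  try {
    const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
    remoteETag = head.ETag.replace(/"/g, ''); // s3 returns the etag quoted
  } catch (err) {
    if (err.code !== 'NotFound') throw err; // a missing object counts as changed
  }

  if (remoteETag === md5(path)) return; // hashes match - skip the upload

  await s3
    .putObject({ Bucket: bucket, Key: key, Body: fs.createReadStream(path) })
    .promise();
}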


So, static hosting on s3 has been going fine for me, but I have something I want to do that won't be feasible with just s3. I want to have private sections of the site that are protected by http authentication so I can share drafts of posts with select people. That's just not something you can do on s3 - there's no web server, so there's no way to put logic in between the user and the content.

I did some research, and it seems like this will be easy to do with cloudfront. A nice side benefit would be enabling https on my website.
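From what I've read, the standard approach is a Lambda@Edge function attached to the distribution's viewer-request event that checks the Authorization header. A rough, untested sketch of what I have in mind (the credentials are obviously placeholders):

'use strict';

// Untested sketch of http basic auth at the edge. Runs on every
// viewer request before cloudfront serves anything.
const EXPECTED = 'Basic ' + Buffer.from('user:password').toString('base64');

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  const auth = headers.authorization && headers.authorization[0].value;

  if (auth !== EXPECTED) {
    // No (or wrong) credentials - tell the browser to prompt for them.
    return callback(null, {
      status: '401',
      statusDescription: 'Unauthorized',
      headers: {
        'www-authenticate': [
          { key: 'WWW-Authenticate', value: 'Basic realm="Drafts"' },
        ],
      },
    });
  }

  // Credentials matched - pass the request through to the origin.
  callback(null, request);
};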

It was pretty easy to set up cloudfront. I just followed this aws guide to set up the cloudfront distribution, then verified it was working by visiting the cloudfront domain directly (d3n0iziqu9ovnv.cloudfront.net). The next step was to point my domain, alexkudlick.com, at the distribution.

Well, whaddya know, aws has another handy guide for connecting route53 to a cloudfront distribution. The only caveat is that you have to wait for your cloudfront distribution to reach the enabled state, which can take fifteen to twenty minutes.
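For reference, the record the guide has you create is an alias A record pointing at the distribution. With the aws cli it looks roughly like this (my hosted zone id is a placeholder; Z2FDTNDATAQYW2 is the fixed hosted zone id aws uses for every cloudfront distribution):

aws route53 change-resource-record-sets \
  --hosted-zone-id MY_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "alexkudlick.com.",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d3n0iziqu9ovnv.cloudfront.net.",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'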

I set up my route53 record, visited http://alexkudlick.com (to check the https redirect), saw my home page, and called it a day. Not bad for a few hours of work.

Later that week, I was talking to my brother about my website and went to show him my last blog post. I clicked the Blog link at the top of my website, and was greeted with disaster.

All my pages besides the home page were broken. The problem was that I was relying on s3's default object behavior for subdirectories - you can configure s3 to serve a default object if the client requests a url that corresponds to a directory. In my case I configured it to serve index.html, so if you visit https://alexkudlick.com/blog/, you're served the file index.html from the blog directory in my bucket. Cloudfront doesn't do that when its origin is the bucket itself - it only supports a single default root object for the top-level url - so requests for subdirectory urls like /blog/ fell over.

I tried to follow this advice and changed the cloudfront origin to the s3 static website url rather than the s3 bucket, but that made things even worse because of s3 permissions. I had changed my bucket permissions to allow reads only from cloudfront in order to prepare for having private sections - it wouldn't really make sense to have a private section that cloudfront guards with a username and password if people could just go directly to the s3 url. But the s3 website endpoint only serves publicly readable objects, so with the origin set to the s3 website url, browsers would fail to fetch resources like css.

So I have spent the past hour trying to fix this problem. I really like having urls that end in /, not /index.html, but I couldn't get things to work even after making the bucket public again. To top it all off, cloudfront was serving stale cached versions of my pages after I deployed new changes (why, cloudfront, why?!), so I had to go through and invalidate its cache. In my mind cloudfront should be using the etag header from s3 (s3 does serve one up, right?) to decide when its cached copies are out of date.
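At least the invalidation itself is a one-liner with the aws cli (the distribution id is a placeholder, and invalidating /* is the blunt-instrument version):

aws cloudfront create-invalidation \
  --distribution-id MY_DISTRIBUTION_ID \
  --paths "/*"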

I had hoped to end this entry on a positive note. This morning I wrote it up, pushed it triumphantly, and then saw the broken css on /blog. I have to clean the house before I go pick up my daughter, so I'm throwing in the towel and changing all the urls to end with /index.html. I know, it's ugly. Perhaps I'll come back to this problem and try to learn more about cloudfront, especially its caching behavior.
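If I do come back to it, the most promising fix I've seen is another Lambda@Edge function, this one on the origin-request event, that rewrites directory urls to their index.html objects before the request reaches s3. An untested sketch:

'use strict';

// Untested sketch: rewrite "directory" urls to the underlying
// index.html object so the s3 origin can find them.
// e.g. /blog/ -> /blog/index.html
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;

  if (request.uri.endsWith('/')) {
    request.uri += 'index.html';
  }

  callback(null, request);
};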

Next I'll be working on a post about building modular java applications with gradle. That one will take a while to write, so expect it in a few weeks.