Nginx proxy Amazon S3 resources

Your approach to proxy S3 files via Nginx makes a lot of sense. It solves number of problems and comes with extra benefits such masking URLs, proxy cache, speed up transferring by offload SSL/TLS. You do it almost right, let me show what is left to make it perfect.

For sample queries I use the S3 bucket and an image URL mentioned in the public comment to the original question.

We start with inspecting of Amazon S3 files’ headers

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Date: Sun, 25 Jun 2017 17:49:10 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 378843
Server: AmazonS3

We can see missing Cache-Control but Conditional GET headers have already been configured. When we reuse E-Tag/Last-Modified (that’s how a browser’s client side cache works), we get HTTP 304 alongside with empty Content-Length. An interpretation of that is client (curl in our case) queries the resource saying that no data transfer required unless file has been modified on the server:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg 
--header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 17:53:33 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg 
--header "If-Modified-Since: Wed, 21 Jun 2017 07:42:31 GMT"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 18:17:34 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3

“PageSpeed suggested to leverage browser caching” that means
Cache=control is missing. Nginx as proxy for S3 files solves
not only problem with missing headers but also saves traffic
using Nginx proxy cache.

I use macOS but Nginx configuration works on Linux exactly the same way without modifications. Step by step:

1.Install Nginx

brew update && brew install nginx

2.Setup Nginx to proxy S3 bucket, see configuration below

3.Request the file via Nginx. Please take a look at the Server header, we see Nginx rather than Amazon S3 now:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:30:26 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Cache-Control: max-age=31536000

Request the file via Nginx

4.Request the file using Nginx proxy with Conditional GET:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg 
--header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:32:16 GMT
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000

Request the file using Nginx proxy with Conditional GET

5.Request the file using Nginx proxy cache, please take a look at X-Cache-Status header, its value is MISS until cache warmed up after first request

curl -I http://localhost:8080/s3_cached/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg
HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:40:45 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes

Request the file using Nginx proxy cache

Based on Nginx official documentation I provide the Nginx S3 configuration with optimised caching settings that supports the following options:

  • proxy_cache_revalidate instructs NGINX to use conditional GET
    requests when refreshing content from the origin servers
  • the updating parameter to the proxy_cache_use_stale directive instructs NGINX to deliver stale content when clients request an item
    while an update to it is being downloaded from the origin server,
    instead of forwarding repeated requests to the server
  • with proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those
    requests is allowed through to the origin server

Nginx configuration:

worker_processes  1;
daemon off;

error_log  /dev/stdout info;
pid        /usr/local/var/nginx/nginx.pid;


events {
  worker_connections  1024;
}


http {
  default_type       text/html;
  access_log         /dev/stdout;
  sendfile           on;
  keepalive_timeout  65;

  proxy_cache_path   /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m
                     inactive=60m use_temp_path=off;

  server {
    listen 8080;

    location /s3/ {
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_intercept_errors on;
      add_header             Cache-Control max-age=31536000;
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

    location /s3_cached/ {
      proxy_cache            s3_cache;
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_cache_revalidate on;
      proxy_intercept_errors on;
      proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
      proxy_cache_lock       on;
      proxy_cache_valid      200 304 60m;
      add_header             Cache-Control max-age=31536000;
      add_header             X-Cache-Status $upstream_cache_status;
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

  }
}

Leave a Comment