HTTP Headers: Controlling Cache and History Mechanism

I’ll answer my own question:

Static public content

Date: <current time>
Expires: <current time + one year>

Rationale: This is compatible with HTTP/1.0 proxies and with RFC 2616 Section 14.21 (Expires): http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
The Last-Modified header is not needed for correct caching (because conforming user agents follow the Expires header) but may be included for the end user’s benefit. Including Last-Modified may also decrease server data transfer when the user hits the Reload/Refresh button, because the user agent can then make a conditional GET. If a Last-Modified header is added, it should reflect the real modification time instead of an invented value. If you want to decrease server data transfer (when the user hits the Reload/Refresh button) and cannot include a real Last-Modified header, you may add an ETag header to allow conditional GET (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26). If you already include Last-Modified, adding ETag as well is just waste. Note that Last-Modified is clearly superior because it’s supported by HTTP/1.0 clients and proxies, too. A suitable value for ETag in the case of dynamic pages is the SHA-1 of the contents of the page/resource. Note that using Last-Modified or ETag will not help with the server load, only with the server’s outgoing bandwidth, because the server still has to handle every request.
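
As a rough illustration of the static public headers above, here is a minimal Python sketch. The function name static_public_headers is hypothetical, and always emitting an ETag is an assumption made only for the example; per the rationale, an ETag is worth sending only when no real Last-Modified value is available.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_public_headers(body: bytes, now: datetime) -> dict:
    """Build Date/Expires for static public content plus a SHA-1 ETag.

    `now` must be a timezone-aware UTC datetime (hypothetical helper).
    """
    return {
        "Date": format_datetime(now, usegmt=True),
        # Expires is one year ahead (taken as 365 days in this sketch).
        "Expires": format_datetime(now + timedelta(days=365), usegmt=True),
        # Quoted SHA-1 hex digest of the body, usable with conditional GET.
        "ETag": '"%s"' % hashlib.sha1(body).hexdigest(),
    }


headers = static_public_headers(
    b"", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
)
for name, value in headers.items():
    print("%s: %s" % (name, value))
```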

Static non-public content

Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=31536000, s-maxage=0
Vary: Cookie

Rationale: The Date and Expires headers are for HTTP/1.0 compatibility. Because HTTP/1.0 has no sensible way to specify that a response is private, these headers tell HTTP/1.0 caches that the response may not be cached. The Cache-Control header says that a private cache may store this response but a shared cache may not. The s-maxage=0 is added because private may not be supported by all proxies that support Cache-Control (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3; I have no idea which proxies are broken). The max-age is set to 60*60*24*365 = 31536000 seconds (1 year) because the HTTP/1.1 specification does not define any upper limit for this parameter, so I guess that the limit is implementation dependent. The Expires header SHOULD be limited to one year in the future, so using the same limit here should be okay. The Vary: Cookie header is required because the session that is used to check whether the visitor is allowed to see the content is transferred in a cookie; because the returned response depends on the cookie value, the cache may not reuse a cached response if the Cookie header has changed.
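
The header set above can be sketched in Python as follows. This is only an illustration: static_private_headers and ONE_YEAR_SECONDS are names invented for the example, and `now` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# One year in seconds, the same value as in the Cache-Control line above.
ONE_YEAR_SECONDS = 60 * 60 * 24 * 365


def static_private_headers(now: datetime) -> dict:
    """Headers for static content that only private caches may store."""
    stamp = format_datetime(now, usegmt=True)
    return {
        "Date": stamp,
        # Same value as Date: HTTP/1.0 caches treat the response as stale.
        "Expires": stamp,
        "Cache-Control": "private, max-age=%d, s-maxage=0" % ONE_YEAR_SECONDS,
        "Vary": "Cookie",
    }


headers = static_private_headers(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
```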

Personally, I might break the last rule. By not including the Vary: Cookie header I can improve caching a lot. For example: I have a profile image at http://example.com/icon/12 which is returned only to selected authenticated users. Visitor X has session id 5f2 and is allowed to see the image. Visitor X logs out and later logs in again, and now has session id 2e8 stored in his session cookie. If I send Vary: Cookie, the user agent of X cannot use the cached image and is forced to download it again. Because the content varies by Cookie, a conditional GET with the last modification time cannot be used. I haven’t tested whether ETag could help in this case; it might, because the server response would be the same and would match the SHA-1 ETag computed from the contents of the response. Be warned that Internet Explorer (at least up to version 9) always forces a conditional GET for resources that include Vary: Cookie even if a suitable response is already in the cache (source: http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx). This is because the internal cache implementation of MSIE does not remember which Cookie it sent the first time, so it cannot know whether the current Cookie is the same one.

However, here’s an example of a problem caused by dropping the Vary: Cookie header, to show why the header is indeed required for technically correct behavior. Take the example above and imagine that after X has logged out, visitor Y logs in with the same user agent (the user agent may have been restarted between X and Y; it does not matter). If Y views a page that includes a link to http://example.com/icon/12, then Y will see the icon embedded inside the page even though Y wouldn’t be able to see it if X had not used the same user agent previously. In my case I don’t consider this a big enough problem, because Y could access the icon manually by inspecting the user agent cache whether or not Vary: Cookie was sent. However, this issue may prevent Y from noticing that he technically does not have access to this content (this may be important e.g. if Y is co-authoring the content). If the content is considered sensitive, the server must send no-store regardless of the problems caused by this Cache-Control directive.
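
The trade-off in the two paragraphs above can be illustrated with a toy model of cache lookup. This is a simplified sketch, not how any particular browser cache is implemented: cache_key is a hypothetical helper, and the cookie name "session" is an assumption for the example.

```python
def cache_key(url: str, request_headers: dict, vary: list) -> tuple:
    """Toy cache key: the URL plus the request headers named in Vary."""
    return (url,) + tuple(request_headers.get(name, "") for name in vary)


url = "http://example.com/icon/12"
first_login = {"Cookie": "session=5f2"}   # visitor X, first session
second_login = {"Cookie": "session=2e8"}  # X again, or visitor Y

# With Vary: Cookie the keys differ, so the cached image is not reused.
hit_with_vary = (
    cache_key(url, first_login, ["Cookie"])
    == cache_key(url, second_login, ["Cookie"])
)

# Without Vary the keys match: X gets a cache hit after logging in again,
# but so does Y on the same user agent.
hit_without_vary = (
    cache_key(url, first_login, [])
    == cache_key(url, second_login, [])
)

print(hit_with_vary, hit_without_vary)
```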

Here too, adding a Last-Modified header helps when users hit the Reload/Refresh button (see discussion above).

Volatile public content

Date: <current time>
Expires: <current time>
Cache-Control: public, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>

Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included so that the content need not be retransmitted when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). Last-Modified is critical for allowing conditional GET with HTTP/1.0 compatible clients.
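
On the server side, the conditional GET mentioned above amounts to comparing the If-Modified-Since request header with the real modification time. A minimal sketch, assuming a hypothetical helper is_not_modified and a timezone-aware last_modified value:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(if_modified_since, last_modified: datetime) -> bool:
    """Return True when a 304 Not Modified response would be appropriate."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: send the full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since


modified = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_not_modified("Mon, 01 Jan 2024 12:00:00 GMT", modified))
```

When the function returns True the server can answer 304 with no body, which saves the data transfer but not the request itself.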

If it is acceptable for the content to be served slightly out of date, then Expires, max-age and s-maxage [sic] should be adjusted accordingly. For example, adding 5 seconds to each might help a lot on a highly popular site, as suggested by symcbean’s answer. Note that unlike conditional GET, increasing the expiry time decreases server load instead of just decreasing the server’s outgoing data traffic (because the server will see fewer requests in total).
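
The adjustment could look roughly like this in Python. The function volatile_public_headers and its freshness_seconds parameter are hypothetical names for the example; both datetimes are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def volatile_public_headers(now, last_modified, freshness_seconds=0):
    """Volatile public headers with an optional short freshness window."""
    expires = now + timedelta(seconds=freshness_seconds)
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(expires, usegmt=True),
        # The same window is applied to private and shared caches.
        "Cache-Control": "public, max-age=%d, s-maxage=%d"
        % (freshness_seconds, freshness_seconds),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
headers = volatile_public_headers(now, now - timedelta(hours=1), freshness_seconds=5)
```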

Volatile non-public content

Date: <current time>
Expires: <current time>
Cache-Control: private, max-age=0, s-maxage=0
Last-Modified: <real-last-modification-time>
Vary: Cookie

Rationale: Tell HTTP/1.0 clients and proxies that this response should be considered stale immediately. The Last-Modified time is included so that the content need not be retransmitted when the resource is accessed again and the client supports conditional GET. If Last-Modified cannot be used, ETag may be used as a replacement (see discussion above). Last-Modified is critical for allowing conditional GET with HTTP/1.0 compatible clients. Also note that Cache-Control must not include no-cache, must-revalidate or no-store, because using any of these directives will break the back button in at least one user agent. However, if the content the server is transferring contains sensitive material that should not be stored in permanent storage, the no-store directive MUST be used even though it breaks the back button. Warning: no-store cannot prevent sensitive material from ending up unencrypted on the hard disk if the operating system has swapping enabled and the swap is not encrypted! Also note that using no-store makes very little sense unless the connection is encrypted (HTTPS/SSL).
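
The rule about no-store can be expressed as a small decision helper. This is a sketch under assumptions: volatile_private_cache_control is a hypothetical name, and appending no-store to the other directives (rather than replacing them) is my choice for the example, not something the rationale above prescribes.

```python
def volatile_private_cache_control(sensitive: bool) -> str:
    """Pick the Cache-Control value for volatile non-public content."""
    directives = ["private", "max-age=0", "s-maxage=0"]
    if sensitive:
        # Breaks the back button in some user agents, but is required
        # when the content must not reach permanent storage.
        directives.append("no-store")
    return ", ".join(directives)


print(volatile_private_cache_control(sensitive=False))
print(volatile_private_cache_control(sensitive=True))
```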
