- Damodar Chetty
- Jan 21, 2008
Over the weekend, I had a
chance to read Steve Souders' book on performance improvements for web
applications. As an interesting departure from all the tomes that you see on
tweaking server-side application architectures, JVM configurations, application
server parameters, and Java API usage, this book focuses on that thin slice of
the presentation tier that lives on the client.
The net upshot is a book
that is almost technology agnostic - it can be read by any web developer,
whether you develop using ASP.NET, PHP, or even J2EE. However, note that web
server configuration tips are Apache centric.
For those who can't wait to
hear what I think (I'm sure there's at least one of you out there), I can
safely say this is one of the most eye-opening books I've read in the past 12
months, and a must-read for any web application developer. Over the next few
paragraphs, I'll try to explain why this is so.
The case for front-end engineering
is made very forcefully by an interesting graphic (that should be familiar to
those who use Firebug's Net tab in Firefox). This graphic shows a timeline of
all the HTTP traffic that is spawned by what, to the user, is a single
request for a page. When the requested page is retrieved and parsed, additional
requests may be spawned to retrieve the components required by that page. A
typical page will have references to one or more images, stylesheets, and
JavaScript files. Each of these components must be requested separately by the
browser client, and this is what causes that flurry of HTTP traffic.
An interesting statistic is
that it often takes longer to download these ancillary components than it does
to generate and return the original page requested. Souders' empirical data
shows that the main request accounts for only about 10% of the end-user
response time.
This has the significant
implication that even minor tweaks to the handling of these ancillary components
can yield disproportionately large improvements in end-user response time. It
also implies that the traditional approach to performance tuning, which targets
server-side application code, is probably not the most cost-effective one.
In other words, it is crucial
that you determine your own ratio of (time required to download the main page /
time required for the completed response) for your web site. The lower this
number, the more likely you are to find improvements by following the rules in
this book. Of course, the higher this number, the more likely it is that you
are bound by your server-side application code.
But regardless of what this
number turns out to be, it's a wonderful measure that tells you something you
probably should know about your web site.
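As a rough illustration, the ratio can be computed from two timings read off a tool such as Firebug's Net tab. This is only a sketch: the function name and the sample timings below are made up for the example and do not come from the book.

```python
def main_page_ratio(main_page_seconds: float, total_seconds: float) -> float:
    """Fraction of the total response time spent fetching the main HTML page."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= main_page_seconds <= total_seconds:
        raise ValueError("main_page_seconds must lie between 0 and total_seconds")
    return main_page_seconds / total_seconds


# Hypothetical timings, in seconds, for one page load.
ratio = main_page_ratio(main_page_seconds=0.3, total_seconds=2.5)
print(f"main page share of response time: {ratio:.0%}")
```

With these invented numbers the main page accounts for 12% of the response time, which would put the site firmly in front-end territory.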
However, a more compelling
argument is that changes to the front-end usually don't require as many
resources as projects that involve rearchitecting the backend software
and/or hardware.
A quick perusal of the table
of contents indicates that this book is organized using 14 rules that identify
the best practices to be employed. I've enumerated the top 10 of these rules
below.
The first rule is to make fewer HTTP requests. If a single page request
spawns a request for each web component on that page, then logic dictates that
the fewer the components on that page, the faster it will load.
Simple techniques to do
this, without adversely affecting the user experience, are to combine the
individual graphics into a single graphic, and then use either an image map or
a CSS sprite to delineate the different sections of that image; or, to combine
separate CSS (or JS) files into a single master CSS (or JS) file.
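The file-combining half of this rule could be handled by a small build step. The sketch below is one possible approach, not something taken from the book; the function name and the sample file names are hypothetical. It joins sources in the order given, since order matters for both the CSS cascade and script dependencies.

```python
from typing import Iterable, Tuple


def combine_sources(named_sources: Iterable[Tuple[str, str]]) -> str:
    """Concatenate (name, text) pairs, in order, into one master file body."""
    parts = []
    for name, text in named_sources:
        # A /* ... */ comment is valid in both CSS and JavaScript,
        # so the marker works for either kind of master file.
        parts.append(f"/* {name} */\n{text.strip()}\n")
    return "\n".join(parts)


# Hypothetical stylesheets that would otherwise cost two HTTP requests.
master_css = combine_sources([
    ("reset.css", "body { margin: 0; }"),
    ("site.css", "p { color: #333; }"),
])
print(master_css)
```

The page then references the single master file instead of the individual ones.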
The second rule, use a content delivery network, addresses
performance by moving fairly static web components (e.g., images, stylesheets,
and scripts) closer to the end user, by leveraging Content Delivery Networks
run by third parties such as Akamai to host your components. Note that this
does not cover your application components; distributing those can end up
being a major reengineering exercise.
The third rule, add an Expires header, reaches out across
the wire to make the client's cache work for you. The goal is to reduce the
number of web component requests, after the first request, by having the client
cache them locally.
You do this by marking
components that change infrequently (in particular, images) as cacheable, and
by setting their lifetime in the cache to a date far in the future.
"Far" is a relative term, and depends on the item being cached. For a
company logo, you might set this in terms of years, whereas hot topic images
might expire in a day.
You can do this using either
the HTTP/1.0 Expires header or the HTTP/1.1 Cache-Control header with the max-age
directive, or preferably both. The advantage of the latter is that you can
use a relative cache lifetime (in seconds), rather than having to worry about
computing absolute dates and clock synchronization.
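To make the two headers concrete, here is a minimal sketch that builds both from a single lifetime. The function name and the one-year lifetime are assumptions for illustration; how you actually attach the headers depends on your web server or framework.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def far_future_headers(lifetime_seconds: int,
                       now: Optional[float] = None) -> Dict[str, str]:
    """Build Expires (absolute date) and Cache-Control (relative) headers."""
    if now is None:
        now = time.time()
    return {
        # HTTP/1.0 style: an absolute date in HTTP date format, always GMT.
        "Expires": formatdate(now + lifetime_seconds, usegmt=True),
        # HTTP/1.1 style: a lifetime in seconds relative to the response.
        "Cache-Control": f"max-age={lifetime_seconds}",
    }


ONE_YEAR = 365 * 24 * 60 * 60  # hypothetical lifetime for a company logo
for name, value in far_future_headers(ONE_YEAR).items():
    print(f"{name}: {value}")
```

Sending both keeps older HTTP/1.0 clients covered while letting HTTP/1.1 clients use the relative lifetime.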
Note that caching is not
foolproof: users are known to clear their caches, full caches are
automatically purged to make space for new entries, and the user may not visit
the page frequently enough to benefit from the cache.
There is a development cost to be paid for using this rule, though: you have to handle the case where a component has changed since it was cached. A workable approach is to add a revision number to that component's file name. When a page requests the component under its new file name, the cached copy no longer matches, and so a fresh request is issued.
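The revision-number approach could look something like the sketch below. The ".vN" naming scheme and the function name are my own choices for illustration; any scheme works as long as the file name changes whenever the content does.

```python
from pathlib import PurePosixPath


def revisioned_name(path: str, revision: int) -> str:
    """Insert a revision marker before the file extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.v{revision}{p.suffix}"))


# Hypothetical component path; bumping the revision yields a new URL,
# so the browser's cached copy of the old name is simply never asked for.
print(revisioned_name("images/logo.png", 2))  # images/logo.v2.png
print(revisioned_name("images/logo.png", 3))  # images/logo.v3.png
```

The pages that reference the component need to be generated with the current revision, which is where the development cost comes in.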
The fourth rule, gzip components, reduces network
traffic by compressing each component before it is transmitted over the
wire. It trades reduced network traffic against the server CPU overhead
incurred by the compression.
Servers should aim to gzip
all text responses, including HTML documents, scripts, and stylesheets. PDF
files and images are already compressed, and so do not benefit from additional
compression.
When the server detects that
a browser can support compression (via its Accept-Encoding request header), and
if the file matches the configured filter (e.g., size and type), it will
compress the file, and send back a Content-Encoding response header to indicate
the type of compression used.
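The decision described above can be sketched in a few lines. This is a simplified illustration, not how any particular web server implements it: the type list, the size threshold, and the function names are all assumptions, and real servers handle many more edge cases.

```python
import gzip
from typing import Dict, Tuple

# Hypothetical filter: which types to compress, and a minimum size in bytes.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "application/javascript"}
MIN_SIZE = 1024


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header allows gzip."""
    for item in accept_encoding.split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                return float(value) > 0  # "gzip;q=0" means "do not send gzip"
            except ValueError:
                return False
        return True
    return False


def encode_body(body: bytes, content_type: str,
                accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Compress the body when the client and the filter both allow it."""
    headers = {"Content-Type": content_type}
    if (accepts_gzip(accept_encoding)
            and content_type in COMPRESSIBLE_TYPES
            and len(body) >= MIN_SIZE):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers


page = b"<p>hello world</p>" * 200
compressed, headers = encode_body(page, "text/html", "gzip, deflate")
print(len(page), "->", len(compressed), headers.get("Content-Encoding"))
```

Images fall through untouched because their content type is not in the filter, which matches the point that they are already compressed.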
As with the other rules,
there are caveats galore, including proxies and browser
incompatibilities, that need to be considered.
The fifth rule is to put stylesheets at the top of the document. The progress
indicator of the web world is the incremental rendering of the page itself:
the components of a page are painted in as they are downloaded, to give an
indication that the page is alive and well. This is known as "progressive
rendering".
Unfortunately, placing
stylesheets at the bottom of a document prevents Internet Explorer from
rendering any content until the stylesheet is downloaded (aka Blank White
Screen). This is "by design", and is intended to avoid having to
re-render elements if their styles change (aka Flash of Unstyled Content).
The interesting aspect here
is that this does not delay the downloading of the components that come before
the stylesheet. It simply suppresses visual cues to the user, causing the
impression of slower performance.
The sixth rule is to put scripts at the bottom. Unlike stylesheets, where
progressive rendering is blocked until all stylesheets have been downloaded,
scripts block progressive rendering only for content below the script. Hence
you should move them as low in the page as possible.
Interestingly, parallel
downloading is disabled while a script is being downloaded, even across different
hostnames. All other components must wait until the script is completely
retrieved. This implies that placing scripts at the top of the page will (a)
force all other downloads to wait, and (b) block progressive rendering for
the entire page until the script is loaded. Placing scripts at the bottom of
the page therefore results in better performance, both actual and perceived.
The seventh rule is to avoid CSS expressions. I've not seen CSS
expressions used much in the code bases I've worked with, and the example
provided wasn't very compelling. However, Steve sounds the alarm on how often
such expressions are evaluated: far more frequently than you might expect.
The eighth rule, make JavaScript and CSS external, weighs two options. Inlining
JavaScript and stylesheets tends to be faster than using external files, since no
additional HTTP requests are needed to retrieve them from the server. However,
external files can be cached by the browser, reducing the number of requests that
are needed on later page views. The more often a visitor accesses your site in a
session, or in a given period, the more likely they are to benefit from caching
external components.
Ideally, we should inline
scripts and CSS for home pages, but use external files for the secondary pages.
You can achieve this by using the onload event in the home page. That is, once the
home page has completely loaded, you can dynamically download the external
components used by the secondary pages into the browser's cache. Because these
scripts and styles are also inlined in the home page, your code must cope with
double definitions for the second download to work.
The ninth rule is to reduce DNS lookups. On the Internet, the Domain Name
System (DNS) resolves human-readable host names into IP addresses.
The browser cannot download anything from a host until the lookup
is complete. The response time depends on the DNS resolver, your proximity to
it, the load on it, and your connection speed.
DNS records are cached both by your browser and by your operating system. When both caches are empty, one DNS lookup is required per unique host name on the retrieved page. In other words, reducing the number of unique host names reduces the number of lookups required. However, this has the negative side effect of reducing the parallelization of downloads. Steve's rule of thumb is to split your components across 2 unique host names.
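To see how many lookups a cold cache would cost, you can count the unique host names a page refers to. The helper below is a sketch under simple assumptions: the function name is mine, the URLs are placeholders on example.com, and relative references are resolved against the page's own URL.

```python
from typing import Iterable, List
from urllib.parse import urljoin, urlsplit


def unique_hostnames(page_url: str, component_refs: Iterable[str]) -> List[str]:
    """Return the sorted set of host names needed to load a page's components."""
    hosts = set()
    for ref in component_refs:
        # Relative references resolve to the host that served the page.
        host = urlsplit(urljoin(page_url, ref)).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical page with components spread over several host names.
hosts = unique_hostnames(
    "http://www.example.com/index.html",
    [
        "/css/site.css",
        "http://static1.example.com/logo.png",
        "http://static2.example.com/banner.png",
        "http://static1.example.com/app.js",
    ],
)
print(len(hosts), "DNS lookups with cold caches:", hosts)
```

Here three lookups are needed even though there are four components, because two of them share a host name.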
The tenth rule, minify JavaScript, suggests that you
reduce the size of your script files by stripping out unnecessary characters, i.e., by
eliminating white space and comments. This is particularly useful when
combined with compressing your text files.
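The effect of combining the two is easy to measure. The sketch below uses a deliberately naive minifier that I wrote for illustration: it only drops blank lines, whole-line `//` comments, and indentation, and it would mangle multi-line strings, so it is no substitute for a real minification tool. The sample script is also made up.

```python
import gzip


def naive_minify(source: str) -> str:
    """Drop blank lines, whole-line // comments, and indentation.

    Illustration only: block comments, trailing comments, and
    multi-line string literals are not handled.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        kept.append(stripped)
    return "\n".join(kept)


# Hypothetical script with the usual comments and indentation.
script = """
// Toggle the visibility of the navigation menu.
function toggleMenu(menu) {
    // Hidden menus are shown, and visible menus are hidden.
    if (menu.style.display === "none") {
        menu.style.display = "block";
    } else {
        menu.style.display = "none";
    }
}
"""

minified = naive_minify(script)
for label, text in (("original", script), ("minified", minified)):
    raw = text.encode("utf-8")
    print(label, len(raw), "bytes,", len(gzip.compress(raw)), "bytes gzipped")
```

Comparing the four numbers for your own scripts shows how much each step contributes on its own and together.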
1. In the timeline, you can often
see a blank space with no downloads right after the HTML page is
retrieved. This period of quiet indicates that the browser is busy parsing
the page's contents and retrieving components that may have been previously
cached.
2. Even
components that do not have an Expires header in the future are stored in the
cache. On subsequent requests, the browser checks the cache and finds that the
component has expired. A browser cannot use a stale component without first
querying the server using a conditional GET request.
If the component hasn't changed, the server returns a "304 Not
Modified" status code that instructs the browser to use the cached
component. Otherwise, it sends back the requested component. Using a far future
Expires header cuts down on these conditional GETs.
3. The HTTP/1.1
spec suggests browsers should download two components in parallel per hostname.
This means that distributing your components evenly across two hostnames will
result in 4 components being downloaded in parallel - this is the sweet spot.
4. Firefox's network.http.max-persistent-connections-per-server
setting in the about:config page lets you set the number of parallel downloads per server.
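The conditional GET exchange from note 2 can be sketched from the server's side. This is a simplified model that assumes the browser sends an If-Modified-Since header and that the server tracks a timezone-aware last-modified time; the function name is hypothetical, and real servers also consider other validators.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304
    return 200


# Hypothetical component last changed at noon UTC on Jan 21, 2008.
changed = datetime(2008, 1, 21, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(changed, "Mon, 21 Jan 2008 12:00:00 GMT"))
print(conditional_get_status(changed, "Sun, 20 Jan 2008 12:00:00 GMT"))
```

The first call models an up-to-date cached copy and the second a stale one. Either way the browser has paid for a round trip, which is exactly what a far future Expires header avoids.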
This book was easy to
browse, fairly concise, and full of good information: a
truly unbeatable combination. It provides you with a nice tour of the
performance-related aspects of the HTTP specification, while at the same time
getting you in the right frame of mind for analyzing your own web sites. If you
write or maintain web code, you'd be well served by reading this book.