Mobile - Web - Media
Thursday, Jan 25, 2007 2:07:43 PM
Optimizing and Scaling Web Apps
After reading about how MySpace dealt with scaling up to support millions of users, I decided it was a good idea to review my application's architecture and identify which areas could scale independently and which areas need optimization.
I first outlined the application that generates ArtistServer in seven parts. There's opportunity for optimization everywhere, so looking at the application in different ways makes it easier to identify those optimization points.
- Application Zones - the site divided into zones of functionality
- Sub-Applications - areas of the application which can run on separate servers and scale independently from the main site
- Flat File Publishing/Caching - writing of content and charts to files on the server to cut down on database access
- Site Data Methods - listing the interaction with the database by object and method
- Views - listing of all reusable views in the application, like 'song' and 'member'
- Widgets (future) - listing of planned widgets
- API (future) - outline of API plans
I then divided the application into 25 Application Zones, including:
- Artist/Member sites
- Music/Ringtone/Area/Genre pages
- Serving Mp3s/Uploading mp3s
- My Account Admin
- Artist Member Files
- Photos Area
- Serving/Resizing Photos
- Site Skinning
- About/Info pages
- Error system/pages
- Site Admin
For each zone, I wrote what I felt was wrong, what could be done better, and what could run as a sub-application on its own server. By sub-application, I mean taking a zone or area with specific functionality and modifying it so that it can run either on the same server as the main site or on a separate server, where it can have its own resources and potentially scale on its own.
For example, the following are already functioning as sub-applications:
- Database Server - the database is hosted on its own server
- Forums - hosted on a separate server using forums.artistserver.com
- Photos - photo uploads, resizing, and serving are done on a separate server using photos.artistserver.com
- MP3 Streams/Downloads - all mp3s are served from media.artistserver.com, which runs on the same server as the main site but is ready to move to its own server, or even to a streaming provider
After my critical analysis, I found three zones I should consider turning into sub-applications: stats/tracking, artist/member files, and RSS. As an example, I'll explain what is being done with stats and tracking in the application.
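One way to keep a sub-application movable, as with media.artistserver.com above, is to build every URL from a single host map, so that moving a zone to its own server is a DNS or config change rather than a code change. The following is only a Python sketch of that idea, not the site's actual code: the three hostnames come from this post, while the dictionary, the helper name, and the example path are made up for illustration.

```python
# Hypothetical host map: which hostname serves each sub-application.
# The hostnames are the ones mentioned in the post; everything else is assumed.
SUB_APP_HOSTS = {
    "forums": "forums.artistserver.com",
    "photos": "photos.artistserver.com",
    "media": "media.artistserver.com",
}


def sub_app_url(sub_app, path, hosts=SUB_APP_HOSTS):
    """Build an absolute URL for a resource owned by a sub-application.

    Raises KeyError if the sub-application is not in the host map.
    """
    host = hosts[sub_app]
    return f"http://{host}/{path.lstrip('/')}"


# Example path is invented; the real site layout is not described in the post.
print(sub_app_url("media", "/songs/123.mp3"))
```

Because pages only ever ask for "media" or "photos", the hostname can keep pointing at the main server today and at a dedicated server later without touching the pages themselves.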
Stats and Tracking
The application tracks data on artist/member sites, but only for members who've upgraded. For all artists, we track every stream and download of their music, filtering out repeat streams from the same IP within a 24-hour period to keep people from gaming the charts.
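The repeat-stream filter can be sketched roughly as follows. This is a Python illustration under assumptions, not the ArtistServer code: it assumes repeats are judged per song and per IP, and that the 24 hours is a rolling window measured from the last counted stream. The class and method names are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # assumed rolling window


class StreamCounter:
    """Counts streams per song, ignoring repeats from one IP within 24 hours."""

    def __init__(self):
        self.last_counted = {}  # (song_id, ip) -> time of the last counted stream
        self.counts = {}        # song_id -> number of counted streams

    def record(self, song_id, ip, when):
        """Return True if the stream was counted, False if it was a repeat."""
        key = (song_id, ip)
        last = self.last_counted.get(key)
        if last is not None and when - last < WINDOW:
            return False  # same IP, same song, inside the window
        self.last_counted[key] = when
        self.counts[song_id] = self.counts.get(song_id, 0) + 1
        return True


counter = StreamCounter()
start = datetime(2007, 1, 25, 14, 0)
counter.record(1, "10.0.0.1", start)                       # counted
counter.record(1, "10.0.0.1", start + timedelta(hours=1))  # ignored as a repeat
```

In a real deployment the lookup would live in the stats tables rather than in memory, but the rule is the same.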
All the calls to the stats tables were something that could happen asynchronously. The processes being tracked didn't need to wait for the tracking to complete before completing themselves. For example, take an artist page: starting at the top, there's a query to the database to return that person's record, another query that tracks the access to the page, another to return their songs, and so on. Each query happens in sequence, which means one has to complete before the next can execute.
Since the stats data is only captured and written to the database, and isn't displayed on the page or used in any other portion of the code, it's a prime candidate for becoming an asynchronous process.
After I wrote a new stats object, I set it up on a separate server from the main application server (so my asynchronous stats calls don't compete for resources on the main server). I then replaced my stats script on the artist and member pages with a call to the AsyncHTTP object by Compound Theory (a ColdFusion CFC that uses Java), ran some tests, and took it live. Now when the page executes and the AsyncHTTP object calls my stats server, the page continues to process without waiting, shaving several milliseconds off every request to an artist or member page.
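For readers who don't use ColdFusion, the same fire-and-forget pattern can be sketched in Python. This is not the AsyncHTTP CFC or its API, only an illustration of the idea. The stats URL, the query parameters, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

STATS_URL = "http://stats.example.com/track"  # hypothetical stats server endpoint
_pool = ThreadPoolExecutor(max_workers=4)


def _send(url):
    """Runs on a worker thread; a tracking failure must never break the page."""
    try:
        with urlopen(url, timeout=2) as response:
            response.read()
    except Exception:
        pass  # tracking is best-effort


def track_async(page, member_id, send=_send):
    """Queue the tracking call and return immediately with a Future."""
    query = urlencode({"page": page, "member": member_id})
    return _pool.submit(send, f"{STATS_URL}?{query}")


def render_artist_page(member_id, send=_send):
    """Render the page without waiting for the stats call to finish."""
    track_async("artist", member_id, send)
    return f"<h1>Artist {member_id}</h1>"
```

The `send` parameter exists only so the network call can be swapped out in tests. The point is that `render_artist_page` returns as soon as the call is queued, rather than after the stats server responds.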
The same thing will be done with the mp3 tracking on the site, which, as you can imagine, should make requesting an mp3 for streaming or download faster for everyone.
After that, I'll break the stats tables out into their own database, so the main database is free to handle the application and the stats database can grow independently. Initially, I'll still run it on the same database server, but the point is that moving it is a viable scaling option at any time: the stats database could easily be set up on its own database server or server cluster.
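A small Python sketch of that separation, using two in-memory SQLite databases as stand-ins: the table names, columns, and helper functions are invented for illustration, and the real site would use its own database server and schema. The idea is that stats writes go through their own named connection, so pointing that name at another server later doesn't touch the application queries.

```python
import sqlite3

# Hypothetical connection targets. Both are in-memory SQLite databases here;
# in production each name could point at a different database or server.
DATABASES = {"main": ":memory:", "stats": ":memory:"}


def connect(name):
    """Open a connection to the database registered under this name."""
    return sqlite3.connect(DATABASES[name])


main_db = connect("main")
stats_db = connect("stats")

# Invented schemas, one table per database.
main_db.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")
stats_db.execute("CREATE TABLE page_hits (member_id INTEGER, hit_at TEXT)")


def record_hit(member_id, hit_at):
    """Write a page hit to the stats database only."""
    with stats_db:  # commits on success, rolls back on error
        stats_db.execute(
            "INSERT INTO page_hits (member_id, hit_at) VALUES (?, ?)",
            (member_id, hit_at),
        )


record_hit(42, "2007-01-25T14:07:43")
```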
Sounds like fun eh? :)
Objectize, Optimize, Modularize, Asynchronize
Create and extend objects.
Optimize at all levels.
Asynchronize background processes.
After completing changes to the three zones I identified as potential sub-applications, I'll go back over my outline and analysis and start addressing the issues I've listed. While I'm sure we will still have growing pains as we scale up, I feel the time invested now not only delays that experience but also prepares us for that day. Another plus of this process is that more of the application is getting documented, something that rarely gets done with Web projects.