Load Balancing Heavy Websites on Current Tech?
squared99 asks: "I have just delved into some research on a setup for very high-traffic websites. I'm particularly interested in how many webservers would be needed at minimum and the type of technology powering them. Slashdot seemed like a good sample site to check out, so I went to Slashdot's technology FAQ to get a starting point. That setup appears to date from 2000, is most likely a bit out of date, and I'm assuming the same number of webservers would not be needed with current server technology. What would experts in the Slashdot community recommend as the required setup to handle Slashdot-like volumes, if they had to do it today using more current hardware? How many webservers could it be reduced to, while maintaining enough redundancy to keep serving pages, even under the heaviest of loads?"
hardware (Score:1)
Re:hardware (Score:2)
Re:It depends (Score:2)
If you're serving static content, make sure your graphics are small in file size wherever possible. For instance, you don't need a full-page graphic like the top of the reply area here on Slashdot; just the one corner graphic, with the rest filled in by a table background color.
Going past the basics and assuming it's dynamic content, you'll need to make sure your code is optimized (i.e., not the type of code someone…
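To make that concrete: a minimal sketch of one common optimization, caching rendered dynamic pages so repeat hits skip the database. render_page() and the TTL here are assumptions for illustration; a real farm would use something shared like memcached rather than an in-process dict.

import time

# Hypothetical page renderer; on a real site this is the expensive
# database-backed code path.
def render_page(page_id):
    return "<html>...expensive page %s...</html>" % page_id

_cache = {}      # page_id -> (rendered_html, timestamp)
CACHE_TTL = 60   # seconds; tune to how stale a page is allowed to be

def get_page(page_id):
    """Return cached HTML if still fresh, else re-render and cache."""
    entry = _cache.get(page_id)
    if entry is not None and time.time() - entry[1] < CACHE_TTL:
        return entry[0]
    html = render_page(page_id)
    _cache[page_id] = (html, time.time())
    return html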
One Word (Score:2, Informative)
Prime Example: wikipedia (Score:5, Informative)
Re:Prime Example: wikipedia (Score:1, Insightful)
Re:Prime Example: wikipedia (Score:3, Insightful)
Re:Prime Example: wikipedia (Score:4, Insightful)
Re:Prime Example: wikipedia (Score:2)
Seriously, they have very specific needs; they should have made a custom database that handled the problem.
They? It's an open source project, pal, funded and built by people like us. If you think some chunk of work needs doing, you can step up to the plate.
And if you'd rather just go on leeching, then maybe you could change your attitude from "Dudes, this gift sucks! What were you thinking?" to "Thanks for the cool free thing…"
Re:Prime Example: wikipedia (Score:3, Insightful)
A complete and total system rewrite in something that's not PHP would do wonders for efficiency, but the development manpower is not there; it would take an enormous amount of effort to get it usable, let alone useful.
Test, test, test... (Score:4, Funny)
Paul B.
Re:Test, test, test... (Score:1)
Google for "Virtualized" or "Utility Hosting" (Score:2, Interesting)
Take a look at livejournal's setup (Score:3, Informative)
Other people are much more qualified than I to answer the number of servers questions though.
Re:Take a look at livejournal's setup (Score:3, Insightful)
Re:Take a look at livejournal's setup (Score:2)
I'm seriously interested - we're in the situation LiveJournal was 3 years ago, and I'm interested to know whether there's a better way than going the way they are.
RTFB (Score:1, Informative)
Pound (Score:3, Interesting)
Use Slashdot's current tech guide (Score:2, Funny)
From those, you will get an idea of the type and scope of technology the Slash team uses to maintain one of the world's most popular sites.
Granted, your team is not as skilled as the crack techs at /. central, but the specs on that page will get you pointed in the right direction.
Re:Some more considerations (Score:4, Informative)
You get rid of a massive single point of failure (the NAS), and you get a little closer to linear scalability (adding another webserver doesn't put more load on your NFS box).
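As a sketch of what replaces the NFS mount (hostnames and paths here are made up): push the content to every webserver's local disk at deploy time, e.g. with rsync, so each box serves from its own spindle.

import subprocess

# Hypothetical inventory; in practice read from a config file.
WEBSERVERS = ["web1.example.com", "web2.example.com", "web3.example.com"]
DOCROOT = "/var/www/htdocs/"

def push_content(source_dir):
    """Copy the content tree to each webserver's local disk.

    With local copies, a dead fileserver can't take the whole site
    down, and adding a webserver adds no load to shared storage.
    """
    for host in WEBSERVERS:
        # Trailing slash on source_dir makes rsync copy its contents.
        subprocess.run(
            ["rsync", "-az", "--delete", source_dir,
             "%s:%s" % (host, DOCROOT)],
            check=True,
        )

push_content("/srv/build/htdocs/")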
database is the bottleneck (Score:2)
Both sides used pretty much the same setup for webservers: four load-balanced webservers with hyperthreading at around 3GHz (at the moment... always at around 75% of the fastest processors out there, to save money). These are sitting in datacenters with multiple 10Gbps connections, and each has a hot-swappable copy of the entire system running at another data center.
Re:database is the bottleneck (Score:1)
Re:database is the bottleneck (Score:1, Troll)
We have 4 admins for the MySQL servers.
What the hell for?
each has a hot-swappable copy of the entire system running at another data center.
With proper failover, this is pointless and wasteful.
Cron static pages off the database when possible
I couldn't agree more.
I'm doing an average of 250k page views per day. My…
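A minimal sketch of the cron approach, with made-up table and file names (sqlite3 stands in for the real database so the example is self-contained): render the hot page to a flat file so the webserver never touches the database for it.

# Run from cron, e.g.: */5 * * * * /usr/bin/python3 build_front_page.py
import os
import sqlite3

def build_front_page(db_path="site.db",
                     out_path="/var/www/htdocs/index.html"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT title, body FROM stories ORDER BY posted_at DESC LIMIT 10"
    ).fetchall()
    conn.close()

    html = "<html><body>%s</body></html>" % "".join(
        "<h2>%s</h2><p>%s</p>" % (title, body) for title, body in rows
    )

    # Write to a temp file and rename so readers never see a
    # half-written page; the rename is atomic on the same filesystem.
    tmp = out_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(html)
    os.replace(tmp, out_path)

if __name__ == "__main__":
    build_front_page()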
Re:database is the bottleneck (Score:2)
The 4 database admins are for the 7 different clusters, and these systems do more than serve webpages; they do spidering for site changes, password management, etc.
So I suppose this kind of system is beyond the scope of the question asked.
Re:database is the bottleneck (Score:1)
The sad part about your statement is that, due to a lack of industry standardization, people confuse all the terms.
I myself prefer the terms published back in 2000 by a company called NetGenesis, written by Matt Cutler; the work is titled "E-Metrics: Business Metrics for the New Economy."
Foundry ServerIrons (Score:2)
Quizilla.com (Score:1)
The site FAQ has the gritty details [quizilla.com], but basically everything is running on 8 web servers with a cluster of 4 database servers. mod_perl is used for the most highly trafficked pages, though some less-used pages are still static CGIs.
For the way I have it set up, this farm has reached its limit, with the web servers getting pegged pretty constantly during peak hours, and the database…
Re:Quizilla.com (Score:2)
In some applications there may be a valid argument for denormalization to reduce load. In others, the SQL was cobbled together by someone without adequate experience and it's pounding away at the database. (One such occurrence is when using subselects with MySQL 4.1.x: it can prove significantly faster to split out the subselects and pass their results into another DB call from regular code, since MySQL does not optimize subselects well.)
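A sketch of that split, with made-up tables (sqlite3 is used here purely so the example runs anywhere; the point applies to the MySQL versions discussed): run the inner query first, then feed its results into the outer query instead of nesting a subselect.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 9.99), (1, 4.50);
""")

# The single-query form that old MySQL handled badly:
#   SELECT name FROM users WHERE id IN (SELECT user_id FROM orders)

# The split form: inner query first, results passed into the outer one.
ids = [r[0] for r in conn.execute("SELECT DISTINCT user_id FROM orders")]
placeholders = ",".join("?" * len(ids))
names = conn.execute(
    "SELECT name FROM users WHERE id IN (%s)" % placeholders, ids
).fetchall()
print(names)  # [('alice',)]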
Your question cannot be answered (Score:4, Insightful)
Some people might consider a hundred thousand pageviews per day to be heavy. Others might consider a million pageviews per day to be heavy.
From experience, a hundred thousand per day for a reasonable application can be handled on one server. A million would probably require two to four.
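To sanity-check those figures, the back-of-the-envelope arithmetic (the 5x peak factor is an assumption; real traffic curves vary):

SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 5  # assumed ratio of peak-hour rate to the daily average

for pageviews in (100_000, 1_000_000):
    avg = pageviews / SECONDS_PER_DAY
    print("%9d/day = %5.1f req/s average, ~%5.1f req/s at peak"
          % (pageviews, avg, avg * PEAK_FACTOR))

100k/day works out to about 1.2 requests per second on average (roughly 6 at peak), and 1M/day to about 11.6 (roughly 58 at peak), which is consistent with one box for the former and a small handful for the latter.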
Re:Your question cannot be answered (Score:2)
Not ONLY does it depend on how MUCH heavy traffic is, it depends on WHAT you're doing. A simple page that makes a few database queries is going to be a lot faster than a complex page with a bunch of very complex queries that does a lot of mangling with the data returned.
It's dangerous thinking, though, to take the num…
1M pages/day on a single server (Score:2)
Carnage Blender serves about 1M pages a day off a single (dual 2.4GHz Xeon, 4GB RAM) machine. Those are database-backed pages, with a lot more updates than most read-only-ish sites.
Powered by PostgreSQL. And a lot of tuning.
wildly off topic (Score:2)
I had to stop playing because it was too addictive, but you've got a cool thing there.
Re:1M pages/day on a single server (Score:2)
When I was worrying about such things, it was cheaper to get 4 low-end servers than one high-end as you described. Of course it's a bitch to manage multiple servers instead of just one.
And of course 100k pageviews per day (Which eventually grew to 300k on a different server) simply didn't justify anything more than a low-end server.
Re:Your question cannot be answered (Score:2, Insightful)
Amen to that.
Step one is to figure out what you mean by heavy traffic. Slashdot is probably at a couple million pageviews per day, and Alexa tells us that there are nearly 1500 sites bigger. A top-10 site will get circa 1000x what Slashdot gets.
In step two, figure out what kind of traffic you're dealing with. Most of Slashdot's page views are probably just hits on the front page or the current article by guests, so they can be heavily cached…
Re:Your question cannot be answered (Score:2)
Ultramonkey + LVS-Kiss + Mon (Score:3, Informative)
At my work we use Ultramonkey [ultramonkey.org] with LVS-kiss [linpro.no] and Mon [kernel.org].
Our hardware infrastructure includes 2 load balancers running as a failover pair with 3 web servers in the backend (1.8GHz, 512MB RAM, 40GB HDD, 100Mbps network). That hosts over 60 million page views a month and supports real-time failover. For monitoring, there are tools out there that use MRTG/RRD for cluster statistics.
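For flavor, a sketch of the kind of HTTP health check such a setup runs against each real server so a dead box gets pulled from rotation (hostnames are placeholders; the actual wiring into the balancer is done by the monitoring tool, e.g. Mon):

import urllib.request

BACKENDS = ["http://web1/", "http://web2/", "http://web3/"]

def healthy(url, timeout=2):
    """True if the backend answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

for url in BACKENDS:
    print(url, "up" if healthy(url) else "DOWN: remove from pool")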
Obvious answer... (Score:3, Informative)
Disclaimer: I work for NetScaler, but the customers we have gained should help in your decision.
Re:Obvious answer... (Score:2)
What the tests prove. (Score:1)
Re:What the tests prove. (Score:1)
That's probably not true. The processor is only one component in a system, and it's often not the bottleneck. Also, since then there have been substantial changes in web servers, kernels, and all sorts of hardware that goes around the processor. Further, that page only talks about static content; it doesn't tell you anything about the dynamic content.
Ordering hardware based on theoretical calculations from formulas in six-year-old articles…
Pound (Score:2)
-psy
Outsource to geocities (Score:2)
Nothing like a Hardware-based load-balancer (Score:2)
Some examples here. [cisco.com] The examples are heavy on corporate speak, but you were asking about a large Web/content architecture, right?
Popular sites configs (Score:1)
Re:Popular sites configs (Score:2)
Number 958: Spammers Heaven
Served by shared hosting: Apache/1.3.27, Red-Hat/Linux, PHP/4.3.3, phpBB
Can't they afford a dedicated server yet?
Round Robin DNS (Score:2)
Take a look at F5 (Score:2)
But before I switched, I got demos from all three players and put them in a head-to-head contest. I would suggest doing the same. In a lab setting, we couldn't hit the devices hard enough to pick a clear winner based on performance. When I looked at administration and features, the F5 pulled to the front.
The GUI is clear and concise.
Get a hardware load-balancer (Score:2)
One mistake that I see a lot of people make is using a PC-based load balancer. A hardware device (Foundry ServerIron, Nortel Alteon, Cisco CSM, etc.) is well worth the money (especially if you get it on eBay).
A few articles from AnandTech.com (Score:2)
Zeus ZXTM (Score:2)
There isn't one answer. (Score:2)
A good starting place is to just measure it: test how long it takes to serve a page like the ones you expect to be serving…
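In that spirit, a tiny stand-in for ApacheBench (the URL is a placeholder): time repeated fetches of one representative page and look at the spread, not just the average.

import time
import urllib.request

def measure(url, n=50):
    times = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        times.append(time.perf_counter() - start)
    times.sort()
    print("min %.3fs  median %.3fs  max %.3fs  ~%.0f req/s per worker"
          % (times[0], times[n // 2], times[-1], n / sum(times)))

measure("http://localhost/")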