Ask Slashdot: Building a Web App Scalable To Hundreds of Thousands of Users?
AleX122 writes "I have an idea for a web app. Things I know: I am not the first person with a brilliant idea. Many other 'inventors' have failed, and it may happen to me, but without trying the outcome will always be failure. That said, the project will be huge if it succeeds. However, I currently do not have the money needed to hire developers. I have pretty solid experience in Java, GWT, HTML, Hibernate/EclipseLink, SQL/PLSQL/Oracle. The downside is the project's nature. All the applications I've developed to date were hosted on a single server or in a small cluster (two Tomcats with fail-over). The application, if I succeed, will have to serve thousands of users simultaneously. The userbase will come from all over the world. (Consider infrastructure requirements similar to a social network's.) My questions: What technologies should I use now to ensure easy scaling for a future traffic increase? I need distributed processing and data storage. I would like to stick to open standards, so Google App Engine or a similar proprietary cloud solution isn't acceptable. Since I do not have the resources to hire a team of developers and I will be the first coder, it would be nice if the technology used were Java-related. However, when you have a hammer everything looks like a nail, so I am open to technologies unrelated to Java."
Plan9 from outer space. (Score:1, Informative)
It would be cool if you could get Plan 9 working. It's an OS that was designed around distributed computing from the ground up. So much so that the API is hardware agnostic. It doesn't matter what hardware you are running or where it exists. All resources in the cluster are shared automagically. You would need some distributed rackspace in strategic global locations.
Step one is making a small lab with junk computers.
Step two is testing your application in this environment.
If you can get the backend running on Plan 9 then you can start renting servers, installing Plan 9 and adding these servers to your existing cluster. At some point you will be able to turn off the computers at your house and the app will keep running on the remaining cloud servers. It's a pretty sweet idea.
http://plan9.bell-labs.com/plan9/ [bell-labs.com]
It's kinda like UNIX.
Good luck.
Re:Heroku (Score:5, Informative)
OpenStack. You can start with a hosting provider like Rackspace that has a faithful implementation of it. I know they were recently pinged for some incompatibility, but they have vowed to fix that. If you still can't stomach it, choose a different OpenStack provider. OpenStack is the key.
When you get really big, then you can work on running your own datacenter or paying someone to host the hardware for you (again, Rackspace, DreamHost, etc.). Then you can put your own implementation of OpenStack on the hardware with all the customization specific to your needs. This will naturally build on top of your years of investment with the vanilla OpenStack when you were smaller. The progression path is laid out for you.
I'm replying to this parent because Heroku is also an excellent choice for scaling where you pay as you grow. I'm just not sure if you can later fork Heroku to suit your needs with the datacenter supplier of your choice.
Re:Silly priorities (Score:4, Informative)
It wasn't the website, it was the backend that had problems.
I remember they had started with Ruby on Rails, which is notorious for getting you up and running fast and then failing to scale.
They then offloaded parts of the infrastructure to Scala of all things.
http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html [redfin.com]
Scala is interesting and has some good paradigms built into the language for the things Twitter needs to do. Not sure if it is really fundamentally better than Java, though - after all, it runs on the same JVM.
Anyway, if I were starting something like this and I already knew Java, I would go with Java. There are enough large sites running it, and there are a lot of people out there who know it, so I would feel some confidence that I could do what I needed to do.
Plus I like static typing.
Re:Java is SLOW (Score:4, Informative)
Many sites with very large userbases use Java extensively in their stacks, including eBay, PayPal, Amazon, Tumblr, LinkedIn, and Google.
Millions of page views a day is small-to-medium e-commerce territory. I was doing a million a day with Perl back in 2002 on a two-CPU 1U machine.
Tumblr gets something close to a billion, as does anyone in the top 100.
API first (Score:4, Informative)
Write your public and private APIs first. Then implement them quick and dirty. Get feedback. Get users. Keep working on the APIs to make improvements. As you get more traffic, hire good people to reimplement those same APIs on a better tech stack. Rinse and repeat. You can even mix and match platforms; just use a smart routing proxy like HAProxy to send requests to the appropriate places. Static files go to a CDN, logins can go to something small but secure, and high-volume requests can go to a big cluster or an IaaS like Amazon or Google for on-demand scaling.
API first.
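As a rough sketch of that routing idea, an HAProxy configuration along these lines splits traffic by URL path. The hostnames, IPs, and backend names here are made up for illustration; adjust them to your own topology:

```
# haproxy.cfg sketch: one frontend, three backends routed by path
frontend www
    bind *:80
    acl is_static path_beg /static
    acl is_login  path_beg /login
    use_backend cdn_origin  if is_static
    use_backend auth_pool   if is_login
    default_backend app_cluster

backend cdn_origin
    # static assets; in practice a CDN sits in front of this origin
    server cdn1 cdn.example.com:80

backend auth_pool
    # small but hardened login service
    server auth1 10.0.0.10:8080 check

backend app_cluster
    # high-volume API traffic spread across the main cluster
    balance roundrobin
    server app1 10.0.1.10:8080 check
    server app2 10.0.1.11:8080 check
```

The `check` keyword health-checks each server, so a dead app node drops out of rotation automatically while the rest keep serving.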
Re:Silly priorities (Score:5, Informative)
"They then offloaded parts of the infrastructure to Scala of all things.
http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html [redfin.com]
Scala is interesting and has some good paradigms built into the language for the things Twitter needs to do. Not sure if it is really fundamentally better than Java, though - after all, it runs on the same JVM."
Disclaimer: I was a developer at Twitter until last year.
From the point of view of scalability, Scala is so much more advanced than Java it's not even funny. Ultimately, this boils down to the adoption of immutability as a core concept of the language. In particular, Scala's approach to concurrency is a decade or more ahead of what's in use in Java. Finagle, Twitter's async RPC system, simply wouldn't have been deliverable in a language that makes the use of Futures as difficult as Java does.
"Plus I like static typing."
Scala is statically typed.
Re:Silly priorities (Score:4, Informative)
Disclaimer: Another Twitter engineer here. What my apparently former colleague said, plus X.
Also: Don't be afraid to add caching layers when you see your web servers or DBs start to run hot. Putting a memcached instance "in front of" your database layer is much easier than sharding the database to relieve load - eventually you'll have to do both, but you'll definitely want the memcached layer first. The same goes for web caches/proxies - putting Varnish or Squid in front will take some pressure off before you need to implement load balancers.
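To make the cache-in-front-of-the-database idea concrete, here is a minimal Java sketch of the cache-aside read path. This is an illustration, not Twitter's code: a ConcurrentHashMap stands in for the memcached instance, and the `database` function stands in for whatever slow DB query you are protecting; in production you would swap in a real memcached client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: check the cache before hitting the database,
// and populate the cache on a miss so repeat reads skip the DB.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database; // the "slow" backing store

    public CacheAside(Function<String, String> database) {
        this.database = database;
    }

    public String get(String key) {
        String hit = cache.get(key);
        if (hit != null) return hit;              // cache hit: no DB load
        String value = database.apply(key);       // cache miss: query the DB
        if (value != null) cache.put(key, value); // populate for next reader
        return value;
    }

    public void invalidate(String key) {
        cache.remove(key); // call after writes so readers don't see stale data
    }
}
```

On the write path you update the database first and then invalidate the key, so the next reader repopulates the cache with fresh data.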
Re:Heroku (Score:1, Informative)
Did you even bother to read the next sentence after that?