
Ask Slashdot: Building a Web App Scalable To Hundreds of Thousands of Users?

Posted by Soulskill
from the get-up-in-that-there-cloud dept.
AleX122 writes "I have an idea for a web app. Things I know: I am not the first person with a brilliant idea. Many other 'inventors' have failed and it may happen to me, but without trying, the outcome will always be failure. That said, the project will be huge if successful. However, I currently do not have the money needed to hire developers. I have pretty solid experience in Java, GWT, HTML, Hibernate/EclipseLink, SQL/PLSQL/Oracle. The downside is the nature of the project. All applications I've developed to date were hosted on a single server or in a small cluster (2 Tomcats with fail-over). The application, if I succeed, will have to serve thousands of users simultaneously. The userbase will come from all over the world. (Consider infrastructure requirements similar to a social network.) My questions: What technologies should I use now to ensure easy scaling for a future traffic increase? I need distributed processing and data storage. I would like to stick to open standards, so Google App Engine or a similar proprietary cloud solution isn't acceptable. Since I do not have the resources to hire a team of developers and I will be the first coder, it would be nice if the technology used were Java-related. However, when you have a hammer, everything looks like a nail, so I am open to technologies unrelated to Java."

  • Heroku (Score:4, Insightful)

    by Anonymous Coward on Saturday April 13, 2013 @05:06PM (#43442791)

    Just use Heroku. Honestly, you DO NOT need to worry about this problem. If by the time you have 10,000 users you aren't making enough money to hire someone to solve this problem for you, then your idea is not as great as you think it is.

  • Show me the users! (Score:5, Insightful)

    by Anonymous Coward on Saturday April 13, 2013 @05:08PM (#43442793)

    Before going all-out to reinvent the wheel on yet-another-next-big-thing web app, why not roll out a proof-of-principle version letting someone else competent do the "heavy lifting" back-end work. Use an existing cloud/hosting service like Amazon EC2 (they'll do a lot better on the basic back-end stuff than your "I'm incompetent but building a cloud app anyway" approach). After you get your first hundred thousand users, and have investment rolling in by the gazillions, then you hire your own crack team of cloud experts to design your own custom back-end solution (or just sell out for a couple hundred million to whatever group of suckers thinks your zero-dollar-per-user profit model will start paying off once they hit the million-user mark).

  • Start smaller (Score:5, Insightful)

    by bfandreas (603438) on Saturday April 13, 2013 @05:12PM (#43442815)
    Do not plan for hundreds of millions of concurrent users right off the bat. That's a very common mistake a lot of startups make. You do not have such a large userbase, and it will take some time until you do.
    Think smaller and scale up when your idea takes off. Set yourself concurrent-user milestones at which you rethink your architecture. You will also have to rethink the iron your stuff runs on, and that may dictate what kind of technology you use once you reach your hundreds-of-millions goal.

    Technology is interchangeable. It's a tool, and you choose the best tool for the job. At the moment you have no users, so you might as well start off with the usual suspects: JSP/Struts, JSF, whatever you are most comfortable with. If in the long run you find that this is not sustainable and you need to shift to another technology, then hopefully you can afford to hire people who know it.

    You really, really should set yourself userbase milestones, plan ahead for reaching them and be prepared when you reach them. For that you need a lot of information. Log how much time users spend on each feature you offer, because this also has an impact on your UI design when you go big. It also has an impact on which technologies you use.
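    A minimal sketch of that kind of logging follows. It assumes an in-process aggregator whose totals you would periodically write to your normal log. The class name `FeatureUsageLog` and the feature names are hypothetical. The durations could come from server-side request timing or from client-side events; this sketch only aggregates whatever gets reported.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical aggregator: total time and number of uses per feature.
// LongAdder keeps increments cheap when many request threads record at once.
class FeatureUsageLog {
    private final ConcurrentHashMap<String, LongAdder> totalMillis = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, LongAdder> uses = new ConcurrentHashMap<>();

    // Record that a user spent `millis` milliseconds in `feature`.
    void record(String feature, long millis) {
        totalMillis.computeIfAbsent(feature, k -> new LongAdder()).add(millis);
        uses.computeIfAbsent(feature, k -> new LongAdder()).increment();
    }

    long totalMillis(String feature) {
        LongAdder a = totalMillis.get(feature);
        return a == null ? 0 : a.sum();
    }

    long uses(String feature) {
        LongAdder a = uses.get(feature);
        return a == null ? 0 : a.sum();
    }

    // Sorted snapshot of total time per feature, e.g. for a periodic log line.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        totalMillis.forEach((feature, adder) -> out.put(feature, adder.sum()));
        return out;
    }
}
```

    Logging `snapshot()` every few minutes gives a rough picture of where users actually spend their time, which is the information the milestones depend on.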

    I usually bill big when I give advice such as this and help set up a plan for when to do what. Your problem is less a technical one than a business one. Think like a businessman first and like a techie second.
  • by kasperd (592156) on Saturday April 13, 2013 @05:20PM (#43442871) Homepage Journal
    This sounds very much like premature optimization. You may end up designing a very scalable application and have the project fail due to too few users. If the actual number of users turns out to be an order of magnitude less than what you can handle on a single host, then all that scalability work was wasted. I think you have a better chance of success with a quick proof of concept, even if it isn't very scalable.

    It is ok to think about scalability before you have the users. But don't waste time implementing the scalable solution for a non-existing user-base.
  • by Anubis IV (1279820) on Saturday April 13, 2013 @05:32PM (#43442947)

    If you're aiming for as many users as you say, then it'll take a while to get there and you'll have plenty of time to hire folks along the way. At that point, you can worry about re-architecting everything. First things first, though, especially if you're by yourself: get it up and running with whatever technologies you know. Once it starts to take off, you can hire people to rewrite it and redesign it around best practices.

    It's not the simplest path, but without bringing in outside investors who'll have the capital to allow you to hire the team it sounds like you need, I don't see what choice you have.

  • by leonardop (532098) on Saturday April 13, 2013 @05:38PM (#43442997)

    I salute you for your ambition and determination. I hope you get to realize your vision.

    Now, as I read your question, I remembered an interview I saw a few days ago with Ben Kamens, one of the engineers working at Khan Academy, talking about scalability, how they manage their operation, and the growth spikes they have experienced in the past. It's a little light on technical details, but you may find it interesting: Root Access: How to Scale your Startup to Millions of Users.

    One thing I'd like to mention is that when you hear someone else talk about the things they've done and how they have done it, it's easy to see it as an advertisement for a particular technology platform (AppEngine and other Google machinery in the previous video, for example), but that's not the thing to focus on. Whatever choices other people have made, the good thing is that their advice can be useful no matter what choices you end up taking. I know this seems like such a trivial thing to say, but evidence suggests that a number of people miss this basic concept, and then discussions quickly degenerate into pointless noise about concrete technologies, instead of the ideas.

    I'd also recommend that you pay a visit to the Google Developers YouTube channel and type something like "scale" or "scalability" into the channel's search box. You might learn a few things from some really smart people who have confronted very real scalability problems.

    Best of luck to you, my friend.

  • by MightyYar (622222) on Saturday April 13, 2013 @05:46PM (#43443077)

    Yup, his best bet is to find a good dick-head business type to partner up with and split 50/50 (or less if necessary). Edison died famous and rich. Much smarter men have died penniless and frustrated. Find an Edison and be his Tesla, but be smart enough to stake your claim in black and white.

  • by TCM (130219) on Saturday April 13, 2013 @06:14PM (#43443239)

    You mixed up your wording.

    Do it right the first time, optimize for speed later. You don't want to find out you're unable to optimize because the design is flawed.

  • by Kjella (173770) on Saturday April 13, 2013 @06:17PM (#43443259) Homepage

    Not to mention that scalable is also relative: if you are a smash hit and need to upgrade fast, you can get a 10G link to the backbone with an 8-socket Xeon E7-8870 server, a ton of memory and a RAID array of SSDs as a pretty damn good stop-gap, which I assume you can't afford now since you can't afford to hire developers. There are probably a bunch of other optimizations you can do too, in order to offload parts to other machines when you get that far. This is like asking "Will the wind resistance of my afro keep me from breaking the world record in the 100 meter dash?" Start caring about that when you get below 10 seconds, not when you're considering a running career, and don't count getting a haircut as the first step of the way.

  • kiss (Score:4, Insightful)

    by crutchy (1949900) on Saturday April 13, 2013 @06:50PM (#43443437)

    keep it simple, stupid

    the more complex you make the app, the bigger the load on your infrastructure and bandwidth

    if you follow google's lead, they developed everything in house. same with pixar, which develops software to handle very high end graphics performance, and even linux started off by taking a problem and solving it with a home grown solution

    if you want a specialized application to handle that many users without running into software performance issues (nevermind server infrastructure and bandwidth, which can probably be gradually improved), you want to make it efficient... so you will probably need to develop it yourself

    if you use off the shelf packages like wordpress and the like, they are full of all sorts of features that you might not need but will still pay for performance-wise

    many people will try to tell you that there is no point reinventing the wheel and that existing wheels will always be better than anything you can come up with, but they are full of shit. if everyone stuck with that ideal we would all have wooden wheels on our cars. there is a lot of merit in reinventing wheels, not only to make better wheels, but in understanding wheels to learn how to better use them. be a little selective about where you want to start customizing from... i wouldn't recommend reinventing the operating system, although google did (based on the linux kernel) and they are reaping the rewards of a more efficient search platform than might otherwise have been possible.

    if you're handy with microcontroller programming you might be able to make a pretty efficient microcontroller-based server cluster, sort of similar to what HP is doing with their new SoC blade technology. microcontrollers and SoCs are the future, so if you want to get involved in future tech today, pay attention to what is going on with µCs... a simple example is the SheevaPlug and its derivatives. this is also where linux probably has a major leg up on windows, because microsoft has been so focused on the x86 platform that (even with the recent release of Windows RT) they are lagging a ways behind linux in multi-architecture support (have to wonder how much of the linux kernel has been plagiarized in WinRT).

    other things that affect scalability and performance include the efficiency of algorithms... if you haven't done a CS degree, go onto youtube and watch lectures on data structures and algorithm optimization. there are free CS lecture series from MIT and UNSW that I know of. Richard Buckland of UNSW also makes the lectures a little less boring with his antics.

    how you develop your app will also depend on your goal to get 100,000+ users on the site...

    security is probably the hardest and most significant hurdle you'll face... if you fuck security up (either the app isn't secure enough or it's a pain in the ass for users to authenticate) then your app will be a flop

    you also need to think like a user, not like a developer... this is probably where having a small team will help at some point (a few eyes with different perspectives)

    many developers fall into the trap of developing software that is easy for the programmer and thinking that the user will get used to it... which is fine if you have a monopoly. unfortunately by the time you have 10,000 users, your idea will be copied to create competition, and if they do a better job with the user experience you're dead in the water.

    make sure you are standards compliant. use the HTML 5 and CSS 3 validators, but i would recommend avoiding features that aren't also in HTML 4.01 and CSS 2.1 until HTML 5 and CSS 3 become fully implemented and debugged. the exception would be that if you want a feature that would otherwise require flash or java, use html5 instead of flash. if you want 100,000+ users, don't use flash or java!

    i would use a linux distro such as debian with all the fat trimmed. it should be obvious, but don't use a WISA stack.

    keep your service clear of advertising, 3rd party cookies and any 1x1 hidden iframes. don

  • by bored (40072) on Saturday April 13, 2013 @08:39PM (#43443875)

    "If you use a database you MUST understand and use the relational aspects of things. If you use the database as just a key:value store I will personally beat the ever living shit out of you."

    Like all simple, rigorous rules, this is sort of bad advice in a lot of circumstances. Sure, inventing your own hashing function and using the hashes as the keys in a relational DB is stupid. That said, focusing on the main relationships between your tables, and not trying to describe every single edge case, will massively simplify the schema. Plus, there are tons of little pieces of information that often need to be persisted that just don't have any obvious relationship to anything else in the schema. Being able to add key:value attributes on the fly in the code without screwing with the schema can be a huge bonus to initial productivity. Sure, if at some point you discover common, frequently used attributes, or you have a performance issue because you're reading some value out of a key:value store frequently, then by all means fix it.

    All that said, I'm not really a fan of trying to eke performance out of a database. Use the database for what it's good at: complex relationships, and easy storage/retrieval of information. But if your app is trying to do 500k updates per second to a single table, it's probably a better idea to seek alternatives than to throw a bunch of money at database hardware. I have my own mental rule: is this code path going to be a hot one? If yes, then no database queries. There are a ton of strategies for moving the queries/updates out of the paths that are performance sensitive.
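    One such strategy is write-behind batching, sketched here under my own assumptions. The hot path only bumps an in-memory counter, and a background task periodically flushes the accumulated deltas to the database in one batch. `WriteBehindCounter` and the `sink` callback are hypothetical names. The sink stands in for whatever batched UPDATE you would issue. The trade-off is that any deltas not yet flushed are lost if the process dies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical write-behind counter: hot paths touch memory only;
// the database sees one batch per flush.
class WriteBehindCounter {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    // Hot path: an atomic in-memory add, no database round trip.
    void increment(String key) {
        pending.merge(key, 1L, Long::sum);
    }

    // Called periodically. Each key is removed atomically, so an increment that
    // races with the flush lands in a fresh entry and is picked up by the next
    // flush instead of being lost.
    void flush(Consumer<Map<String, Long>> sink) {
        Map<String, Long> batch = new HashMap<>();
        for (String key : pending.keySet()) {
            Long delta = pending.remove(key);
            if (delta != null) {
                batch.put(key, delta);
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // e.g. one batched "UPDATE ... SET n = n + ?" per key
        }
    }
}
```

    Wiring it up could be as simple as `scheduler.scheduleAtFixedRate(() -> counter.flush(db::applyDeltas), 1, 1, TimeUnit.SECONDS)` on a `ScheduledExecutorService`, where `db::applyDeltas` is a hypothetical method doing the batched update.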

  • by durdur (252098) on Saturday April 13, 2013 @09:43PM (#43444109)

    True enough, but you do not want to have the issue where the first sign of your success is your website failing. Early users get turned off if the service is flaky. So you can't just throw up a free website and wait to see when and where it crashes. A little planning is always good and so is a good reasonable starting architecture. That would include for example designing from the start for running with multiple backend servers behind a load balancer.
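    In production the balancing itself is normally done by a dedicated load balancer (HAProxy or nginx, for example). The sketch below therefore only illustrates the idea: requests rotate round-robin across interchangeable backends, skipping any that are marked unhealthy. `RoundRobinBalancer` and the backend names are hypothetical. The design point is that this only works if any backend can serve any request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector over interchangeable backend servers.
class RoundRobinBalancer {
    private final List<String> backends;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.backends = new ArrayList<>(backends);
    }

    // A health checker would call these when a backend stops or starts answering.
    void markDown(String backend) { down.add(backend); }
    void markUp(String backend) { down.remove(backend); }

    // Next healthy backend: start at the rotating position, then scan forward once.
    // floorMod over longs keeps the index valid even after the int counter overflows.
    String next() {
        int n = backends.size();
        int start = counter.getAndIncrement();
        for (int offset = 0; offset < n; offset++) {
            String candidate = backends.get((int) Math.floorMod((long) start + offset, (long) n));
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy backends");
    }
}
```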

  • Re:API first (Score:4, Insightful)

    by c0lo (1497653) on Saturday April 13, 2013 @09:56PM (#43444149)

    "Write your public and private APIs first. Then implement them quick and dirty...."

    API first.

    So true, it can't be stressed enough. Supplementary:

    1. When considering APIs, consider them in terms of service interfaces: even better if these services are stateless.

    2. Implement the services as different processes, exchanging data in whatever serialization format you fancy (Java serialization, JSON, Google's protocol buffers). Use quick-and-dirty implementations for the first cycles: as long as you keep the interfaces unchanged, someone can later come along and re-implement them better.

    3. Pay attention to what needs to be shared across the whole system and what can be divided/partitioned across different hosts.
    E.g., it is highly probable that "subscription info/user identity/login services" may need to be backed by a single "database", but once the user finishes logging in, she gets her data from storage hosted elsewhere, backed by whatever later development cycles find appropriate. (Of course, at later stages, one will need to implement a "registry" mapping a user identity to where the data is stored. But the first implementation can use a single database for the data of all users, as long as you do not tie the login service to other services.)
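    Here is a minimal Java sketch of points 1 and 3. All names are hypothetical (`UserDataRegistry`, `SingleStoreRegistry`, `LookupRegistry`, `ProfileService`, and the store names). Callers depend only on a small interface. That way the first quick-and-dirty implementation (one database for every user) can later be swapped for a real registry without touching them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service interface: which data store holds this user's data?
interface UserDataRegistry {
    String storeFor(String userId);
}

// First, quick-and-dirty implementation: every user's data lives in one database.
class SingleStoreRegistry implements UserDataRegistry {
    private final String store;

    SingleStoreRegistry(String store) { this.store = store; }

    @Override
    public String storeFor(String userId) { return store; }
}

// A later implementation: explicit per-user assignments, so users can be placed on
// (or moved between) stores; unassigned users fall back to a default store.
class LookupRegistry implements UserDataRegistry {
    private final Map<String, String> assignments = new ConcurrentHashMap<>();
    private final String defaultStore;

    LookupRegistry(String defaultStore) { this.defaultStore = defaultStore; }

    void assign(String userId, String store) { assignments.put(userId, store); }

    @Override
    public String storeFor(String userId) {
        return assignments.getOrDefault(userId, defaultStore);
    }
}

// A caller that depends only on the interface, so swapping registries does not touch it.
class ProfileService {
    private final UserDataRegistry registry;

    ProfileService(UserDataRegistry registry) { this.registry = registry; }

    // Stand-in for "load this user's profile": reports where it would be read from.
    String locate(String userId) { return userId + "@" + registry.storeFor(userId); }
}
```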

  • by MarkRose (820682) on Saturday April 13, 2013 @10:13PM (#43444199) Homepage

    As someone who has written an application that scales to over 1 billion requests per day, let me offer my thoughts.

    Scaling your application should be as trivial as launching more application server nodes. If you can't add/remove application nodes painlessly, you've probably done something wrong like keep state on them (this includes sessions).
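    Here is a minimal sketch of keeping sessions off the application nodes. It assumes a shared store behind a small interface. In production that store would be an external service (a database or a cache cluster); the in-memory class is only a stand-in so the example runs. All names are hypothetical. Two `AppNode` instances share one store, so consecutive requests can land on either node.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared session store; in production this lives outside the app nodes.
interface SessionStore {
    void put(String sessionId, String userId);
    Optional<String> get(String sessionId);
}

// In-memory stand-in used only to make the sketch runnable (no expiry, no persistence).
class InMemorySessionStore implements SessionStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override
    public void put(String sessionId, String userId) { data.put(sessionId, userId); }

    @Override
    public Optional<String> get(String sessionId) { return Optional.ofNullable(data.get(sessionId)); }
}

// An application node with no per-user fields: only a reference to the shared store,
// so nodes can be added or removed without losing anyone's session.
class AppNode {
    private final SessionStore sessions;

    AppNode(SessionStore sessions) { this.sessions = sessions; }

    // Creates a session and returns its id (e.g. to be sent back as a cookie).
    String login(String userId) {
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, userId);
        return sessionId;
    }

    // Any node can serve the request because the session is looked up, not remembered.
    String whoAmI(String sessionId) {
        return sessions.get(sessionId).orElse("anonymous");
    }
}
```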

    Don't worry about scaling your application layer at all (within reason). You can always throw more machines at the application side in a pinch, and for a long while it will be cheaper to add servers than to hire someone. When your application servers are costing you more than a salary, hire someone to find the hotspots in the code and make them faster. Until then it's a waste of your time.

    Scaling state, aka your datastores, is where the challenge lies. You need to spend a large amount of time sitting down and analysing every operation you plan to do with your data. SQL is great for a lot of things, but you will eventually run into a point where heavy updates make SQL difficult to scale. Mind you, decent hardware (lots of cores, RAM, and SSD) running MySQL should scale to several thousand active users if your queries are not expensive. The Galera patches to MySQL (incorporated into Percona XtraDB Cluster and MariaDB) can give you true high-availability, but you will still have write-throughput limitations.

    I would also highly recommend you look into Cassandra (especially 1.2+, with CQL 3), which was built from the ground up to scale across thousands of low-end machines that often fail (if you can't tolerate hardware failure, you messed up). Cassandra is more limited in the kinds of queries you can execute, more relaxed about data consistency, and requires more thought ahead of time. On the other hand, it can also be used for global replication, which is something you are interested in. At the very least, a good understanding of its data and query model will open your mind to the kinds of tradeoffs that must be made to enable scaling.
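    Cassandra spreads data by hashing each row's partition key onto a token ring. The sketch below is a simplified consistent-hash ring in that spirit, not Cassandra's implementation; its default partitioner as of 1.2 is Murmur3-based, and it supports virtual nodes. MD5 is used here only because every Java platform ships it, and `HashRing` and the node names are hypothetical. The property worth noticing is that adding a node only moves keys onto the new node. Nothing is reshuffled among the existing ones.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Simplified consistent-hash ring (illustrative only; not Cassandra's actual code).
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerNode;

    HashRing(int pointsPerNode) { this.pointsPerNode = pointsPerNode; }

    // Each node owns several "virtual" points so load spreads more evenly.
    void addNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < pointsPerNode; i++) {
            ring.remove(hash(node + "#" + i), node); // only remove points this node owns
        }
    }

    // Owner of a key = first point at or after the key's hash, wrapping to the start of the ring.
    String nodeFor(String partitionKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(hash(partitionKey));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // First 8 bytes of MD5 as a signed long; every Java platform must provide MD5.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```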

    Contrary to what others are saying, you are right to think about scaling now, before you even start! A rewrite is expensive in both money and time. Why set yourself up for that? Before you start is the best time to plan for scale! If you start with a scalable datastore like Cassandra, and structure all your queries to work within its model, it is no more work than doing things in SQL, and you're way ahead of the game!

    The most important part is spending time modeling how you will access your data. Think about how you'll avoid hot spots (which make scaling writes difficult), and think about how to make reads fast by reading as little as possible. Think about caching, and how you'll invalidate the cache of a piece of your data without having to invalidate caches for things that didn't change. (Think about updating on data ingestion instead of running statistics later.) If you can't avoid hot spots, make only small reads, and cache independently, you are not done.
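    One common way to avoid a write hot spot is to shard a hot counter, sketched here under stated assumptions. Writes for one logical key are spread over N physical rows, and reads sum those N rows. It also illustrates updating at ingestion time: the running total is maintained as events arrive rather than computed later. The `ConcurrentHashMap` stands in for a datastore where each key is a row. `ShardedCounter` and the key format are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sharded counter; the map stands in for a datastore where each key is a row.
class ShardedCounter {
    private final Map<String, Long> store = new ConcurrentHashMap<>();
    private final int shards;

    ShardedCounter(int shards) { this.shards = shards; }

    // Spread writes for one logical key across `shards` physical rows,
    // so no single row absorbs every update.
    void increment(String key) {
        int shard = ThreadLocalRandom.current().nextInt(shards);
        store.merge(key + ":shard-" + shard, 1L, Long::sum);
    }

    // Reads pay a small, fixed fan-out: sum every shard of the logical key.
    long get(String key) {
        long total = 0;
        for (int i = 0; i < shards; i++) {
            total += store.getOrDefault(key + ":shard-" + i, 0L);
        }
        return total;
    }
}
```

    The shard count is the knob: more shards spread writes further but make every read touch more rows.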

    Good luck!

  • Re:Heroku (Score:1, Insightful)

    by Jane Q. Public (1010737) on Saturday April 13, 2013 @10:18PM (#43444211)

    "OpenStack. You can start with a hosting provider like Rackspace that has as a faithful implementation of it."

    Ahem. Just two days ago, an article discussed here on Slashdot pointed out that Rackspace is not compliant with OpenStack standards.

  • by Anonymous Coward on Sunday April 14, 2013 @01:10AM (#43444657)

    "one of those people worried about where to hide all the gold they're going to have someday? "My app will be sooo successful! I need a team of people to make it for me because I don't know how!" how about you make your super successful app, and if anyone ever bothers to use it *then* worry about scaling it up, mmmmk?"

    that's a terrible idea - the last thing you want is to have a terrible user experience (requests timing out etc) and deal with complete rewrite (which needs to be completed by yesterday) the moment your site becomes vaguely successful.

    This can easily kill your project at a critical moment - users don't like to hear "sorry, but we couldn't anticipate that (some blog/the local newspaper/...) would mention our site and people would actually try to use it; please come back in 3-6 months when we have rewritten it for scalability".

    Ignoring scalability "until I have users" is a great way to keep costs down while making sure that you can never be successful. If you think that is the financially rational thing to do (because 99.9% of such projects don't succeed anyway), then you shouldn't sink money into a website that is (in your opinion) bound to fail at all. But if you are going to invest, then you have to invest enough that you actually stand a chance at success (however slim that may be).
