Java Programming Software

Ask Slashdot: Building a Web App Scalable To Hundreds of Thousands of Users?

Posted by Soulskill
from the get-up-in-that-there-cloud dept.
AleX122 writes "I have an idea for a web app. Things I know: I am not the first person with a brilliant idea. Many other 'inventors' have failed and it may happen to me, but without trying the outcome will always be failure. That said, the project will be huge if successful. However, I currently do not have the money needed to hire developers. I have pretty solid experience in Java, GWT, HTML, Hibernate/Eclipselink, SQL/PLSQL/Oracle. The downside is the nature of the project. All applications I've developed to date were hosted on a single server or in a small cluster (2 tomcats with fail-over). The application, if I succeed, will have to serve thousands of users simultaneously. The userbase will come from all over the world. (Consider infrastructure requirements similar to a social network.) My questions: What technologies should I use now to ensure easy scaling for a future traffic increase? I need distributed processing and data storage. I would like to stick to open standards, so Google App Engine or a similar proprietary cloud solution isn't acceptable. Since I do not have the resources to hire a team of developers and I will be the first coder, it would be nice if the technology used is Java-related. However, when you have a hammer, everything looks like a nail, so I am open to technologies unrelated to Java."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    http://www.codinghorror.com/blog/2010/01/cultivate-teams-not-ideas.html

    • ...unless you actually want to make any qualitative breakthrough. That would depend on ideas and individuals. But, yes, this is not likely to be a case for such an approach.
      • by crutchy (1949900) on Saturday April 13, 2013 @06:32PM (#43442955)

        teams are much better at solving problems than individuals

        even this slashdot forum could be thought of as a sort of team, in that many people are coming together to address a problem

        ok there is no leadership and it's full of trolls, shills and idiots... maybe it's not really a team... more like a committee... ok so you're probably doomed

        • by phantomfive (622387) on Saturday April 13, 2013 @06:56PM (#43443145) Journal

          even this slashdot forum could be thought of as a sort of team, in that many people are coming together to address a problem

          Good point. That's the best argument against teams I've ever seen!

        • by Taco Cowboy (5327)

          teams are much better at solving problems than individuals

          Please correct me if I'm wrong ...

          Based on my experience of the past few decades (from the 1970s) in the tech field, the conclusion that I get is the reverse

          Teams are much better at IDENTIFYING problems

          On the other hand, people are much better at solving problems when they are in the "individual mode" than when they are part of a "committee", aka "teams"

          As I said, I may be wrong, and if I am, please correct me

          Thank you !

    • by MightyYar (622222) on Saturday April 13, 2013 @06:46PM (#43443077)

      Yup, his best bet is to find a good dick-head business type to partner up with and split 50/50 (or less if necessary). Edison died famous and rich. Much smarter men have died penniless and frustrated. Find an Edison and be his Tesla - but be smart enough to stake your claim in black and white.

    • by Anonymous Coward

      One company I worked for (until I found a sweeter place elsewhere) got rid of their entire dev staff except for the top level designers. An offshore dev team gives guaranteed results, low bugs per line count, and actual contracts to say that. As an added bonus, the parking garage doesn't smell like BC bud anymore.

      You might give them, or another offshore place, a call. They may be able to get what needs done, with little QA, for pennies on the dollar compared with what it costs to hire people locally.

  • http://evergreen-ils.org/opensrf.php [evergreen-ils.org] I do not know much about it. Here is the first paragraph of Dan Scott's 'Easing Gently into OpenSRF' article: OpenSRF is a message routing network that offers scalability and failover support for individual services and entire servers with minimal development and deployment overhead. You can use OpenSRF to build loosely-coupled applications that can be deployed on a single server or on clusters of geographically distributed servers using the same code and minimal co
    • http://evergreen-ils.org/opensrf.php [evergreen-ils.org] I do not know much about it.

      I was reading that as OpenSerf.php. What a great idea! All the help you need for a pittance, if anything at all. I believe serfdom is highly underrated and I'm glad to see people bringing it back!

      Seems like the submitter wouldn't take kindly to an open environment lest he lose all his yet to be found new found riches so let's get the BSD types on board with UnlimitedHordes! Just be sure no one knows what anyone else is

  • Heroku (Score:4, Insightful)

    by Anonymous Coward on Saturday April 13, 2013 @06:06PM (#43442791)

    Just use Heroku. Honestly you DO NOT need to worry about this problem. If you don't make enough money by the time you get 10,000 users to hire someone to solve this problem for you then your idea is not as great as you think it is.

    • Re:Heroku (Score:5, Informative)

      by Baby Duck (176251) on Saturday April 13, 2013 @06:26PM (#43442909) Homepage

      OpenStack. You can start with a hosting provider like Rackspace that has a faithful implementation of it. I know they were recently pinged for some incompatibility, but they have vowed to fix that. If you still can't stomach it, choose a different OpenStack provider. OpenStack is the key.

      When you get really big, then you can work on running your own datacenter or paying someone to host the hardware for you (again, Rackspace, DreamHost, etc.). Then you can put your own implementation of OpenStack on the hardware with all the customization specific to your needs. This will naturally build on top of your years of investment with the vanilla OpenStack when you were smaller. The progression path is laid out for you.

      I'm replying to this parent because Heroku is also an excellent choice for scaling where you pay as you grow. I'm just not sure if you can later fork Heroku to suit your needs with the datacenter supplier of your choice.

      • I was gonna mention that worrying about scaling before the app is built is a waste of time. Build the app. Get to your capacity. Modify app. Grow some more. Iterate that several hundred times and you build as you grow, you bill as you grow, and you scale as you grow. Don't fret worrying about how to serve to millions before you've served to thousands.
        • by rwa2 (4391) *

          Maybe his application is performance critical, and that's why the others in the domain have failed. I think performance / latency was always a priority with Google when they entered an already-crowded search engine space, and that was one of the main things that drew people to their service from the established competition.

          Also sounds like he already has his prototype app working on 2 boxes, and doesn't want to pull an EA by launching with that. It's non-trivial to just scale out N instances if they all h

  • Show me the users! (Score:5, Insightful)

    by Anonymous Coward on Saturday April 13, 2013 @06:08PM (#43442793)

    Before going all-out to reinvent the wheel on yet-another-next-big-thing web app, why not roll out a proof-of-principle version, letting someone else competent do the "heavy lifting" back-end work? Use an existing cloud/hosting service like Amazon EC2 (they'll do a lot better on the basic back-end stuff than your "I'm incompetent but building a cloud app anyway" approach). After you get your first hundred thousand users, and have investment rolling in by the gazillions, then you hire your own crack team of cloud experts to design your own custom back-end solution (or just sell out for a couple hundred million to whatever group of suckers thinks your zero-dollar-per-user profit model will start paying off once they hit the million-user mark).

    • Sounds right to me. Prototype quickly. Don't worry about scaling. Get it out there and see if it works. If it does, then you worry about scalability.

      • by ATMAvatar (648864) on Saturday April 13, 2013 @07:49PM (#43443429) Journal

        This. The submitter has made an assumption that there will be hundreds of thousands of users. There might not. The only sure thing is that if he spends all his time trying to build a platform capable of serving hundreds of thousands of users right out of the gate, the project will probably fail before a single user sees it.

        Remember: not even Facebook, Twitter, or eBay started off with platforms capable of handling their current load. They all started with something quick and built things out as their respective user bases grew.

        • Re: (Score:2, Insightful)

          by durdur (252098)

          True enough, but you do not want to have the issue where the first sign of your success is your website failing. Early users get turned off if the service is flaky. So you can't just throw up a free website and wait to see when and where it crashes. A little planning is always good and so is a good reasonable starting architecture. That would include for example designing from the start for running with multiple backend servers behind a load balancer.

      • by dbIII (701233)
        You can see from the post above why professional engineers laugh at the self proclaimed "software engineers". Basket weaving on the fly is different to design.
        • by Nerdfest (867930)

          PEngs will soon be able to do the same thing they laugh at software people for when 3D printing is more mainstream. And they will use it for exactly that.

    • by mooingyak (720677) on Saturday April 13, 2013 @09:19PM (#43443797)

      Pretty much the same thought I had.

      Step 1 is to get a version that works for one user.
      Step 2 is to get more than one user.

      You're jumping a few steps ahead of the game.

  • by t00le (136364) on Saturday April 13, 2013 @06:10PM (#43442803)

    I would likely build a front-end using a couple of HAProxy load balancers hitting an Apache cluster running opencluster. Use red-black trees with MySQL and cluster a few databases across multiple locations. I would build the front-end with Python and HTML5, as well as using IPython for cluster controls and other fun stuff.

    In my case I have a rack of HP p-class blade servers that use an Amazon EC2 Centos box to route inside/outside of EC2. When we test something out we use my cluster at home, then when we roll an app or website out we keep it at my house. If the load gets high, then we simply modify the cluster to bring up slave web servers, cache servers, etc. In our case we build the backend first and can roll out an app or web service for very little money or resources, but if we have success with something we just leave it on EC2 since it can likely pay its own bills.

  • Silly priorities (Score:5, Interesting)

    by Anonymous Coward on Saturday April 13, 2013 @06:10PM (#43442809)

    Youtube was a lame app with basic mysql setup. Same with Facebook. When it took off, they hired gold people and fixed the scalability issues. Twitter didn't exactly put scalability first either.

    So get real. Don't worry about "hundreds of thousands of users", but about getting something decent out there for users to try. If users come, you'll get scalability sorted out.

    • Twitter didn't exactly put scalability first either.

      Which is strange to me, because Twitter seems like such a simple website, and yet they had massive scalability problems. Even today, sometimes. I've wondered what is so difficult about their website that has caused them such problems?

      • Re:Silly priorities (Score:4, Informative)

        by the eric conspiracy (20178) on Saturday April 13, 2013 @07:10PM (#43443227)

        It wasn't the website, it was the backend that had problems.

        I remember they had started with Ruby on Rails which is notorious for being able to get you up fast and then failing to scale.

        They then offloaded parts of the infrastructure to Scala of all things.

        http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html [redfin.com]

        Scala is interesting and has some good paradigms built in to the language for the things Twitter needs to do. Not sure if it is really fundamentally better than Java though - after all it runs on the same JVM.

        Anyway if I was starting something like this out and I already knew Java I would go with Java. There are enough large sites running it, and there are a lot of people out there who know it so I would feel some confidence that I could do what I needed to do.

        Plus I like static typing.

        • Re:Silly priorities (Score:5, Informative)

          by Anonymous Coward on Saturday April 13, 2013 @10:53PM (#43444137)

          "They then offloaded parts of the infrastructure to Scala of all things.

          http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html [redfin.com]

          Scala is interesting and has some good paradigms built in to the language for the things Twitter needs to do. Not sure if it is really fundamentally better than Java though - after all it runs on the same JVM."

          Disclaimer: I was a developer at Twitter until last year.

          From the point of view of scalability, Scala is so much more advanced than Java it's not even funny. Ultimately, this boils down to the adoption of immutability as a core concept of the language. In particular, Scala's approach to concurrency is a decade or more ahead of what's in use in Java. Finagle, Twitter's async RPC system, simply wouldn't have been deliverable in a language that makes the use of Futures as difficult as Java does.

          "Plus I like static typing."

          Scala is statically typed.

          • Re:Silly priorities (Score:4, Informative)

            by rekoil (168689) on Sunday April 14, 2013 @02:59AM (#43444771)

            Disclaimer: Another Twitter engineer here. What my apparently former colleague said, plus X.

            Also: Don't be afraid to add caching layers when you see your web server or DBs start to run hot. Putting a memcached instance in place in "front of" your database layer is much easier than sharding the database layers to relieve load - eventually you'll have to do both, but you'll definitely want the memcache layer first. Same with web caches/proxies - putting varnish or squid in front will take some pressure off before you need to implement load balancers.
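
            The "memcached in front of the database" advice above is usually implemented as a cache-aside read path. A minimal sketch, assuming a plain in-process dict stands in for memcached and a hypothetical `slow_db_lookup` function stands in for the real query (neither name comes from the thread):

```python
import time


class CacheAside:
    """Cache-aside read path in front of a slower backing store.

    The dict stands in for memcached and `loader` stands in for a database
    query. Both are assumptions made for illustration only.
    """

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: answer without touching the database.
            self.hits += 1
            return entry[1]
        # Missing or expired: load from the database and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this after a write so readers do not see the stale value.
        self._entries.pop(key, None)


def slow_db_lookup(user_id):
    # Hypothetical stand-in for a real SQL query.
    return {"id": user_id, "name": f"user-{user_id}"}


profiles = CacheAside(slow_db_lookup, ttl_seconds=30.0)
profiles.get(7)  # first read misses and goes to the "database"
profiles.get(7)  # second read is served from the cache
```

            The hit and miss counters are there because the ratio between them is the first thing worth watching once the cache is in place.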

  • Start smaller (Score:5, Insightful)

    by bfandreas (603438) on Saturday April 13, 2013 @06:12PM (#43442815)
    Do not plan for hundreds of millions of concurrent users at once right off the bat. That's the very common error a lot of startups make. You do not have such a large userbase. It will take some time until you have.
    Think smaller and scale up when your idea takes off. Set yourself concurrent user milestones at which you rethink your architecture. You will also have to rethink the iron your stuff runs on, and that may dictate what kind of technology you use once you have reached your hundreds of millions goal.

    Technology is interchangeable. It's a tool and you choose the best tool for the job and at the moment you have no users and might as well start off with the usual suspects. JSP/Struts, JSF, whatever you are most comfortable with. If in the long run you do find that this is not sustainable and you need to shift to another technology then you can hopefully afford to hire people who know it.

    You really, really should set yourself userbase milestones, plan ahead for reaching them and be prepared when you reach them. For that you need a lot of information. Log how much time users spend on what functionality you offer because this also has an impact on your UI design when you go big. It also has impact on what technology(-ies) you use.


    I usually bill big when I give advice such as this and help setting up a plan when to do what. Your problem is less one of technology but a business one. Think like a businessman first and like a techie second.
  • by jwkane (180726) on Saturday April 13, 2013 @06:14PM (#43442829) Homepage

    Java... ok, why not. I would take a look at Cassandra and Zookeeper to get the ball rolling. You'll need a good load balancer; nginx or haproxy since I don't know of a good one in Java. I assume a bunch of tomcat servers for the actual app. I suppose jboss messaging to keep with the java theme.

    You can get all that on one machine for development, then for deployment you can flexibly adjust the number of db servers, queue servers, load balancers and app servers based on anticipated load. If you're extra-cool you can deploy to a cloud and dynamically allocate servers as-needed.

    Been there, done that. Got the t-shirt. It's fun. Enjoy it.

    Spend an extra day or two thinking about exactly how you're going to handle logging. It will be worth it.

  • OTOH, it could mean success in another idea. The problem is you will never know unless you try everything, and resources usually limit that. You have to decide.

    You may be able to do proof of concept with a couple cheap servers. But if it succeeds, time will be extremely short to go to full scale. So you need to think scale up front, but in a way that works downscaled as well. Do everything agile so every component can work all on one server, or separated on many. Use a distinct hostname for everything

  • by Kagato (116051) on Saturday April 13, 2013 @06:16PM (#43442847)

    Can't do it yourself, then get partners. Set up an equity agreement.

    As far as tech, this is no longer new territory. Create server images for a cloud host such as AWS or Rackspace. Bring them up or down with Chef. Concerned about Database? Figure out if you really need a relational database. If not, look at a high performance NoSQL DB or something that is more or less always in Memory (such as Mongo).

    • by Ash-Fox (726320)

      Following your example of using Mongo, I am interested in hearing how you counter the arguments in this article [hackingdistributed.com] and why Mongo is a better fit for any task, please.

  • Sounds like you may want to check out hosting your stuff over a VPS, maybe with Hawkhost (http://www.hawkhost.com/vps-hosting) or some similar provider?

    I guess the general idea is that you'd want to install / set up your own OpenStack (cloud) solution, and then scale VPS coverage if you need it, without having to install / clone over multiple machines. Check out OpenStack and Java integration. As far as I know there's an SDK available: https://github.com/woorea/openstack-java-sdk [github.com], but I'm not sure how compl

  • by kasperd (592156) on Saturday April 13, 2013 @06:20PM (#43442871) Homepage Journal
    This sounds very much like premature optimization. You may end up designing a very scalable application and have the project fail due to too few users. If the actual number of users turns out to be an order of magnitude less than what you can handle on a single host, then all that scalability work was wasted. I think you have a better chance of success with a quick proof of concept, which isn't very scalable.

    It is ok to think about scalability before you have the users. But don't waste time implementing the scalable solution for a non-existing user-base.
    • by Kjella (173770) on Saturday April 13, 2013 @07:17PM (#43443259) Homepage

      Not to mention that "scalable" is also relative. If you are a smash hit and need to upgrade fast, you can get a 10G link to the backbone with an 8-socket Xeon E7-8870 server, a ton of memory and a RAID array of SSDs as a pretty damn good stop-gap, which I assume you can't afford now since you can't afford to hire developers. There are probably a bunch of other optimizations you can do too in order to offload parts to other machines when you get that far. This is like asking "Will the wind resistance of my afro keep me from breaking the world record in the 100 meter dash?" Start caring about that when you get below 10 seconds, not when you're considering a running career, and don't count getting a haircut as the first step of the way.

    • by UnknownSoldier (67820) on Saturday April 13, 2013 @07:30PM (#43443339)

      Agreed. This guy doesn't really understand scalability.

      The OP needs to read how Plenty of Fish started off:
      http://highscalability.com/plentyoffish-architecture [highscalability.com]

      * PlentyOfFish (POF) gets 1.2 billion page views/month, and 500,000 average unique logins per day. The peak season is January, when it will grow 30 percent.
      * POF has one single employee: the founder and CEO Markus Frind.
      * 30+ Million Hits a Day (500 - 600 pages per second).
      * 1.1 billion page views and 45 million visitors a month.
      * Has 5-10 times the click through rate of Facebook.
      * 2 load balanced web servers with 2 Quad Core Intel Xeon X5355 (@ 2.66GHz), 8 Gigs of RAM (using about 800 MBs), 2 hard drives, runs Windows x64 Server 2003.

      And also about NginX:
      http://www.aosabook.org/en/nginx.html [aosabook.org]

      If you "need" multiple servers when you are first _starting_ out you're probably focusing on solving the wrong problems.
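
      To put the quoted PlentyOfFish figures into per-second terms, here is a rough back-of-the-envelope calculation. It assumes a 30-day month and perfectly even traffic, neither of which is stated in the article, so these are averages rather than peaks:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(events, days):
    """Average event rate, assuming traffic is spread evenly over the period."""
    return events / (days * SECONDS_PER_DAY)


# Figures quoted in the list above; the 30-day month is an assumption.
hits_per_second = average_per_second(30_000_000, days=1)
views_per_second = average_per_second(1_200_000_000, days=30)

print(f"30M hits/day      ~ {hits_per_second:.0f} per second on average")
print(f"1.2B views/month  ~ {views_per_second:.0f} per second on average")
```

      Both averages come out at a few hundred requests per second, which fits the quoted 500 - 600 pages per second at busier times and is why two web servers were enough.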

      • by Nimey (114278)

        That was in 2009. I certainly hope he's not still running Server '03, for starters.

    • It may be worth it to spend a little time thinking about peculiarities of your data that may greatly reduce scalability problems. For instance:

      1.- If your data or user base can be easily partitioned
      2.- If you can get away with low consistency semantics

      If you can find a nice architectural design that has any of these characteristics, many bottlenecks can be removed and scaling up in the future may prove easy. In those cases, there are abundant technology solutions that you could pick up in the future.

      So do

      • by jgrahn (181062)

        It may be worth it to spend a little time thinking about peculiarities of your data that may greatly reduce scalability problems. For instance:

        1.- If your data or user base can be easily partitioned
        2.- If you can get away with low consistency semantics

        If you can find a nice architectural design that has any of these characteristics, many bottlenecks can be removed and scaling up in the future may prove easy. In those cases, there are abundant technology solutions that you could pick up in the future.

        Or perhaps this idea of his doesn't have to be centralized at all. It seems to be a knee-jerk reaction today to "N users will want my FooBar idea, therefore I need one big www.foobar.com web application which handles all of them". Might be true given the FooBar idea -- or might not.

        Usenet, Git and BitTorrent are some counter-examples.

      • by kasperd (592156)
        1. If your data or user base can be easily partitioned
        2. If you can get away with low consistency semantics

        I agree, those properties make scalability much easier. There is another possibility, which is if your data is mostly static. If you can simply copy your data to a bunch of servers and be done with it, then scalability is easy.
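
        As an illustration of the first property (data that partitions easily by user), a minimal sketch of hash-based sharding follows. The `ShardedUserStore` class and the use of in-memory dicts as "shards" are assumptions made for the example; in practice each shard would be a separate database server:

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index with a hash that is stable across runs.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used here purely for its stability.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedUserStore:
    """Hypothetical store where each dict stands in for one database server."""

    def __init__(self, shard_count: int):
        self._shards = [{} for _ in range(shard_count)]

    def put(self, user_id: str, record: dict) -> None:
        self._shards[shard_for(user_id, len(self._shards))][user_id] = record

    def get(self, user_id: str):
        return self._shards[shard_for(user_id, len(self._shards))].get(user_id)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedUserStore(shard_count=4)
for n in range(1000):
    store.put(f"user-{n}", {"n": n})
```

        Every request for one user touches exactly one shard, which is what makes this easy to scale. The catch is that changing `shard_count` remaps most keys with plain modulo hashing, so consistent hashing is a common refinement.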

        The real killer is if you have strong consistency requirements, and you have users worldwide, and data cannot be partitioned since users around the world need to read and modify t

  • by joshv (13017) on Saturday April 13, 2013 @06:20PM (#43442873)

    Probably the worst thing you can do is start with some complex clustered architectural design.

    Just start on a single server with technologies that are scalable, and design with future scalability in mind. Also design in the ability to capture detailed performance metrics of every tier. When, and if your application usage grows, scale the parts of it that need scaling.

    The biggest issue with scaling is usually the database, and for applications where you are just using the database as a simple persistence store for user settings and simple small data sets, you are probably best to go with one of the many scalable "NoSQL" type solutions such as MongoDB, as they've got scalability baked in for free. If you're trying to run heavy duty analytics that join and aggregate massive datasets, there are single DB clustering solutions, but they aren't cheap. You can always scale out SQL databases horizontally, but then you've got issues cloning and replicating, though there are a lot of products in that space, both free and commercial. A cheap place to start would be with PostgreSQL, which appears to have multiple open source replication products.

    I don't think there is anything inherently limiting to sticking with Java. It's what you know, and the toolsets are deep and rich. No, it's not the hot new thing, but sometimes that can be a good thing.
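
    The advice above about capturing performance metrics for every tier can start out very small. A minimal sketch, assuming a hypothetical `TierMetrics` helper and made-up tier names ("db", "render"); a real deployment would ship these numbers to a metrics system rather than keep them in memory:

```python
import time
from collections import defaultdict
from functools import wraps


class TierMetrics:
    """Collects wall-clock timings per named tier (hypothetical helper)."""

    def __init__(self):
        self._samples = defaultdict(list)  # tier name -> list of durations

    def timed(self, tier):
        """Decorator that records how long each call to the function takes."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                finally:
                    # Recorded even when the call raises, so failures count too.
                    self._samples[tier].append(time.perf_counter() - start)
            return wrapper
        return decorator

    def summary(self):
        return {
            tier: {
                "count": len(samples),
                "total_seconds": sum(samples),
                "max_seconds": max(samples),
            }
            for tier, samples in self._samples.items()
        }


metrics = TierMetrics()


@metrics.timed("db")
def load_user(user_id):
    # Stand-in for a database query.
    return {"id": user_id}


@metrics.timed("render")
def render_profile(user):
    # Stand-in for template rendering.
    return f"<h1>User {user['id']}</h1>"


render_profile(load_user(1))
```

    With numbers like these per tier, "scale the parts of it that need scaling" becomes a measurement question instead of a guess.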

    • by Ash-Fox (726320)

      The biggest issue with scaling is usually the database, and for applications where you are just using the database as a simple persistence store for user settings and simple small data sets, you are probably best to go with one of the many scalable "NoSQL" type solutions such as MongoDB, as they've got scalability baked in for free.

      Are you a troll, malicious or just plain not knowledgeable?

  • by phantomfive (622387) on Saturday April 13, 2013 @06:22PM (#43442879) Journal
    Facebook did it on PHP. I sure wouldn't have used that, but it shows you can do more with basic technologies than you would expect.

    The Java environment was built for that kind of thing, Spring, Hibernate, etc, so if you build in that, you can be reasonably sure your system will be scalable.

    Keeping session state in RAM will make your life harder.

    Even with a 'slow' technology, you can always add more servers. The difficult bottleneck is the database, and that can be an intractable problem depending what your goal is.
    • by dbIII (701233)
      I've had to upgrade a pile of desktop computers that were handling work related tasks with no problems but were brought to their knees once a web browser was pointed at that piece of shit Facebook was initially. They got a lot of things wrong to start with, maybe some from crap PHP, others from spitting in the face of web standards to shove more ads down people's throats. Facebook's success is due to selling the concept to advertisers and not due to their crappy initial implementation or whatever it is no
      • I'm sure those computers wouldn't handle facebook any better now. There's a lot on those pages.
        • by dbIII (701233)
          The thing that pissed me off the most back then was the forced loading of everything on those pages every minute and a variety of tricks designed to prevent proxies taking the load (eg. pretend the page is ten years old so needs a refresh - then once you circumvent that trick they add another). Now computers are faster and bandwidth is cheaper but there's still a lot of it as antisocial to the internet as viagra spam in many ways.
          Having to pay a gouging telco monopoly an extra hundred a month just so that
  • by Anubis IV (1279820) on Saturday April 13, 2013 @06:32PM (#43442947)

    If you're aiming for as many users as you say, then it'll take a while to get there and you'll have plenty of time to hire folks along the way. At that point, you can go ahead and worry about re-architecting everything. First things first though, especially if you're by yourself: get it up and running with whatever technologies you do know. Once it starts to take off, you can hire people to rewrite it and redesign it around best practices.

    It's not the simplest path, but without bringing in outside investors who'll have the capital to allow you to hire the team it sounds like you need, I don't see what choice you have.

  • by chrylis (262281) on Saturday April 13, 2013 @06:38PM (#43442995)

    While there are a number of good tools out there for working with scalability, more important than any particular tool is building your application in such a manner that it's easily parallelizable. In a Web app, a core principle to keep in mind is that the more stateful the application server-side, the more difficult it is to scale, and so designing your application tiers in such a way as to decouple requests is key. Limit the amount of session state the server has to keep track of, and you'll be able to load-balance request handling smoothly.
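
    One common way to limit server-side session state is to hand the client a signed token, so that any server behind the load balancer can verify it without a shared session store. A minimal sketch using only the standard library; the token layout, function names and secret are assumptions for illustration, and a real system would also need expiry and key rotation:

```python
import base64
import hashlib
import hmac
import json


def sign_session(data: dict, secret: bytes) -> str:
    """Serialize session data and append an HMAC so clients cannot alter it."""
    payload = base64.urlsafe_b64encode(
        json.dumps(data, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def verify_session(token: str, secret: bytes):
    """Return the session data, or None if the token is malformed or forged."""
    payload, separator, signature = token.rpartition(".")
    if not separator:
        return None
    expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


SECRET = b"example-only-secret"  # assumption: shared by all app servers
token = sign_session({"user_id": 42}, SECRET)
session = verify_session(token, SECRET)
```

    The data is signed, not encrypted, so the client can read it. Anything confidential still belongs on the server side.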

  • by leonardop (532098) on Saturday April 13, 2013 @06:38PM (#43442997)

    I salute you for your ambition and determination. I hope you get to realize your vision.

    Now, as I read your question, I remembered an interview I saw a few days ago with Ben Kamens, one of the engineers working at Khan Academy, talking about scalability and things like how they manage their operation and the spikes of growth they have experienced in the past. It's a little light in technical details, but you may find it interesting: Root Access: How to Scale your Startup to Millions of Users [youtube.com].

    One thing I'd like to mention is that when you hear someone else talk about the things they've done and how they have done it, it's easy to see it as an advertisement for a particular technology platform (AppEngine and other Google machinery in the previous video, for example), but that's not the thing to focus on. Whatever choices other people have made, the good thing is that their advice can be useful no matter what choices you end up taking. I know this seems like such a trivial thing to say, but evidence suggests that a number of people miss this basic concept, and then discussions quickly degenerate into pointless noise about concrete technologies, instead of the ideas.

    I'd also recommend that you pay a visit to Google Developers youtube channel [youtube.com] and type something like "scale" or "scalability" in the little channel search box. You might learn a few things from some really smart people who have confronted very real situations regarding scalability.

    Best of luck to you, my friend.

  • This is what you want: https://github.com/AppScale/appscale/wiki [github.com]
  • Make it work first. Unless what you build the first time around is really an unholy mess, you will be able to scale and upgrade as you grow much better than you can predict future hotspots on a system that isn't even running yet.
  • You haven't said anything about the problem. If you want ease of scaling go with a pure functional language. Functional languages will force you to isolate state issues. Isolate state and you can operate in total parallel. But... generally you have shared objects which are mutable across the users. So you'd end up with very little meaningfully isolated. Those shared mutable objects are what is creating the scaling complexity. The language or technology doesn't solve that, though it can make the sol

  • I suggest you look at the CQRS pattern. A good Java implementation is http://www.axonframework.org/ [axonframework.org]. The advantage of the CQRS pattern is that it is fairly simple but highly scalable. So you can start small and simple with the confidence that you can tweak and optimise in the future to scale as required. There are good tutorials and support too. My team is using it for an industrial application and we have found that it has been very robust. It might take a bit of work to get your head around the concepts, b
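
    For readers unfamiliar with CQRS, the core idea is to separate the write side (commands that produce events) from the read side (views built from those events). A toy sketch of that split follows. It is not Axon's API, and every class and method name here is hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostCreated:
    """Event describing something that has already happened."""
    post_id: int
    author: str
    text: str


class PostCommandHandler:
    """Write side: validates commands and publishes the resulting events."""

    def __init__(self, publish):
        self._publish = publish  # callable that receives each event
        self._next_id = 1

    def create_post(self, author: str, text: str) -> int:
        if not text.strip():
            raise ValueError("post text must not be empty")
        event = PostCreated(self._next_id, author, text)
        self._next_id += 1
        self._publish(event)
        return event.post_id


class PostsByAuthorView:
    """Read side: a denormalized view kept up to date from events."""

    def __init__(self):
        self._posts = {}  # author -> list of post texts

    def apply(self, event: PostCreated) -> None:
        self._posts.setdefault(event.author, []).append(event.text)

    def query(self, author: str):
        return list(self._posts.get(author, []))


view = PostsByAuthorView()
commands = PostCommandHandler(publish=view.apply)
commands.create_post("alex", "first post")
commands.create_post("alex", "second post")
```

    Here the event is delivered by a direct function call. Replacing that call with a message queue is what lets the read and write sides run, and scale, on separate machines.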
  • by paugq (443696)

    I like C++, therefore I use Wt [webtoolkit.eu] for webapps. Great performance and scalability, great for embedded systems, great for huge systems, great when using third-party libraries (you can use any C or C++ library), etc.

  • Azure (Score:2, Interesting)

    by akb (39826)

    Sounds like you want a PaaS provider that doesn't lock you in to a platform. I have a similar problem to you (PHP not Java) and I rejected AppEngine for the same reason as you. To my surprise I am leaning towards Azure, Microsoft's cloud offering. Their website service allows you to write your web app in a few different frameworks without having to customize it for their platform and then only pay for what resources you use. Management is as simple as manipulating sliders to how many resources you are w

  • Any of the cloud providers are great for this. You can start with a free micro image from Amazon maybe during development phase if you have to start dirt cheap, and go up from there. Any of the cloud providers will let you scale as far as you need. That part is a no brainer. "Thousands of users" is a little vague. Depends totally on how many of them are active at the same time and how intensive what they are doing is. I would think potentially something like a small 1 gig image might handle this in the low end
  • kiss (Score:4, Insightful)

    by crutchy (1949900) on Saturday April 13, 2013 @07:50PM (#43443437)

    keep it simple stupid

    the more complex you make the app, the bigger the load on your infrastructure and bandwidth

    if you follow google's lead, they developed everything in house. same with pixar, which develops software to handle very high end graphics performance, and even linux started off by taking a problem and solving it with a home grown solution

    if you want a specialized application to handle that many users without running into software performance issues (nevermind server infrastructure and bandwidth, which can probably be gradually improved), you want to make it efficient... so you will probably need to develop it yourself

    if you use off the shelf packages like wordpress and the like, they are full of all sorts of features that you might not need but will still pay for performance-wise

    many people will try to tell you that there is no point reinventing the wheel and that existing wheels will always be better than anything you can come up with, but they are full of shit. if everyone stuck with that ideal we would all have wooden wheels on our cars. there is a lot of merit in reinventing wheels, not only to make better wheels, but in understanding wheels to learn how to better use them. be a little selective about where you want to start customizing from... i wouldn't recommend reinventing the operating system, although google did (based on the linux kernel) and they are reaping the rewards of a more efficient search platform than might otherwise have been possible.

    if you're handy with microcontroller programming you might be able to make a pretty efficient microcontroller-based server cluster, sort of similar to what HP is doing with their new SOC blade technology. microcontrollers and SOC are the future, so if you want to get involved in future tech today, pay attention to what is going on with ucs... a simple example is sheevaplugs and their derivatives. this is also where linux probably has a major leg up on windows because microsoft has been so focused on the x86 platform that (even with the recent release of Windows RT) they are lagging a ways behind linux in multi-architecture support (have to wonder how much of the linux kernel has been plagiarized in WinRT).

    other things that affect scalability and performance include the efficiency of algorithms... if you haven't done a CS degree, go onto youtube and watch lectures on data structures and algorithm optimization. there are free CS lecture series from MIT and UNSW that I know of. Richard Buckland of UNSW also makes the lectures a little less boring with his antics.

    how you develop your app will also depend on your goal to get 100,000+ users on the site...

    security is probably the hardest and most significant hurdle you'll face... if you fuck security up (either the app isn't secure enough or it's a pain in the ass for users to authenticate) then your app will be a flop

    you also need to think like a user, not like a developer... this is probably where having a small team will help at some point (a few eyes with different perspectives)

    many developers fall into the trap of developing software that is easy for the programmer and thinking that the user will get used to it... which is fine if you have a monopoly. unfortunately by the time you have 10,000 users, your idea will be copied to create competition, and if they do a better job with the user experience you're dead in the water.

    make sure you are standards compliant. use the HTML 5 and CSS 3 validators, but i would recommend avoiding features that aren't also in HTML 4.01 and CSS 2.1 until HTML 5 and CSS 3 become fully implemented and debugged. the exception would be that if you want a feature that would otherwise require flash or java, use html5 instead of flash. if you want 100,000+ users, don't use flash or java!

    i would use a linux distro such as debian with all the fat trimmed. it should be obvious, but don't use a WISA stack.

    keep your service clear of advertising, 3rd party cookies and any 1x1 hidden iframes. don

  • Regardless of which language or platform you use, a common bottleneck for web applications is the database resource. Most developers don't take large scalability into consideration when building the service architecture. If you plan to scale large in the future, I recommend you stop thinking of the database as the main source for all queries in your system. The basic idea is that costly and complex queries/searches can be given to an external scalable service. Take for instance, the Solr project (http://lu
  • Google App Engine apps can be written in Python 2.5, 2.7, Java, or Go. If you ever want to move it to something else, I think you can just change the way it communicates to the new database -- the rest should be pretty portable.

  • You have a lot of good comments so far, but none particularly directed to your specific question. I have recently come to a framework that I *really* like, coming from a similar background to yours.

    node.js (server, business logic)
    nginx (web server, proxy to node for business logic)
    postgres (For relational/transactional data. There's a nice node.js driver for postgres)
    mongodb (For larger datasets that don't need the transactional stability or quite so structured data)
    Angular/Bootstrap with some jquery
    • by Ash-Fox (726320)

      Regarding your recommendation of mongodb, I am interested in hearing how you counter the arguments in this article [hackingdistributed.com] and why the other solutions are worse.

  • Based on my experience at a Fortune 100 company with a heavy interest in Java: don't use Java. Use PHP or Lua as a CGI. Your sysadmins who have to keep your application up will thank you.

    • Do not use java. To make it work right you have to go against everything the community says you should do.
    • Do not use NFS
    • For file storage use something like MogileFS. It is not likely the best, but it's a proper example of what you will want
    • If you use a database you MUST understand and use the relational aspects of thin
    • by bored (40072) on Saturday April 13, 2013 @09:39PM (#43443875)

      If you use a database you MUST understand and use the relational aspects of things. If you use the database as just a key:value store I will personally beat the ever living shit out of you.

      Like all simple rigorous rules, this is sort of bad advice in a lot of circumstances. Sure, inventing your own hashing function and using the hashes as the keys in a relational DB is stupid. That said, focusing on the main relationships with your tables, and not trying to describe every single edge case, will massively simplify the schema. Plus, there are tons of little pieces of information that often need to be persisted, that just don't tend to have any kind of obvious relationship to anything else in the schema. Being able to add key:value attributes on the fly in the code without screwing with the schema can be a huge bonus to initial productivity. Sure, if at some point you discover common, frequently used attributes, or you have some kind of performance issue because you're reading some value out of a key:value store frequently, then by all means fix it.

      All that said, I'm not really a fan of trying to eke performance out of a database. Use the database for what it's good at: complex relationships, and easy storage/retrieval of information. But if your app is trying to do 500k updates per second to a single table, it's probably a better idea to seek alternatives rather than throw a bunch of money at database hardware. I have my own mental rule: is this code path going to be a hot one? Yes? Then no database queries. There are a ton of strategies for moving the queries/updates out of the paths that are performance sensitive.
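
      A minimal sketch of the "relational core plus key:value attributes" idea, assuming SQLite and made-up table names (`users`, `user_attrs`); none of this comes from the comment itself, it just illustrates adding attributes without touching the schema:

```python
import sqlite3

def open_db():
    """Create an in-memory database with one relational table and one
    free-form attribute table (both table names are hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE user_attrs (
            user_id INTEGER NOT NULL REFERENCES users(id),
            name    TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (user_id, name)
        );
    """)
    return db

def set_attr(db, user_id, name, value):
    # Upsert: a brand-new attribute needs no schema change.
    db.execute(
        "INSERT OR REPLACE INTO user_attrs (user_id, name, value) VALUES (?, ?, ?)",
        (user_id, name, value),
    )

def get_attr(db, user_id, name, default=None):
    row = db.execute(
        "SELECT value FROM user_attrs WHERE user_id = ? AND name = ?",
        (user_id, name),
    ).fetchone()
    return row[0] if row else default

db = open_db()
db.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
set_attr(db, 1, "theme", "dark")
set_attr(db, 1, "theme", "light")   # overwrites the earlier value
```

      If an attribute such as `theme` later turns out to be read on every request, that is the point to promote it to a real column or move it out of the hot path.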

  • API first (Score:4, Informative)

    by foniksonik (573572) on Saturday April 13, 2013 @08:36PM (#43443625) Homepage Journal

    Write your public and private APIs first. Then implement them quick and dirty. Get feedback. Get users. Keep working on the API to make improvements. As you get more traffic hire good people to reimplement those same APIs on a better tech stack. Rinse and repeat. You can even mix and match platforms, just use a smart routing proxy like HAProxy to send requests to the appropriate places. Static files go to a CDN, logins can go to something small but secure, high volume requests can go to a big cluster or IaaS like Amazon or Google for on demand scaling.

    API first.
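
    A rough sketch of what "API first" can look like in code, assuming a made-up `ProfileService` contract (the names are purely illustrative): callers are written against the interface, and the quick-and-dirty in-memory version can later be replaced without touching them.

```python
from abc import ABC, abstractmethod

class ProfileService(ABC):
    """The contract callers depend on (hypothetical example API)."""

    @abstractmethod
    def get_profile(self, user_id: str) -> dict: ...

    @abstractmethod
    def update_profile(self, user_id: str, fields: dict) -> dict: ...

class InMemoryProfileService(ProfileService):
    """Quick-and-dirty first implementation: a dict in process memory."""

    def __init__(self):
        self._profiles = {}

    def get_profile(self, user_id):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._profiles.get(user_id, {}))

    def update_profile(self, user_id, fields):
        profile = self._profiles.setdefault(user_id, {})
        profile.update(fields)
        return dict(profile)

def rename_user(service: ProfileService, user_id: str, name: str) -> dict:
    # Client code only knows the interface, not the implementation.
    return service.update_profile(user_id, {"name": name})

service = InMemoryProfileService()
result = rename_user(service, "u1", "Alex")
```

    A later implementation backed by a real datastore would subclass `ProfileService` the same way, and `rename_user` would not change.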

    • Re:API first (Score:4, Insightful)

      by c0lo (1497653) on Saturday April 13, 2013 @10:56PM (#43444149)

      Write your public and private APIs first. Then implement them quick and dirty....

      API first.

      So true, it can't be stressed enough. Supplementary:

      1. when considering APIs, consider them in terms of service interfaces: even better if these services are stateless.

      2. implement the services as different processes, exchanging data in whatever serialization format you fancy (Java serialization, JSON, Google's protocol buffers). Use the quick-and-dirty approach for their first cycles of implementation: as long as you maintain the interfaces unchanged, one can later come and re-implement them better.

      3. pay attention to what needs to be shared across the whole system and what can be divided/partitioned on different hosts.
      E.g. it is highly probable that "subscription info/user identity/login services" may need to be supported by a single "database" but, once the user finishes the login, she gets her data from a storage hosted elsewhere, supported by whatever later development cycles would find appropriate (of course, at later stages, one will need to implement a "registry" mapping a user identity to where the data is stored. But the first implementation can use a single database for the data of all users as long as you do not tie in the login service with other services
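
      The "registry" in point 3 could be sketched roughly like this, assuming JSON requests and two made-up store names (`main`, `eu-1`); the dicts stand in for real external storage, so the handler itself keeps no per-user state:

```python
import json

class UserRegistry:
    """Maps a user id to the name of the store holding that user's data."""

    def __init__(self, default_store):
        self._default = default_store
        self._placement = {}

    def assign(self, user_id, store_name):
        self._placement[user_id] = store_name

    def locate(self, user_id):
        # Users never explicitly placed live in the default store.
        return self._placement.get(user_id, self._default)

# Hypothetical stores; a first version would only have "main".
STORES = {"main": {}, "eu-1": {}}
registry = UserRegistry(default_store="main")

def handle_request(raw_json):
    """Handler with no memory of earlier calls: everything arrives in the request."""
    request = json.loads(raw_json)
    store = STORES[registry.locate(request["user_id"])]
    if request["op"] == "put":
        store[request["user_id"]] = request["data"]
        return json.dumps({"ok": True})
    return json.dumps({"ok": True, "data": store.get(request["user_id"])})

handle_request(json.dumps({"op": "put", "user_id": "u1", "data": {"n": 1}}))
registry.assign("u2", "eu-1")
handle_request(json.dumps({"op": "put", "user_id": "u2", "data": {"n": 2}}))
```

      With only one store configured this behaves like the single shared database of a first implementation; partitioning later means adding stores and `assign` calls, not changing the handler.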

  • Break your whole project down into little tiny modules, with a configuration file to provide various host names when you start breaking things up onto different hosts. And, where possible, use wrapper functions for things like DB calls.

    This way when you move from mysql to postgres to oracle to NextBigDBPlatform you change the one wrapper function, not every part of your code. When you see that Java isn't the best tool for a particular job, re-write the small module in charge of that job in some other lang
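
    A small sketch of the wrapper-function idea, assuming SQLite and a made-up `CONFIG` dict standing in for the configuration file; the function names `db_query` and `db_execute` are illustrative, not from any library:

```python
import sqlite3

# Hypothetical configuration; in practice this would be loaded from a file.
CONFIG = {"db": {"driver": "sqlite", "database": ":memory:"}}

_connection = None

def _connect():
    """The only place that knows which database driver is in use."""
    global _connection
    if _connection is None:
        settings = CONFIG["db"]
        if settings["driver"] != "sqlite":
            raise ValueError("unsupported driver: " + settings["driver"])
        _connection = sqlite3.connect(settings["database"])
    return _connection

def db_query(sql, params=()):
    """Run a read query and return all rows as a list of tuples."""
    return _connect().execute(sql, params).fetchall()

def db_execute(sql, params=()):
    """Run a write statement and commit it."""
    conn = _connect()
    conn.execute(sql, params)
    conn.commit()

db_execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
db_execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
rows = db_query("SELECT body FROM notes")
```

    One caveat: drivers differ in placeholder syntax and SQL dialect, so a wrapper like this limits where changes are needed rather than removing them entirely.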

  • by MarkRose (820682) on Saturday April 13, 2013 @11:13PM (#43444199) Homepage

    As someone who has written an application that scales to over 1 billion requests per day, let me offer my thoughts.

    Scaling your application should be as trivial as launching more application server nodes. If you can't add/remove application nodes painlessly, you've probably done something wrong like keep state on them (this includes sessions).
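
    One way to keep session state off the application nodes is a signed token that any node can verify. This is only a sketch under assumptions: the shared `SECRET` and the token layout are made up here, and a real deployment would also want expiry and probably an external session store.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical shared secret, identical on every node

def issue_token(session: dict) -> str:
    """Serialise the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode()
    )
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature

def read_token(token: str):
    """Return the session dict, or None if the token is malformed or forged."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))

token = issue_token({"user_id": 42})
```

    Because nothing is stored on the node that issued the token, a request can land on any application server and nodes can be added or removed freely.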

    Don't worry about scaling your application layer at all (within reason). You can always throw more machines at the application side in a pinch, and for a long while it will be cheaper to add servers than to hire someone. When your application servers are costing you more than a salary, hire someone to find the hotspots in the code and make them faster. Until then it's a waste of your time.

    Scaling state, aka your datastores, is where the challenge lies. You need to spend a large amount of time sitting down and analysing every operation you plan to do with your data. SQL is great for a lot of things, but you will eventually run into a point where heavy updates make SQL difficult to scale. Mind you, decent hardware (lots of cores, RAM, and SSD) running MySQL should scale to several thousand active users if your queries are not expensive. The Galera patches to MySQL (incorporated into Percona XtraDB Cluster and MariaDB) can give you true high-availability, but you will still have write-throughput limitations.

    I would also highly recommend you look into Cassandra (especially 1.2+, with CQL 3), which was built from the ground up to scale across thousands of low-end machines that often fail (if you can't tolerate hardware failure, you messed up). Cassandra is more limited in the kinds of queries you can execute, more relaxed with data consistency, and more thought is needed ahead of time. On the other hand, it can also be used for global replication, which is something you are interested in. At the very least, having a good understanding of its data and query model will open your mind to the kinds of tradeoffs that must be made to enable scaling.

    Contrary to what others are saying, you are correct to think about scaling now before you even start! Doing a rewrite is costly in money and time. Why set yourself up for that? Before you start is the best time to plan for scale! If you start with a scalable datastore like Cassandra, and structure all your queries to work within its model, it is no more work than doing things in SQL, and you're way ahead of the game!

    The most important part is spending time modeling how you will access your data. Think about how you'll avoid hot spots (which make scaling writes difficult), and think about how to make reads fast by reading as little as possible. Think about caching, and how you'll invalidate the cache of a piece of your data without having to invalidate caches for things that didn't change. (Think about updating on data ingestion instead of running statistics later.) If you can't avoid hot spots, make only small reads, and cache independently, you are not done.
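
    A toy illustration of two of those points, per-key cache invalidation and updating statistics at ingestion time. The `PostStore` class and its dicts are stand-ins invented for this sketch, not a real datastore or cache client:

```python
from collections import defaultdict

class PostStore:
    """Toy datastore with a per-key cache and ingestion-time counters."""

    def __init__(self):
        self._posts = {}                        # stands in for the real datastore
        self._cache = {}                        # stands in for an external cache
        self._posts_per_user = defaultdict(int)
        self.backend_reads = 0                  # counts cache misses for the demo

    def ingest(self, post_id, user_id, body):
        if post_id not in self._posts:
            # Maintain the statistic now instead of counting rows later.
            self._posts_per_user[user_id] += 1
        self._posts[post_id] = {"user": user_id, "body": body}
        # Invalidate only the entry that changed; other keys stay cached.
        self._cache.pop(post_id, None)

    def get(self, post_id):
        if post_id not in self._cache:
            self.backend_reads += 1
            self._cache[post_id] = self._posts[post_id]
        return self._cache[post_id]

    def post_count(self, user_id):
        return self._posts_per_user.get(user_id, 0)

store = PostStore()
store.ingest("p1", "alice", "hi")
store.ingest("p2", "alice", "yo")
store.get("p1"); store.get("p1"); store.get("p2")
store.ingest("p1", "alice", "edited")   # drops only p1 from the cache
store.get("p1"); store.get("p2")
```

    Editing `p1` forces one extra backend read for `p1` while `p2` is still served from the cache, and the per-user count is available without scanning any posts.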

    Good luck!

    • by MacDork (560499)

      This is the best response I've read in the entire thread. I just wanted to add, you are probably okay with SQL if you are familiar with that and you're expecting "thousands of users simultaneously." Postgresql 9.2 can hit around 14,000 writes per second. [postgresql.org] I'm sure MySQL is similarly capable. If you need more than that, then you have to go with something like Cassandra.

      Netflix has demonstrated Cassandra can hit 1.1 million writes per second [netflix.com] on Amazon's commodity hardware. You just have to be willing to

  • OP did not tell us WHAT he needs to scale.

    Front-end is a given. But that is relatively simple to handle. What else? Database? Business logic?

    Kind of hard to give advice when you don't know what you're giving advice for.
  • It's cool that you're interested in developing to an open standard, but I think it's worth noting that there are two kinds of proprietary platforms.

    The first is platforms like Google App Engine or Windows. These platforms lock you in, by forcing you to write your code to a certain API. If you decide you don't want to keep using this platform, it's really hard to move to something else. The bottom layer of the platform forces a lot of implementation details in the upper layers of the system.

    Then there are th

  • First things first... Do you have hundreds of thousands of simultaneous users?

    I worked at a shop where we did. I've known a lot of others who didn't, but their goal was so lofty. Some actually said "millions of simultaneous....", but only ever managed to get dozens, or even only one dozen at peak times, including themselves and their friends.

    Even at the shop where we had hundreds of thousands of simultaneous users, they didn't start out like that. It grew in

  • You might want to have a look at Command Query Responsibility Segregation (CQRS) as a concept.
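
    For anyone unfamiliar with the concept, a bare-bones sketch of CQRS follows. It is not the API of Axon or any other framework; the account example and class names are invented, and events are delivered synchronously where a real system would often use a queue.

```python
class AccountCommands:
    """Write side: validates commands, changes state, emits events."""

    def __init__(self, publish):
        self._publish = publish
        self._balances = {}

    def deposit(self, account_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balances[account_id] = self._balances.get(account_id, 0) + amount
        self._publish({"type": "deposited", "account": account_id, "amount": amount})

class AccountQueries:
    """Read side: a denormalised view built only from events."""

    def __init__(self):
        self._summary = {}

    def apply(self, event):
        if event["type"] == "deposited":
            view = self._summary.setdefault(
                event["account"], {"balance": 0, "deposits": 0}
            )
            view["balance"] += event["amount"]
            view["deposits"] += 1

    def summary(self, account_id):
        return dict(self._summary.get(account_id, {"balance": 0, "deposits": 0}))

queries = AccountQueries()
commands = AccountCommands(publish=queries.apply)
commands.deposit("acc-1", 50)
commands.deposit("acc-1", 25)
```

    The read model never touches the write model's storage, so each side can be scaled, cached, or re-implemented separately.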
  • by bryan1945 (301828) on Sunday April 14, 2013 @05:04AM (#43445041) Journal

    If we're doing work for you, how much do we get?

  • 1st Rule on scaling: If you have a scaling problem, you don't have a problem.

    Wrong approach. Yes, many have said it and I'll say it again and it will remain true for all eternity.

    If you think you've got the next Google or Facebook up your sleeve - well so be it.

    Build your app, use regular common sense when doing it and the rest just happens. I've handled upwards of 20 Million active users with user tracking and billing with a few thousand hits per second per product in an internet gaming company and I can t

  • You mentioned wanting to stick with open standards.

    I would point out that if this is ultimately to run as a business, you need to make decisions based on what's best for the business. Which may or may not be something based around open standards.

    Making a decision early on and sticking to it dogmatically even when there is no business benefit in doing so - and refusing to even contemplate alternatives simply because they're "not open" sounds dangerously close to operating a religion rather than a business.

  • Unless you have the money and management experience to start a company, hire developers, etc, then the technology that you'd hypothetically use is irrelevant. OTOH if you do have the money to hire developers with the right skills then the problem will solve itself.

    The notion of not having the money or management skills but somehow bootstrapping yourself up from nothing is almost certainly not going to happen. Even those who did start major companies without venture capital did so by borrowing significant mo

  • Not sure this is right for you, but it was once upon a time designed to match some of your buzzwords.

    http://sourceforge.net/apps/trac/reddwarf/ [sourceforge.net]

  • While I agree with everyone that says build small to start and scale up later as needed, the one caveat I'd give is whatever technology you use, design with the THOUGHT of clustering from the start. I've seen many designs fall down when scaled because, for example, the app used session too liberally and now session replication across clustered nodes is a serious problem.

    There's nothing that says you must use clustering later, there's other approaches, but if your app inherently can't be clustered because y

