Ask Slashdot: Cloud Service On a Budget? 121
First time accepted submitter MadC0der writes "We just signed a project with a very large company. We are a computer vision based company and our project gathers images from a facility in PA. Our company is located in TN. The company we're gather images from is on a very high speed fiber optic network. However, being a small company of 11 developers and 1 systems engineer, we're on a business class 100mb cable connection which works well for us, but not in this situation. The information gathered from the client in PA is a 1½MB .bmp image, along with a 3MB depth map file, making each snapshot a little under 5 megs. This may sound small, but images are taken every 3-5 seconds. This can lead to a very large amount of data captured and transferred each day. Our facility is incapable of handling such large transfers without effecting internal network performance. We've come to the conclusion that a cloud service would be the best solution for our problem. We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't encounter congestion for analysis. Can anyone suggest a stable, fairly priced cloud solution that will sync large amounts of offsite data for retrieval at our convenience (a nightly rsync script should handle this process)?"
BYOS (Score:2)
Re: (Score:3, Funny)
Is the data being generated 24/7? If so, that's 432 GB/day, pretty much exactly 12 hours worth of your 100 Mbps bandwidth. So some spooling is needed, but why in the cloud? The main goal would seem to be avoiding paying twice to move the data, so you'd want to avoid routing it through a 3rd party if at all possible.
1. The simplest solution would appear to be to put a laptop with a 500+ GB HD at their facility. A laptop because it essentially has a built-in UPS, and the CPU can sleep much of the time.
2. Devel
Re: (Score:1)
Or instead of the cloud, pay for a faster internet connection.
Or transport the data on HDs via Fedex if latency isn't a problem [I would use SSDs in a metal padded case knowing Fedex].
Re:BYOS (Score:5, Funny)
[I would use SSDs in a metal padded case knowing Fedex].
Fedex is like UDP: an unreliable delivery service. In fact, there is only one fault of UDP it doesn't share: duplication. Things can arrive broken, out of order, delayed, or not at all, but I have never heard of Fedex delivering multiple copies!
Re:BYOS (Score:4, Funny)
I have. Sometimes Amazon messes up. This is how I have a copy of XCOM. :)
Re: (Score:1)
Your math is off just a little bit there.
60sec * 60min * 24hours = 86400 seconds in day
86400 seconds / 5 seconds = 17280 pictures
17280 pictures * 5 MB = 86400 MB
Or roughly 84.4 GB, which should lower the bar even more for that 100 Mbps connection.
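That arithmetic, as a quick sanity-check script (assuming one 5 MB snapshot every 5 seconds, around the clock):

```python
# One 5 MB snapshot every 5 seconds, 24/7.
SECONDS_PER_DAY = 60 * 60 * 24        # 86400 seconds in a day
snapshots = SECONDS_PER_DAY // 5      # 17280 pictures
total_mb = snapshots * 5              # 86400 MB
total_gb = total_mb / 1024            # ~84.4 GB/day

print(snapshots, total_mb, round(total_gb, 1))  # 17280 86400 84.4
```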
Re: (Score:1)
Yeah, furthermore, the summary lists these as being a bmp and a depth map (which I assume is also raster data, like a bmp). Although I am sure they don't want to compress these to something lossy like JPEG, something lossless like tar/gzip before they are sent would probably cut that down by a good margin (20% is likely, and depending on the data maybe 50% or more). At that point you are talking something on the order of 50 GB per day, which shouldn't cause any problems at all if you throttle the tr
Re: (Score:2)
rsync can easily compress the stream when sending and decompress when receiving.
This company could market their service as an appliance + service; have whatever computing and data storage power on-site, so only analysis would be sent over the network instead of the raw data.
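As a rough illustration of the lossless-compression point -- using synthetic gradient data as a stand-in for a depth map, so the ratio here is far better than anything a real image would achieve:

```python
import zlib

# Synthetic "depth map": smooth gradients with long runs of identical
# bytes, loosely mimicking raster data with large uniform regions.
raw = bytes((i // 16) % 256 for i in range(1_000_000))

compressed = zlib.compress(raw, level=6)
print(f"{len(raw)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(raw):.1%} of original)")
```

Real BMPs and depth maps will compress far less dramatically than this toy input, but large uniform regions still make lossless gzip-style compression worthwhile before transfer.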
Re: (Score:2)
What? (Score:1)
Re: (Score:2)
I'll be the one to say it... (Score:4, Insightful)
Re: (Score:1)
...WHY are you using BMP in the first place? Does whatever you're generating these on not have the processing capability to compress to PNG before transferring? I mean, it SOUNDS like it'd save 10-20% off the total transfer... Anyway, what I'd do is simply plop a server rack at the source that takes all the images for a given hour or whatever, tar.gz.bz2.whatevers them & sends them over. Otherwise, I mean, Amazon wouldn't be TERRIBLE?
My (wild) guesses:
* can't ask HD security cameras their customers are using (e.g. 4 "face recog") to waste time with compression
* can't plop a server rack near any (and all) those security cameras, would make them too obvious.
Re:I'll be the one to say it... (Score:4, Interesting)
Re: (Score:1)
That's pretty much what we do in house: the machine generating the image data stores it locally, and on a schedule a server pulls the data; that server gets incrementally backed up offsite when no one is around.
The problem with having systems rely on a network to function is that you can halt production or lose valuable data (because that massive failure that got to the customer's customer WILL happen while your connection is down).
Re: (Score:2)
Sounds like... (Score:4, Interesting)
Re: (Score:3)
Or that the business and product owners under-priced the monthly contract with the client.
And what the heck does your internal network have to do with the performance of your product? Separate your general business network from your server network, if not for performance or HIPAA, then for the day when one of your developers or unpatched machines does something to DoS your business.
Also, you might want to read up on MTU [wikipedia.org]. Large file transfers might be better served with an MTU larger than 1500.
Re: (Score:2)
MTU is enforced at the hardware layer, which he does not have total control over.
Better to change the window sizing, and use HPN-SSH, which will allow you to set the window sizing. The developer of SSH made the decision that
Of course you have to own both sides to do that as well, and make changes to the stack... royal pain in the arse.
Easier to install WAN accelerators like Silver Peak, Cisco WAAS, or Riverbed Steelhead devices to get better performance and compression.
Re: (Score:2)
The developer of SSH made the decision that
...?
Re: (Score:2)
I have to agree that the host/server/bandwidth costs should be a relatively small factor in your calculation. Reliability, security and responsiveness really should be more important. The difference between top tier and bottom tier hosting/cloud is probably no more than a factor of 2 -- you can easily burn thru that savings with a couple of hours of downtime or a hosting vendor screw-up.
If cost is really important, I'd get it working first at a top tier vendor and then over time try to squeeze out costs--eit
Snail Mail and a hard drive (Score:5, Informative)
Re: (Score:1)
This is actually a good solution.
I am informed that Google does this regularly to save money when transmitting large volumes of data.
Re: (Score:2)
Re: (Score:2)
"never underestimate the bandwidth of a semi full of mag tapes".
Agreed! My recollection is that it was from Andrew Tanenbaum, and it was "never underestimate the bandwidth of a stationwagon full of tapes hurtling down the highway". http://en.wikiquote.org/wiki/Andrew_S._Tanenbaum [wikiquote.org]
Re: (Score:1)
Re: (Score:2)
Shuttling a couple hard disks back and forth every day of the week using overnight shipping would be a fairly expensive option. You would have to have at minimum 2 sets of disks, sending them both ways every day, the shipping costs alone would be high if you do this on a daily basis. We are talking 2x the daily overnight shipping costs for a 2 pound package, multiplied by an average 21 working days/month. I don't know what the typical costs for overnight shipping in the US these days, but let's say $25 per
Re: (Score:2)
You'd want to be able to save them for a week so you can retransmit if one of them gets lost in the mail.
-- hendrik
Re: (Score:2)
Shuttling a couple hard disks back and forth every day of the week using overnight shipping would be a fairly expensive option. You would have to have at minimum 2 sets of disks, sending them both ways every day, the shipping costs alone would be high if you do this on a daily basis. We are talking 2x the daily overnight shipping costs for a 2 pound package, multiplied by an average 21 working days/month. I don't know what the typical costs for overnight shipping in the US these days, but let's say $25 per shipment and $50/day. The monthly shipping costs work out to be $1050. And that does not include all the "manual" labor of copying data to/from the disks, packing, shipping paperwork, etc. The cost of the disks would be fairly trivial in comparison to the shipping costs.
Also, you would likely want a larger pool of disks to spread out the failures, as all the bumps and shocks they receive being shipped back and forth every day are very likely to result in damage and a short lifespan.
I totally agree on this method for one-time or infrequent large transfers, but I think you are creating more problems by trying to use this method for daily transfer of data.
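The shipping arithmetic above, sketched out (the $25-per-shipment figure is the poster's guess, not a quoted rate):

```python
# Overnighting a disk set each way, at the guessed rates above.
per_shipment = 25        # assumed $ per overnight package
shipments_per_day = 2    # one outbound, one return
working_days = 21        # average working days per month

daily = per_shipment * shipments_per_day    # $50/day
monthly = daily * working_days              # $1050/month
print(daily, monthly)  # 50 1050
```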
You are correct about the logistics problems of this approach, but there are ways to make this more cost effective. As another poster mentioned, SD cards would be a good way to reduce the weight and fragility of the package. LTO tapes would be another option. If the OP can accept another day of latency, the Postal Service offers flat-rate Priority Mail boxes that could ship 2-day instead of overnight for a little over $5. By investing in some more media to add a buffer to the system, the return trip could b
Re: (Score:3)
If a bigger pipe is too expensive, overnight shipping of a hard drive every day is going to be WAY too expensive.
Is Amazon S3 an option? (Score:2, Informative)
Assuming 5MB of data every 5 seconds, you're dealing with ~90GB of data a day. So, looking at Amazon's pricing model (http://aws.amazon.com/s3/pricing/), and assuming you delete the data after you pull it, the storage total should be in the range of $0.095 * 90GB = $8.55/mo. Transfers into S3 are free. You'll be transferring ~2.7TB/mo out (90GB*30); at $0.120/GB, that's $324.00/mo in transfer fees.
Now, if that data isn't being accumulated 24/7 (i.e. if it's only 8/5 for example), that lowers your monthly fees
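That cost model as a sketch (using the S3 prices quoted in this comment, which may no longer be current):

```python
# Rough monthly S3 cost at the rates quoted above.
GB_PER_DAY = 90                            # ~5 MB every 5 seconds
storage_per_mo = 0.095 * GB_PER_DAY        # if ~90 GB is held at a time
egress_per_mo = GB_PER_DAY * 30 * 0.120    # ~2.7 TB out at $0.120/GB

print(f"storage ${storage_per_mo:.2f}/mo, egress ${egress_per_mo:.2f}/mo")
```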
Re: (Score:2, Insightful)
Dropbox is just a VAR for Amazon S3 [dropbox.com], so it couldn't possibly be cheaper. Most people don't know that half of Silicon Valley is running off Amazon AWS.
Re: (Score:2)
Alternatively, an $80/mo Linode (or similar) plan would cache 2 days of data (~200GB storage), offer some capacity to 'cook' it a bit before re-downloading (say, do some compression) and have enough transfer (8TB/mo) all in one shiny package. For pure storage, I think Dropbox and similar AWS-hosted services weigh in around the $60/mo mark at what would be needed.
Personally, I would spend money on an additional, dedicated Internet connection or (better) WAN tail to the customer and drop some staging hardware
Re: (Score:3)
Bittorrent Sync [bittorrent.com] is exactly what you're looking for.
I just set up this same thing to back up all my photos. I was bouncing between rsync, Samba and various other programs. I wanted something to sync between numerous different computers and off site.
Bittorrent Sync solved all of this. It's almost as if they planned for people using it the way I am. In addition to having Mac and Windows clients, they also have
Re: (Score:2)
Mod parent up. BTSync is a useful, low-maintenance/headache sync implementation.
Wildlife, production run or "other" pics? (Score:2)
eg. If you are counting wildlife, ask the gov/state for more hardware.
Cash might be very tight but gov data storage options should be usable.
Is it OCR on cars? Changes in activity around buildings?
If the "facility" has the need and cash to pay for the imaging, the optics, and your work - ask for more cheap, fast storage.
As for the "cloud" and the nature of your work be aware that the US and a few
Egnyte (Score:1)
There's always Egnyte (https://www.egnyte.com/)
They're not very expensive and they offer what they call an "ELC" (enterprise local cloud) or "OLC" (office local cloud). The way it works is you store the files in their datacenter and you can use their elc/olc clients effectively as a caching mechanism that is sync'd with cloud contents. This happens in such a way that anyone in your office/datacenter can access files from a common interface/api without having to saturate your 100meg pipe by fetching the same
Re: (Score:1)
Re: (Score:1)
There's always Egnyte (https://www.egnyte.com/)
They're not very expensive and they offer what they call an "ELC" (enterprise local cloud) or "OLC" (office local cloud). The way it works is you store the files in their datacenter and you can use their elc/olc clients effectively as a caching mechanism that is sync'd with cloud contents. This happens in such a way that anyone in your office/datacenter can access files from a common interface/api without having to saturate your 100meg pipe by fetching the same file multiple times.
This is actually the solution I'm looking at now. Plus, I like the fact that they have an API we can hook into. On a side note: I'm very surprised by the immaturity of the responses from a lot of the Slashdot community.
Redesign (Score:2)
why bother? (Score:3)
Re: (Score:2)
White, if it existed. Black, otherwise.
Re: (Score:1)
+1
According to my calcs, you've got bandwidth to spare to complete the transfer overnight. If for some reason you do need to do this during the day, find a competent network engineer to implement QoS on your network so your VoIP/pr0n/WoW doesn't suffer.
Re: (Score:2)
Re: (Score:1)
My first thought was scameras / licence plate cameras at intersections, etc. I hope it's not something malicious like that!
Re: (Score:2)
What produces images so fast and at that depth? Computer animation work done extra cheap?
The optical part is a question too. That does not sound best-effort, average-telco-loop cheap. And the term 'analysis', but at their convenience?
Re: (Score:2)
From looking over the specs, my guess is that the new customer is pushing the data to them and
it is slowing down the office line while people are there working so they want to be able to allow
their new customer to push it somewhere else and then they can download it at night when their
office isn't using the bandwidth for day to day operations.
Some possibly better solutions, in descending order:
1) switch new customer from pushing to allowing you to pull from them. i.e. ask them to cache it somewhere.
2) instal
Are you planning to process the data in the cloud? (Score:2)
Don't use the cloud! (Score:1)
Google Drive, 100GB, $4.99/mo (Score:2)
Since you are just spooling, that should be more than adequate.
(Not sure how someone else calculated 432GB/day, and I am horrified by the suggestion to overnight mail hard drives - way too expensive.)
Get another line. (Score:3)
You're probably going to pay less for a second cable modem line than you will to store that much data in the cloud. Cloud processing is fairly cheap - cloud storage is expensive.
And then you won't have to re-tool anything else in your processes, except maybe adding another route or two. If you're doing that much data processing, the $200/mo for the line shouldn't really be a huge expense on the contract.
If you're looking to scale out this service to lots of companies, then the calculus might be different.
Glacier (Score:1)
Amazon has a low-cost version of S3 called Glacier, the downside of which is slow data retrieval time.
Also, on the extremely unlikely chance you're using Apple, there's a solid tool called Arq which will front-end for Glacier, and add encryption and automation to boot.
Don't bother (Score:2)
It is more expensive than a cloud unless you are really big. Many startups that used to use Amazon's service decided that, with virtualization, it was cheaper to run their own after they needed fiber connections and other links to host massive bandwidth for all the boxen on the cloud.
With half your line in use, your speed will be adversely affected. vSphere is about $7,000 including a CentOS or Windows Server license, and Windows Server 2012 with Hyper-V is the same price. You can host VMs and have data backed up elsewhere for r
Re: (Score:2)
It is more expensive than a cloud unless you are really big.
Or indeed really small, so you would only need a fraction of a server.
Re: (Score:1)
You can get a LaCie server/network appliance for $599.
It includes a 5-disk RAID and Linux running Samba, or, if you want, $1099 for one with Windows Server Small Business Edition, plus some SSDs for a few hundred more after that if you need Windows-specific stuff and more performance and RAM.
Fast, 140 MB/s system that can serve as a print server as well. I see big enterprises use them for small branch offices with 50 people where only a T1 is on the WAN. These save network bandwidth too, as they cache
5mbytes every 3seconds is only 13.333 mbits/s. (Score:5, Informative)
we're on a business class 100mb cable connection
100 Mbps = 12.5 MB/s (give up 15-20% for packet overhead: call it 10 MB/s).
Distilling that summary into the data that mattered:
1.5 MB image + 3 MB depth map, together a little under 5 megs.
and
images every 3-5 seconds
The files are 5 megabytes total.
In a perfect world, they'd transfer in 0.5 seconds.
Leaving 2.5 - 4.5 seconds for the porn.
Let's assume they are the bigger size, 5megabytes, and they transfer in the more frequent number, every 3 seconds.
5MBytes/3s = 1.66667 Mbytes/s = 13.33333 mbits/s.
Why is a facility with a 100mb/s line incapable of handling this?
How did a problem where a 100mb/s line can't handle 13.3333mb/s come to a conclusion of "Fix it with the cloud?"
In any case, if you want to do a cloud setup, just about all of them will handle a constant 13.3 Mbps, and you'll pay more for it than if you figured out why your line isn't keeping up.
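The same throughput arithmetic as a sketch:

```python
# One ~5 MB snapshot every 3 seconds, expressed as a line rate.
mbytes_per_s = 5 / 3                   # ~1.67 MB/s
mbits_per_s = mbytes_per_s * 8         # ~13.3 Mbps

utilisation = mbits_per_s / 100        # fraction of a 100 Mbps line
print(f"{mbits_per_s:.1f} Mbps, {utilisation:.0%} of the line")
```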
Expen (Score:2)
Storing 140 gigabytes a day is going to be expensive with any cloud service; you will essentially be using 4 terabytes per month in bandwidth, as well as a lot of disk storage -- cloud providers charge dearly for this.
You might be better off getting your local network's connection upgraded. Obviously, this has benefits beyond merely offloading storage.
We're now thinking the customer's workstation will sync the data with the cloud, and we can automate pulling the data during off hours so we won't en
Re: (Score:2)
cloud providers charge dearly for this.
Not necessarily. FranTech's BuyVM [frantech.ca] will rent you a KVM VM with 500GB of space and 5TB of bandwidth for only $15/month. If you need more, you can step up to their 1TB of space with 10TB of bandwidth plan that costs just $30/month. These plans are listed under Las Vegas - KVM Storage in the order form. It looks like they are out of stock right now (you can quickly check their stock here [doesbuyvmhavestock.com],) but I think they restock all of their plans on Mondays. BuyVM has a pretty good reputation and I have been enjoying th
Install a dedicated link (Score:2)
I'd install a dedicated link and just add the cost of the link to the project's expense list.
Sooner or later you're downloading the data, and most customers I've dealt with would have an issue with spooling their data to the cloud in the first place -- it's why they would have contracted a small firm to do the processing at all.
Let's face it -- network capacity is just not that expensive nowadays, especially seeing as you sound like you're primarily interested in download speed, which means
11 devs + 1 sysadm (Score:2)
The cloud is nearly never a good idea (Score:2)
Hire a network consultant to fix your broken internet. After that's done, have them figure out how you guys can scale. It's probably not a great idea to have to send all this stuff to your office. I am assuming you're using GPUs; those can be rented and/or bought. You probably want a system that can be distributed fairly well.
The cloud is a buzzword, not a product. A colo'd 1RU server can hold about 40 TB of bulk storage. Most colos will let you use nearly unlimited inbound traffic (normal ratio is 1 to 10 i
Re: (Score:2)
Except it is not the internet:
"Our facility is incapable of handling such large transfers without effecting internal network performance."
Which should be even easier to fix. He is trying to fix the wrong problem.
Re: (Score:2)
Lol, if 13 Mbps is affecting LAN performance you have serious issues. Still using a 10 Mbps hub?
Re: (Score:2)
Well, I am going by what he said was the problem.
I agree, if loading 100 Mb from the internet breaks the internal network, something is very wrong.
One option is an image processing co. (Score:2)
There are several companies out there who do nothing but handle image processing "in the cloud". They could be used as simple bulk file transfers, or they might help solve the real problem: dealing with large, uncompressed images.
I know of two off the top of my head:
this is just way too simple (Score:2)
So 12mb/s (max) of transfers will bog down your 100mb/s connection so badly that you just cannot do it??? Uhm, are you sure about that???
Well, OK then. Get another one.
The first thing? Learn English... (Score:2)
The company we're gather images from ... ...without effecting internal network performance.
I mean really... If you can't manage to write a coherent, error-free paragraph in fairly simple SVO sentences, or can't be bothered to proofread a submission before posting, what makes you think that you could effectively manage a cloud-based infrastructure (or any other kind, for that matter)?
Hell, with your skills just burn the files onto DVD's and toss them in the rubbish bin. It'll work just as we
Re: (Score:1)
EMC Syncplicity (Score:2)
Solving the wrong problem? (Score:3)
I don't understand what the issue is here. What the OP seems to be really asking is how to move the bandwidth requirement to overnight, when no one is using their connection for other business purposes.
If time-shifting the syncing to off-hours is acceptable, why do you not install a server with a beefy hard drive at the client location to do just that?
Have you explored the idea of compressing the data at the client side before sending it your way? Bitmaps often compress very well, especially if you can batch very similar ones together. A script to make a gzipped tar file every 5 minutes might do wonders for your data requirements.
If you're ready to shell out the money for a cloud provider, why not instead shell out the money for a second connection to dedicate to this client?
What does moving the data through a third party in "the cloud" offer over any of (or a combination of) these three approaches?
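The batch-and-compress idea from above could be sketched like this (hypothetical layout: `bundle_snapshots` and its paths are illustrative, and in practice something like it would run from cron on the capture workstation):

```python
import tarfile
import time
from pathlib import Path

def bundle_snapshots(src_dir: str, dest_dir: str) -> Path:
    """Tar+gzip every file currently in src_dir into one timestamped
    archive, then delete the originals. BMPs and depth maps with large
    uniform regions typically shrink well under gzip."""
    out = Path(dest_dir) / f"batch-{int(time.time())}.tar.gz"
    files = sorted(Path(src_dir).glob("*"))
    with tarfile.open(out, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)
    for f in files:
        f.unlink()
    return out
```

Shipping one archive every few minutes instead of thousands of individual 5 MB files also cuts the per-file overhead of whatever transfer tool moves them.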
IaaS clouds will charge for storage & I/O $$ (Score:1)
Remember with a Cloud provider you have to pay to transfer the data IN and to transfer the data OUT.
Have you priced what a faster internet connection would cost you?
Or a 2nd Internet connection just for this video traffic?
Look beyond the cable MSOs also; what is a FiOS-based serv