Ask Slashdot: Is the Gap Between Data Access Speeds Widening Or Narrowing? 92
New submitter DidgetMaster writes: Everyone knows that CPU registers are much faster than level-1, level-2, and level-3 caches. Likewise, those caches are much faster than RAM, and RAM in turn is much faster than disk (even SSD). But the past 30 years have seen tremendous improvements in data access speeds at all of these levels. RAM today is much, much faster than RAM 10, 20, or 30 years ago. Disk accesses are also tremendously faster than they used to be, thanks to steady improvements in hard drive technology and the even more impressive gains in flash memory. Is the 'gap' between the fastest RAM and the fastest disks bigger or smaller now than it was 10 or 20 years ago? Are the gaps between all the various levels getting bigger or smaller? Does anyone know of a definitive source that tracks these gaps over time?
The gaps are getting smaller (Score:3)
The distance between the "fastest" and "slowest" gets larger and larger, but the gaps are getting smaller because things like SSDs fill them.
Re: (Score:1)
The distance between the "fastest" and "slowest" gets larger and larger, but the gaps are getting smaller because things like SSDs fill them.
Except the bus it (and everything else) is connected to hasn't kept up. Try building a computer to solve real problems, and you'll quickly learn this.
A gap not normally considered (Score:2)
Re: (Score:2)
Re: (Score:2)
Memory buses already transfer more than one octet at a time; in fact, more than the 64-bit architecture size for typical implementations of x64. Having the effective address space for 64-bit words be 61 bits isn't really much of a problem. Who is going to have nearly 2.15 GiGiB (about 2^61 bytes) of memory attached to one CPU any time in the next century?
I program the TI C2000 series 32-bit microcontrollers, where the 16-bit word size can be a significant headache when trying to deal with 8-bit IO streams.
So I'd opt to
Can do basic maths (Score:1)
...but not just because someone else is too lazy to do it themselves. Doing the maths would have taken less effort than writing this /. filler piece.
Does it matter? (Score:3)
Does it matter? Fast CPU, fast RAM, fast disks is like having no speed limits on every race track in the world - but in order to get from track to track you have to go on the interstates or perhaps back country roads (PCI bus, etc). Sure, each component is fast and getting faster, but the way those components connect to each other hasn't changed all that much...
Re: (Score:3, Interesting)
I disagree. The interconnecting paths may not be keeping up as quickly, but look back at the original ISA bus, or before that, the S-100. The PC/AT 16-bit slots more than doubled the speed of the original ISA. Specialized video buses were all the rage not that long ago. PCI has grown to become PCIX and multi-lane. For us greybeards, the fact that a serial bus can operate faster than a parallel bus remains one of those great mysteries.
Re: Does it matter? (Score:1)
PCIE. Not x. X died a while ago.
Serial outperforming parallel is simple: multiple serial buses with switching on each end. They're independent, but they send packetised data in parallel.
Re: (Score:3)
Yes. It matters. Price also matters. If the price/performance gap is eliminated, then major architectural changes are both possible and likely. We're already seeing this with tape vs. hard drive, where people just archive to external hard drives instead of tape. Think about what would be possible if RAM were cheaper than hard disks and also non-volatile. Then everything could be in RAM, and even "saving to disk" becomes a thing of the past. Likewise, if SSDs become just as fast as RAM, then why dif
Re: (Score:2)
wait until they start sticking sad'd on the ram bus
assuming you mean SSDs, this exists already (Score:2)
https://www.sandisk.com/busine... [sandisk.com]
Re: (Score:3)
I had RAM on the 8-bit ISA bus back in the PC-1 days, but back then the CPU was so slow that wasn't a serious impediment. Today, however, we have specialized (and seriously fast) memory buses, and we have massive interconnects between the CPU and the chipset, and we have large numbers of PCI-E lanes. High-speed SSDs are designed to sit right on a useful number of PCI-E lanes. These days a PC actually does have massive bandwidth, in a way it never did before; it was crippled between basically the time PCs go
Re: (Score:1)
Yeah, things like this matter. I was once told by a person who worked for Microsoft that they had been assigned the task of overseeing some research for an upcoming Xbox system. The research involved having prototype units run various pieces of code to see which piece ran the fastest. It was rather assumed that the different versions of the code were equally compatible, coming up with the same result; the automated benchmarking, however, was only measuring speed.
See, it doesn't even matter which piece o
Re: (Score:3)
That is so untrue. What are you talking about!?
PCI-express is much faster than the late AGP, PCI and ISA buses; you get over 15 GB/s from one GPU to another. SATA and SCSI have had a good run, but are being replaced by PCI-express for high-throughput devices. In clusters, Infiniband decreased latency massively compared to Ethernet, and gives you bandwidth of over 100 Gb/s.
Sure, the interconnect is a factor slower than the devices, but that's pretty much always been the case.
Re: (Score:3)
" The bus is more often the bottleneck than anything connected to it in modern computing systems."
Not even close. Quite often it's the underlying architecture itself causing the bottlenecks.
Example, Intel's latest and greatest Xeons FUCKING SUCK. Why? Because their internal architecture to deliver data across cores is gimped beyond belief. You can run 2 CPU x 4 GPU, 4 CPU x 2 GPU, but you can't do 4 CPU x 4 GPU. Meanwhile, I've got far older AMD systems that run 4 CPU x 4 GPU without a problem.
Yes, it does matter (Score:1)
For example, if you want to sort objects which do not fit at one level, the sorting may spill over to the next level of much slower memory. This imbalance of speed is relevant regardless of the interconnect speed limits, since the speed delta itself is the culprit.
There are some ways to improve on that.
From Wikipedia > Sorting_algorithm > Memory_usage_patterns_and_index_sorting
https://en.wikipedia.org/wiki/... [wikipedia.org]
That is one example of a solution to a problem that matters.
Re: (Score:3)
fast disks
I showed your mom my fast disks, but she only had an ISA port.
WTF post, come on kids (Score:1)
This could literally be answered with three google searches. '2015 l2 cache speed' 'ddr4 speed' '2015 ssd speed'
Re:WTF post, come on kids (Score:4, Funny)
Ancient scrolls of dubious provenance hint darkly that DDR4 was not the first inhabitant of the RAM slots we consider so permanent. Debased cultists still sometimes mutter chants mentioning "PC100", or even uncouth syllables such as "korr"...
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2, Interesting)
This /. article, plus one called "Casino lock flooring [...] play casino online" which 404s in Norwegian when you click on it.
As it turns out, a whole bunch of really technical reviews on DDR4 memory, plus 'scope/test gear for testing DDR4 bus access. At least partially, potentially, useful, if you're prepared to wade through a bunch of dense stuff.
A whole bunch of MacBook reviews/unpaid ads, follow
Re: (Score:1)
(Whether /. counts as that is an entirely different matter, though there's usually at least one person who appears to know what they're talking about and will answer the question usefully and honestly without being a smug stuck up prick.)
So, you're ruling yourself out for an answer then?
Re: (Score:3)
Yes, he was ruling himself out as unable to answer. So am I. And it would take a *LOT* more than a Google search to answer. I lean towards agreeing with the people who cite bus speed as the limiting factor, but I'm not sure, and there could certainly be special circumstances where something else was the limit.
I *do* know that it's not an easy question to answer, and that any answer is going to depend for its correctness on a presumed workload. (Some things are CPU bound, and don't even use much RAM. Ot
What are you really asking? (Score:4, Insightful)
I'm not sure what a historic timeline of these ratios (not "differences", please) would gain you.
These ratios can have a big impact on what algorithms and implementations you choose to maximize performance. I suppose if, say, the ratio of RAM to disk speed increased by a factor of 10 over the decade before last, then decreased back to its original ratio in the last decade, it might be worth trawling through some old papers (or old source trees) to revisit lessons learned in the earlier period -- but that seems like a bit of a stretch.
If you're just curious, it shouldn't be too hard to generate timelines of CPU cycle speeds, cache and RAM latencies and bandwidths, disk performance, and so on. But really, each of those has enough factors that a simple "ratio" would probably conceal more than it illuminates.
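For the merely curious, a few lines of Python are enough to turn ballpark latency figures into a timeline of ratios. The numbers below are illustrative guesses of mine, not measurements, so treat the output as a sketch of the shape of the curve rather than data:

```python
# Sketch: disk-to-RAM latency ratios over time, from rough
# illustrative figures (nanoseconds). These are my own ballpark
# numbers, not authoritative measurements.
eras = {
    # year: (ram_latency_ns, disk_latency_ns)
    1995: (70, 15_000_000),   # EDO-era DRAM vs ~15 ms seek
    2005: (50, 8_000_000),    # DDR2-era DRAM vs ~8 ms seek
    2015: (50, 100_000),      # DDR4-era DRAM vs ~100 us consumer SSD
}

for year, (ram_ns, disk_ns) in sorted(eras.items()):
    print(f"{year}: disk/RAM latency ratio ~ {disk_ns / ram_ns:,.0f}x")
```

Even with made-up numbers like these, the point stands: the ratio barely moved while spinning disks were the bottom layer, then collapsed by two orders of magnitude once SSDs replaced them.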
Re: (Score:1)
Writing a paper (Score:3, Insightful)
for a CS or IT class?
Is the Gap Widening Or Narrowing? (Score:3, Informative)
Re: (Score:2)
Yes, yes it is.
It depends on your diet and how much exercise you get.... oh, you mean tech, not your waistline....
Re: (Score:2)
It's unlikely (but still possible) that it's staying the same.
The memory wall (Score:1)
is still in effect. Every memory speed increase is perforce dwarfed by speed increases above in the hierarchy. The ongoing stall in increasing CPU clock speeds has obscured the nature of the problem, but we'll never go back to the 1980s when CPUs were not fast enough to cause memory hot spots.
No one has attacked the problem successfully except for architects who have designed split-phase memory transactions for loads and stores along with the capability for many of them to be simultaneously in flight. Th
Re: (Score:2)
My first computer was a ZX81.
This had a 16K RAM pack, with 250ns RAM.
This could read a random byte of RAM in 470ns.
My current system can access a byte of RAM in around 50ns.
That is a factor of ten improvement, when RAM size has increased a million fold.
Caches ameliorate this.
But they do not eliminate it.
#cachemissesmatter
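For what it's worth, the arithmetic above checks out. A quick sanity check in Python, taking the figures from the post (the 16 GiB modern capacity is my own assumption; the post only says size grew a million-fold):

```python
# Check the ZX81-vs-modern figures quoted above.
zx81_access_ns = 470     # random byte read on the ZX81's 250 ns DRAM
modern_access_ns = 50    # typical modern DRAM random access latency
zx81_ram_bytes = 16 * 1024
modern_ram_bytes = 16 * 1024**3   # assumed 16 GiB modern machine

latency_gain = zx81_access_ns / modern_access_ns
size_gain = modern_ram_bytes / zx81_ram_bytes
print(f"latency improved ~{latency_gain:.1f}x, capacity grew {size_gain:,.0f}x")
```

Roughly a 9.4x latency improvement against a 1,048,576x capacity increase, which is the post's point: capacity scaled a million-fold while latency barely managed one order of magnitude.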
lolwut? (Score:3)
"Everyone knows that CPU registers are much faster than level1, level2, and level3 caches."
I'd argue that most people don't even know what a CPU register is, never mind what it's faster than.
Re: (Score:3)
Too complicated to answer (Score:3)
Originally, there were CPU registers and memory. Then there were registers, memory, and disk. Then registers, SRAM cache, memory, and disk. Then registers, L1 cache (on the CPU), L2 cache (on the mobo), DRAM, and disk. Then the L2 moved onto the CPU. Then there was L3. Then SSDs were added between RAM and disk. Now some chips have an L4 cache on the CPU package (but not the CPU die).
Oh, and there's a difference between latency and bandwidth. DRAM latency has not significantly improved over time, particularly compared to DRAM bandwidth.
And with multiple cores, some levels are core-specific while others are not. You can even have a bizarre situation where L1 cache is per-core, L2 cache is shared between two cores, and L3 cache is per-CPU (in SMP setups, that means main RAM is the first level shared among all cores).
Re: (Score:3)
And IBM has made external memory controllers with 16MB L4 cache in each of them, called "Centaur".
Re: (Score:2)
Well I've checked and it seems three z13 CPU share 480MB L4, and one CPU ("Storage Controller") can have 480MB to its own.
Three full POWER8 would have 128MB L4 each, or 384MB if you add them up.
You can add up cache figures like that but it seems surprisingly close.
Re: (Score:2)
can't
Objectives are different. (Score:2)
Depends (Score:1)
I think that a lot of good IT folks are looking at where the bottlenecks are within the technologies available and making implementation decisions in a much more detailed way. Right now, as has often been the case, the bus becomes the bottleneck for a lot of applications. Sure, there are memory-bound and I/O-bound problems that still see the gaps between memory speed and disk read/write speeds as an issue, but there are far more problems that are handicapped due to the system bus speed being hobbled, compa
RAM latency is not getting much faster (Score:5, Informative)
The latency of RAM is improving very slowly: only something like a 2x-4x improvement in the last 20 years.
Only the bandwidth of memory is growing faster, and that's just because manufacturers have been putting more DRAM cells in parallel, doing ever-bigger data transfers, and running faster memory buses.
The same is true for hard disk drives: rotation speed dictates the random access latency, and the rotation speed of the average hard disk has only gone up from 4200 or 5400 to 7200 rpm in the last 20 years, meaning only a 1.7x or 1.33x improvement in rotational latency.
Though replacing hard disks with flash-based SSD storage has improved latency by a huge margin.
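The rotational-latency arithmetic is easy to reproduce: on average you wait half a revolution for the right sector to come around. A trivial sketch:

```python
# Average rotational latency is half a revolution:
# (0.5 rev) * (60,000 ms/min) / (rpm rev/min)
def avg_rotational_latency_ms(rpm: float) -> float:
    return 0.5 * 60_000 / rpm

for rpm in (4200, 5400, 7200):
    print(f"{rpm} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms average wait")

# The improvement factors quoted in the post:
print(f"4200->7200: {7200/4200:.2f}x   5400->7200: {7200/5400:.2f}x")
```

That gives roughly 7.14 ms, 5.56 ms, and 4.17 ms average waits, and confirms the 1.71x/1.33x figures above; seek time adds on top of this, which is why total random access stayed near 10 ms for decades.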
Re: (Score:2)
And CPU clock-speed hasn't really increased at all in the last 10 years. We are still at just over 3GHz. Of course we have had massive performance improvements in other areas.
Re: (Score:1)
Replacing hard drives with SSDs still leaves another bottleneck. The disks have to connect to the cpu somehow. If they are internal (as in a home pc), then you only get a few disks, but they connect at PCIe speeds. If you need more disks, you go to a SAN. But then you're putting your disks at the end of *network* latencies; there's definitely a wall there. You can't cache your way out of the transmission delays on the SAN... Other solutions are used which essentially move the software and/or data closer to
Gaps are smaller (Score:1)
Disk *seek* times are still very slow (Score:2)
A modern cheap laptop from today is faster than a Convex supercomputer I was using 20 years ago. But in that same time span disk seek times dropped from 15 milliseconds down to about 7.5 milliseconds which is only a factor of two.
Re: (Score:2)
A modern cheap laptop from today is faster than a Convex supercomputer I was using 20 years ago. But in that same time span disk seek times dropped from 15 milliseconds down to about 7.5 milliseconds which is only a factor of two.
Manufacturers were exploring expensive strategies to speed up seek times, like multiple armatures, independent arms, etc. Then hybrid drives and massive arrays became things and there was no more reason to bother trying to improve seeks on spinning rust. We're all just hoping that it will be cost-effective to stop using it sometime soon.
The gaps are still there. (Score:3, Interesting)
20 years ago main memory was 10-14 ns, instruction cycle time was 2-4ns (Cray)
Guess what? it still is.
Memory has grown, it has gotten cheaper.
What HASN'T changed? Access to memory. That is how Cray got its speed - instead of a single port to memory, it used a crossbar switch - 4 ports for each processor. 1 instruction bus, 2 input data busses, and one output bus; even I/O got its own port to memory; all with overlapping address/data cycles.
The effect was that all of main memory worked at the speed of cache, thus the CPU had no need to waste silicon on cache memory - and the entire system ran full speed.
What slows down the current systems? Memory access. Most systems have only a single port to main memory. Some servers and "high performance" desktops have dual-ported memory. Yet even dual-ported memory access is slow when you have to share it among 4/8 cores... plus I/O (which isn't dual-ported). Interrupt latency on PCs is really horrible. Still only 15 IRQs, and you have to share them? No direct vectoring? Forced interrupt chain actions? Even the old PDP-11, with ONE interrupt request line, allowed direct interrupt vectoring (64 basic vectors) to reduce overhead.
There hasn't been much innovation in architecture in over 20 years.
Re: (Score:2)
If we could fit the entire system on a chip, then you could speed things up. But choice goes out the window. No choice at all.
Re: (Score:2)
There has been MUCH improvement to computer architecture, but most slashdotters think only desktop PCs and desktop PC-esque servers exist.
a mainframe has a thousand or more channels to memory
Re: (Score:2)
And even if you had a trillion channels to main memory, you'd still only improve bandwidth and keep the same latency.
Re: (Score:1)
Right, and bandwidth is more important for most workloads. Gotta move more bits around in one go if you've got (relatively) more latency to deal with in each generation, otherwise the newer chips just wait for i/o faster than the older ones did...
Shed some light (Score:2)
Calculating is easy (Score:4, Informative)
You can look up the specs easy.
Back in the 80286 days there was not even an L1 cache; however, the memory and ISA bus ran at CPU speed, 8-20 MHz. Hard drive latency was ~65 ms.
In the 80486 days the L1 cache was introduced, and L2 was sometimes available in (very) expensive modules. I remember buying 256 KB for the same price as 16 MB of RAM. The L1 cache ran (if I remember correctly) at CPU speed, 1 cycle. However, the bus speed started to slow down compared to the CPU. The VLB ran at CPU bus speed (a "local" bus) and was often used for graphics, but PCI (an inferior bus) ran at 33 MHz, so for anything over 33 MHz we started needing dividers. The RAM ran at 80-120 ns, so it started being slower than the CPU bus. Hard drive latency, however, was down to ~30 ms.
In the Pentium age memory slowed even further compared to the CPU bus. Now it took several cycles to access memory, and buses ran even slower (still mostly PCI, eventually PCI-X (133 MHz?), until PCI-e, a serial bus, came along). Hard drive latency went down to ~15 ms.
In the modern age, L1 caches have slowed even further, requiring 4 cycles for L1 and up to 30 for L3. RAM access is even slower, with bus speeds about a quarter of a single CPU, but sometimes 16 CPUs need to share those lanes. Peripheral bus speeds, however, have gone up, and PCIe 3.0 is now directly integrated into the CPU, 80486 VLB-style. Hard drives still have latencies of ~10 ms (we have a mechanical issue there), but even cheap SSDs can go down to ~1-2 ms.
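Taking the rough figures above, plus some clock speeds I'm assuming for illustration (12 MHz 286, 66 MHz 486, 200 MHz Pentium, 3.5 GHz today), a few lines of Python show the two gaps moving in opposite directions: RAM fell ever further behind the CPU, while disk actually closed in on RAM.

```python
# Back-of-envelope latency ratios per era, all in nanoseconds.
# Latencies come from the post above; the clock speeds are my own
# assumed figures, not authoritative.
eras = {
    "80286":   {"cpu_cycle": 1e9 / 12e6,  "ram": 1e9 / 12e6, "disk": 65e6},
    "80486":   {"cpu_cycle": 1e9 / 66e6,  "ram": 100.0,      "disk": 30e6},
    "Pentium": {"cpu_cycle": 1e9 / 200e6, "ram": 70.0,       "disk": 15e6},
    "modern":  {"cpu_cycle": 1e9 / 3.5e9, "ram": 60.0,       "disk": 10e6},
}
for name, t in eras.items():
    ram_cpu = t["ram"] / t["cpu_cycle"]      # RAM latency in CPU cycles
    disk_ram = t["disk"] / t["ram"]          # disk latency vs RAM latency
    print(f"{name:8s} RAM/CPU {ram_cpu:7.1f}x   disk/RAM {disk_ram:12,.0f}x")
```

With these numbers the RAM/CPU ratio grows from 1x to roughly 210x, while the disk/RAM ratio shrinks from ~780,000x to ~167,000x (and far lower once you substitute an SSD), which is one concrete answer to the submitter's question.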
Re: (Score:2)
Not only is the underlying physical technology getting better, but the software (i.e. the filesystem) utilizing that hardware is also becoming more efficient. The likes of ZFS and ext4 are far better than their predecessors (UFS or ext2/3). Not to troll, but I think NTFS and FATx are static in performance across hardware revisions.
Gah, forgot to log in.