Researching Searching Algorithms?

NiN3x asks: "I have recently written a sorting algorithm that can be close to four times as fast as quicksort and never much slower. I was wondering if there are lots of sorting algorithms like this out there; I don't think I could be the only one who has thought of something like this. I am only in my second year of computer science, so I don't know a lot about these things. I have tried searching the net and can't find anything. My algorithm is NOT an in-place sorting algorithm. Can anyone point me to some sources for this type of thing?"
  • Bible (Score:4, Informative)

    by Anonymous Coward on Tuesday October 22, 2002 @08:56AM (#4503366)
    Knuth, Donald E., The Art of Computer Programming Vol 3.
  • Publish it? (Score:5, Interesting)

    by jukal ( 523582 ) on Tuesday October 22, 2002 @08:56AM (#4503367) Journal
    I have recently written a sorting algorithm that can be close to four times as fast as quicksort and never much slower

    Is there some reason why you cannot publish it here for review? A basic C reference implementation can't be very many lines. That way people could give you real input. I don't believe you have much to lose. If you do, attach your favorite license to it :)

  • by inerte ( 452992 )
    Or provide a link to a file so we can properly answer your question. How can we know whether your algorithm works, or whether it has been implemented before, from such a vague description?
  • by mikehoskins ( 177074 ) on Tuesday October 22, 2002 @09:00AM (#4503395)
    The Perl developers have used modern algorithms extensively throughout, since Perl is geared mostly to string (scalar) processing.

    If your algorithm is general-purpose and happens to be faster, why not GPL/LGPL/BSD it and contribute to Perl's, Python's, and PHP's sorting and searching code?
  • by ed1park ( 100777 ) <ed1park@hotmail . c om> on Tuesday October 22, 2002 @09:05AM (#4503428)
    I was just studying algorithms last night. The following is considered one of the most definitive books on algorithms:

    Art of Computer Programming, Volume 3: Sorting and Searching by Donald Knuth

    http://www.amazon.com/exec/obidos/tg/detail/-/0201896850/qid=1035291515/sr=8-3/ref=sr_8_3/002-8999310-6640041?v=glance&n=507846

    And since you are a student, go show it to a CS professor for immediate feedback. My guess is that you haven't properly analyzed its big-O performance.
    • If the submitter is basing his algorithm on comparison operations, it's not possible to do better than O(n log n), so I would also guess that he hasn't performed the O() analysis correctly.
    • by joto ( 134244 ) on Tuesday October 22, 2002 @09:53AM (#4503788)
      Well, he doesn't have to. Because he didn't claim an asymptotically better algorithm. He claimed to have an algorithm better than quicksort (by a factor of 4). This can still mean an O(nlogn) algorithm.

      My guess is that it really isn't. There are many algorithms that are faster than quicksort for smaller inputs. For fewer than about 100 elements, insertion sort wins hands down. For "really more than you usually need to sort, but not really extreme values", shell sort usually wins.

      Also, there are lots of algorithms that are faster than O(nlogn) when you have more information than that given to you by the < comparison alone. One example would be bucket sort.

      Personally, I couldn't care less. The chances of a second-year student inventing a better sorting algorithm that isn't already published somewhere are nada. That doesn't mean he isn't smart, or that his algorithm is stupid. It *is* probably better than quicksort for the test cases he has tried it on (and maybe for many more). What it means is that the field of sorting is so well studied that it is very unlikely to get real contributions from somebody without at least two PhDs in mathematics these days.

      • The interesting thing about O(nlogn) versus O(n) or O(whatever-you-want) is that an "operation" is often not the same between algorithms.

        So an O(nlogn) implementation can still be faster than an O(n) implementation when the input set is less than 1,000,000 items and the latter implementation requires more than about 20 times the time per operation compared to the former (log2 of 1,000,000 is roughly 20).

        So while it is unlikely that someone without a vast theoretical background will discover a better algorithm for all cases (or even the extreme cases), as you point out, there is a significantly greater likelihood that he could have discovered an algorithm which provides an improvement for data sets that are common in certain - even many - fields. Without having multiple doctorates.

        Sorting in general is a well-studied field; but as the application of computers grows, the need for less general sorting grows as well. Many data structures and algorithms are not considered "sorted" because they are partitioned, or implicitly ordered, yet they make use of sorting theory.

        • Yes, thanks for the clarification.

          To clarify even more, insertion sort is O(n^2), but really fast for smaller datasets, and for data already sorted (in that case it is O(n)).

          Shell sort is like an advanced insertion sort, which makes it faster for larger datasets but also adds more overhead. It is also O(n^2) in the worst case, but if I remember correctly, Sedgewick proves a much better average time for (different variants of) it in his algorithms book (which I unfortunately don't have handy).

          In general, people should avoid quicksort. The obvious implementation (recursion) is not particularly efficient, either speed-wise or in terms of space. Selecting a good value to divide the dataset on is hard (if you want to avoid O(n^2) performance). And it's overkill for anything but quite large datasets. In general, insertion-sort or shellsort is much better.

          In general, you should also avoid the C-library routine qsort(). It requires a function pointer for comparison, making it extremely slow. It's also hard to remember how to use it.
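
          As a concrete reference point, here is a minimal insertion sort in C -- a hedged sketch rather than anyone's production code; the function name and the use of plain ints are just for illustration:

          #include <stddef.h>

          /* Minimal insertion sort: O(n^2) worst case, but O(n) on
           * already-sorted input and very low overhead for small arrays. */
          void insertion_sort(int *a, size_t n)
          {
              for (size_t i = 1; i < n; i++) {
                  int key = a[i];
                  size_t j = i;
                  /* Shift larger elements right until the slot for 'key' is found. */
                  while (j > 0 && a[j - 1] > key) {
                      a[j] = a[j - 1];
                      j--;
                  }
                  a[j] = key;
              }
          }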

      • And at this point, I'd even suggest digging up the standard C library implementation of quicksort - I remember coming across something somewhere that mentions that the standard implementation is a hybrid quicksort that is actually faster than the theoretical / canonical quicksort. I'm thinking it was either a hashed-quicksort or a multi-quicksort, but I could just be making those up - and if so I'm sure someone will correct me. As joto mentions, any comparison sorting is still at best O(n log n); a hybrid just has a lower constant term and a greater likelihood of hitting its better case rather than its worst case.

        The original poster mentions that his algorithm isn't a simple in-place sort - which, based on my best guess, means that he's probably sorting pointers quickly then re-arranging the elements in a single pass. Maybe, maybe not - but still no faster than O(n log n).

        But at this point, I'd start worrying more about scalability, memory usage, etc. - one of the nice things about quicksort is that it is so general, it can quickly be adapted to low-memory or multi-processor implementations. Sure the algorithm is faster - but faster does not necessarily mean better. If my implementation guess above is correct, the algorithm would use significant additional memory, making it unsuitable for sorting simple numbers (where the pointers would dwarf the elements themselves) or for use in low-memory (i.e. embedded) applications.

        The true test of whether the algorithm is innovative is: 1) have a professor look over it, and then 2) get a (respectable) journal to publish it. If it passes the peer review required for publication, then it's an innovative algorithm. If not, it's one of thousands of alternative implementations of the same idea. Clever, maybe even useful, but not spectacular.
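
        To make that guess concrete, here is a hedged sketch in C of "sort pointers, then rearrange in one pass" using the standard qsort(); the record type, its key field, and the function names are invented for the example:

        #include <stdlib.h>
        #include <string.h>

        struct record { int key; char payload[60]; };

        static int cmp_by_key(const void *a, const void *b)
        {
            const struct record *ra = *(const struct record * const *)a;
            const struct record *rb = *(const struct record * const *)b;
            return (ra->key > rb->key) - (ra->key < rb->key);
        }

        /* Sort an array of pointers to the records, then copy the records out
         * in sorted order in a single pass.  Fast when records are large, but
         * it needs O(n) extra space -- the trade-off discussed above. */
        int sort_records(struct record *recs, size_t n)
        {
            struct record **ptrs = malloc(n * sizeof *ptrs);
            struct record *out = malloc(n * sizeof *out);
            if (!ptrs || !out) { free(ptrs); free(out); return -1; }

            for (size_t i = 0; i < n; i++)
                ptrs[i] = &recs[i];
            qsort(ptrs, n, sizeof *ptrs, cmp_by_key);    /* sort pointers only */

            for (size_t i = 0; i < n; i++)               /* one rearranging pass */
                out[i] = *ptrs[i];
            memcpy(recs, out, n * sizeof *recs);

            free(ptrs);
            free(out);
            return 0;
        }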

      • Fast vs. Optimal (Score:2, Insightful)

        by cmpalmer ( 234347 )
        Where the art and the science of algorithms diverge is in their implementation. Does a bubble sort in hand-optimized assembly language on a 3 GHz machine beat a quicksort written in GW-BASIC running on an antique 386? What differences are there between sorting data already stored in memory and sorting data that is on a remote server? What if two algorithms are both O(n), but one's "operations" involve multiple calls to slow libraries, while the other's involve just two integer compares?

        I know the question raised was on the correct analysis of the algorithm in question, but several of these postings have diverged into languages and types of data. This is good, and it is the whole point of a well-grounded CS education -- knowing the strengths and weaknesses of the available algorithms so you can make the correct choices based on the size and type of the data and the language (and libraries) of choice.
        • Does a bubble sort in hand-optimized assembly language on a 3 GHz machine beat a quicksort written in GW-BASIC running on an antique 386?

          That would depend on the data size, of course. Since GW-BASIC only supports 65k of data, I would assume yes, even if the one for the fast machine were written in GW-BASIC...

  • Wrong title? (Score:2, Insightful)

    by Anonymous Coward
    Shouldn't it be: Researching Sorting Algorithms?
  • by Eagle7 ( 111475 ) on Tuesday October 22, 2002 @09:06AM (#4503440) Homepage
    I take it your algorithm is redundant?
  • Algorithm resources (Score:5, Informative)

    by Twylite ( 234238 ) <twylite&crypt,co,za> on Tuesday October 22, 2002 @09:25AM (#4503581) Homepage

    The definitive online resource for algorithms is NIST's Dictionary of Algorithms and Data Structures [nist.gov]. There is a list of algorithm resources [evergreen.edu], and you can also find some free e-books using The Assayer [theassayer.org].

    In print you should be looking for "Introduction to Algorithms, 2nd edition". It is the bible of the field. Other excellent candidates are "Data Structures and Algorithms" (and its "in Java" / "in C" variants).

    Google will also tell you to look here [monash.edu.au], here [uwa.edu.au] and here [uchile.cl].

  • Have you communicated it to him yet? "G. A new, super sorting technique that is a definite improvement over known methods. (Please communicate this to the author at once.)" From the introduction to TAOCP chapter 5.2.
  • Apples and Oranges? (Score:5, Informative)

    by Bazzargh ( 39195 ) on Tuesday October 22, 2002 @09:30AM (#4503619)
    Quicksort is an in-place sorting algorithm. If you're not sorting in place, it's well known that you can do better. Telling us the sort isn't in-place means you're either doing an external sort (which isn't where you'd use quicksort) or you're breaking the other condition - the constant memory limit. I'm guessing you're doing the latter - trading space for speed - and it's /well known/ that better sorts exist in this case.

    The easy way to show that faster sorts exist is to demonstrate an absurd limit case of a tradeoff of space for speed. Consider that you have an unlimited amount of memory available for your sort results, and that you are sorting a finite number of keys N for which a mapping M(n) exists to the positive integers. Then, since there can be at most N duplicates of any given key, scanning the data once and placing each key n(i) in memory address N*M(n(i))+i sorts all the data. This is O(N), and pretty much optimal.

    If you know there are no duplicates and the keys fill a known range this can be practical.

    A good place to start reading about other sorting algorithms is here: http://www.nist.gov/dads/HTML/sort.html
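
    To make the limit case concrete, here is a hedged sketch of that placement scheme in C for small non-negative integer keys (the function name, the key bound K, and the use of -1 as an "empty slot" marker are all assumptions made for the example, not part of the description above):

    #include <stdlib.h>

    /* Limit-case space-for-speed sort: every key is assumed to lie in [0, K),
     * so M(key) = key.  Input position i gets its own slot N*M(key)+i, which
     * is why duplicates never collide.  Placement is a single O(N) pass; the
     * table itself has N*K slots, and scanning it in slot order visits the
     * keys in sorted (and stable) order. */
    int absurd_sort(const int *keys, size_t n, size_t k, int **table_out)
    {
        int *table = malloc(n * k * sizeof *table);
        if (!table)
            return -1;
        for (size_t s = 0; s < n * k; s++)
            table[s] = -1;                              /* -1 marks an empty slot */

        for (size_t i = 0; i < n; i++)
            table[(size_t)keys[i] * n + i] = keys[i];   /* slot N*M(key) + i */

        *table_out = table;
        return 0;
    }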
    • by mr3038 ( 121693 ) on Tuesday October 22, 2002 @10:25AM (#4504047)
      The easy way to show that faster sorts exist is to demonstrate absurd limit case of a tradeoff of space for speed. Consider you have an unlimited amount of memory available for your sort results, and that you are sorting a finite number of keys N for which a mapping M(n) exists to the positive integers. Then, since there can be at most N duplicates of any given key, scanning the data once and placing each key n(i) in memory address N*M(n(i))+i sorts all the data. This is O(N), and pretty much optimal.

      Yeah, the sorting is O(N), but reading the result back is O(N^2), and without reading the result the sorting itself doesn't do much good. That's because you need N^2 memory and in the worst case you have to do a linear scan through the whole of it. If you think about sorting, it should be pretty clear that O(n log n) is the best you can get if you have no information about the data and it's uniformly distributed. IIRC the worst case for heap sort is O(n log n) and the worst case for quicksort is O(n^2), so if you don't need in-place sorting you could use heap sort, for example. If you don't have many items to sort, just do insertion sort because it has the least overhead.

      • If you think about sorting it should be pretty clear that O(n log n) is best you can get if you have no information about the data and it's uniformly distributed.

        Sort of -- the information theory lower bound states that you cannot go faster than Omega(n log n) with a comparison-based sort. Note that "Big-Oh" notation is for stating upper bounds on function growth, not lower ones.

        Your quote above is correct, but should be more clearly stated as "...Theta(n log n)" or "Omega(n log n)" because both imply lower bounds, whereas "Big-Oh" does not.
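
        (For reference, the standard decision-tree argument behind that bound, sketched: a comparison sort must distinguish all n! possible orderings of the input, so its decision tree needs at least n! leaves; a binary tree with n! leaves has height at least log2(n!) >= (n/2)*log2(n/2), which is Omega(n log n). That height is the worst-case number of comparisons.)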

      • No, the reading is better than M*N (where N is the number of elements and M is the key space). The reading is O(M+N), in two parts: going through the top-level buckets (M reads), and, where a top-level bucket is full, descending down the chain until it ends (N reads in total). So reading the array takes O(N+M).

        Allocating/initializing/garbage collecting the array may take O(N*M), depending on the implementation. A different implementation could skip the array and use a list off of the M different buckets, reducing the memory usage to O(N+M) and improving performance similarly.

        Happy Hacking,
        Tupper

    • > The easy way to show that faster sorts exist is to demonstrate absurd limit case of a tradeoff of space for speed. Consider
      > you have an unlimited amount of memory available for your sort results, and that you are sorting a finite number of keys N
      > for which a mapping M(n) exists to the positive integers. Then, since there can be at most N duplicates of any given key,
      > scanning the data once and placing each key n(i) in memory address N*M(n(i))+i sorts all the data. This is O(N), and pretty
      > much optimal.

      This isn't in O(N) unless your mapping meets certain criteria. Though your data will be "sorted", they will be spaced out in memory with large gaps between them -- gaps that you'd need to traverse to actually collect the result. In this case your algorithm is in O( N*(max(M(n(i)))-min(M(n(i)))) ). This is pretty good when the mapping is a small range (say, playing cards), but poor when the mapping could be large (say, strings).

      In the abstract, it's not possible to sort data that only have a binary comparison operator (say, the real numbers) with fewer than n*log n comparisons.
      • >This isn't in O(N) unless your mapping meets certain criteria.

        Er, oh yes it is. If the memory I'm writing to is the lights in the scoreboard at Yankee stadium - or directly into video memory - why on earth would I need to collect the results? Collection is not actually part of the sorting problem.

        Anyway, I did say:
        * that it was absurd
        * that it's /practical/ if you have /no duplicates/ and you /fill a range/ with the keys (i.e. NO GAPS).

        It's not bound by the O(N ln N) limit because it doesn't use comparisons between keys.

        Read e.g. this paper [nada.kth.se] which mentions this /and other/ algorithms that beat that bound, because they don't use comparisons.

        BTW, if you are the same Tom7 who makes the fonts, I bow humbly before you - they are brilliant :)

        -Baz

        PS as for the Anon comment on storage in memory taking O(ln N): this is true of quicksort as well, but is ignored - complexities for in-place sorts assume that memory access is instant. If you do take it into account you'll find quicksort has an O(N ln N ln N) term, but the constant in front of it is tiny, and in practice the assumption of instant memory - on which the quicksort bound is based - holds. Consider that even if the N ln N term were just 100 times larger than the N ln N ln N term, you'd need on the order of 2^100 items to sort before the latter dominated!
        • > Collection is not actually part of the sorting problem.

          Well, you can define the sorting problem however you like, but to me it means taking an array (or better, list) and returning an array (or list) with the sorted result. If I'm allowed to arrange the results anyhow I like, then I can do all sorts of crazy things ... sort in constant time by using some lazy data structure (that actually does the sorting when I ask it to enumerate elements) or "sorting" in zero time by "writing" (discarding) the elements into write-only memory. It's important that all the sorting algorithms have the same interface (ie, meet the same specification), otherwise, how can we compare them? Maybe instead I should have said, "this is not a complete sorting algorithm."

          Yes, I am the fonts guy. Thanks!

    • Once this is done, how do you find the lowest element on the list, second lowest, and so on?
      And, random access to memory should be considered O(log(N)) for a given memory size ...
    • by merdark ( 550117 )
      I believe the technique the parent described is a form of hash sort. You must be able to calculate any given instance of the mapping function M in O(1) time. The situation of infinite memory described is indeed a special case.

      A more common and real example of this is sorting a specific range of consecutive integers [a,b]. The function M can simply be M(x)=x-a and is considered a simple hash function. Once all the elements have been hashed according to M(x) into some array A, the array A will contain the sorted elements. This clearly takes only O(n) in the worst case.

      I believe it's been proven that any general sort algorithm that does not rely on special conditions on the data or memory has a lower bound of Omega(n log n). Look up decision tree proofs in your favorite algorithms book.

      That's not to say that your algorithm is not useful, though. Often people use things like randomized algorithms because they have better average times. So, talk to your professors. They would be happy to explain the specifics of your algorithm to you.

      Also, if your library has online journals, you can search those for sorting algorithms. And don't be afraid to show the algorithm to people. If you're worried about IP, don't be. Chances are the university already has IP rights to it since you're a student. That doesn't mean you can market your algorithm if it turns out to be something spectacular. Just talk to your professors; they are the experts!
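
      A minimal sketch of that idea in C, generalized slightly to allow duplicates (counting sort over a known range [lo, hi]); the function name and the assumption that every key fits the range are mine, purely for illustration:

      #include <stdlib.h>

      /* Counting sort for ints known to lie in [lo, hi].  O(n + (hi - lo))
       * time and extra space; it never compares keys to each other, so the
       * Omega(n log n) comparison-sort bound does not apply. */
      int counting_sort(int *a, size_t n, int lo, int hi)
      {
          size_t range = (size_t)(hi - lo) + 1;
          size_t *count = calloc(range, sizeof *count);
          if (!count)
              return -1;

          for (size_t i = 0; i < n; i++)
              count[a[i] - lo]++;              /* tally M(x) = x - lo */

          size_t out = 0;
          for (size_t v = 0; v < range; v++)   /* write keys back in order */
              for (size_t c = 0; c < count[v]; c++)
                  a[out++] = lo + (int)v;

          free(count);
          return 0;
      }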

      • "I believe the technique the parent described is a form of hash sort."

        You'll find more techniques [nada.kth.se] like this in the literature if you look for integer sorting [google.com].
      • Please tell me about a good randomized sorting algorithm. I haven't heard of any (well, except for those that assume the existence of a sorting network, or other special hardware...)
        • I'm not sure what you mean by "good". But here's an example. Suppose the space constraints of mergesort are too expensive. Quicksort has worst-case running time of O(n^2). Randomized quicksort can be used to avoid hitting the worst case runtime for *most* runs.

          If you mean better than O(nlogn), then I don't know of one either. I just used randomized algorithms as an example of algorithms that may not have the best big-O runtime but can still be useful.

          The poster did not specify any runtime properties, average, worst case, or best case. It's not clear that his algorithm is any better than randomized quicksort. But depending on the properties of the poster's algorithm it could be useful so he should properly analyse it.
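
          A hedged sketch of that randomization in C (the names are made up, rand() is left unseeded for brevity, and the range is inclusive):

          #include <stdlib.h>

          static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

          /* Randomized quicksort: picking the pivot uniformly at random makes
           * the *expected* time O(n log n) on every input, although any single
           * run can still be unlucky.  Call as rquicksort(a, 0, n - 1), n > 0. */
          void rquicksort(int *a, size_t lo, size_t hi)
          {
              if (lo >= hi)
                  return;
              size_t p = lo + (size_t)rand() % (hi - lo + 1);
              swap(&a[p], &a[hi]);                   /* move random pivot to the end */

              int pivot = a[hi];
              size_t i = lo;
              for (size_t j = lo; j < hi; j++)       /* Lomuto partition */
                  if (a[j] < pivot)
                      swap(&a[i++], &a[j]);
              swap(&a[i], &a[hi]);

              if (i > lo)
                  rquicksort(a, lo, i - 1);
              rquicksort(a, i + 1, hi);
          }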

          • Thanks, I forgot about that one. Although it isn't very interesting, in the sense that it is still just quicksort, always terminates successfully, and is simple to analyze ;-) I guess I was thinking of probabilistic algorithms, of which I don't know of any that can be applied to sorting on normal hardware and are actually better than a deterministic one.
  • by photon317 ( 208409 ) on Tuesday October 22, 2002 @09:31AM (#4503625)

    There are lots of algorithms out there that seem to be multiples faster than quicksort in daily use. The problem is that when you formally analyze them, you find out they're not guaranteed to be faster, and could in fact be horribly slower. If you run yours enough times against various weird data, you'll probably hit one of these slow runs yourself. I can only guess, because you haven't mentioned anything about the algorithm or shown any sample code. Read up on how to analyze your code for O() notation and figure out what the real worst case is mathematically.
    • There are lots of algorithms out there that seem to be multiples faster than quicksort in daily use.

      For example: IIRC, A bubble sort appears to be efficient if the data being sorted is already mostly sorted. But on mostly random data, it sucks.

      • For example: IIRC, A bubble sort appears to be efficient if the data being sorted is already mostly sorted. But on mostly random data, it sucks.

        Nope, that would be insertion sort. Bubble sort is O(n^2) both on average and in the worst case. It's the archetypal bad sorting algorithm (well, except for the "1: try a random permutation 2: return if result sorted 3: goto 1" algorithm, which requires O(n!) time on average, but may not terminate at all :-)
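
        (For completeness, a hedged sketch of that archetypally bad randomized "sort"; the names and the unseeded rand() are just for illustration:)

        #include <stdbool.h>
        #include <stdlib.h>

        static bool is_sorted(const int *a, size_t n)
        {
            for (size_t i = 1; i < n; i++)
                if (a[i - 1] > a[i])
                    return false;
            return true;
        }

        /* "Bogosort": shuffle until sorted.  Roughly n! shuffles expected,
         * and no guaranteed upper bound on any single run. */
        void bogosort(int *a, size_t n)
        {
            while (!is_sorted(a, n)) {
                for (size_t i = n - 1; i > 0; i--) {    /* Fisher-Yates shuffle */
                    size_t j = (size_t)rand() % (i + 1);
                    int t = a[i]; a[i] = a[j]; a[j] = t;
                }
            }
        }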

  • Google? (Score:4, Informative)

    by mnordstr ( 472213 ) on Tuesday October 22, 2002 @09:34AM (#4503651) Journal
  • I am only in my second year of computer science, so I don't know a lot about these things.


    Why don't you go and talk to your professor, then? At least someone from the CompSci faculty should be able to help you. If not, then you're probably studying at the wrong university :-).
  • Well, post it. (Score:4, Interesting)

    by Tom7 ( 102298 ) on Tuesday October 22, 2002 @10:34AM (#4504140) Homepage Journal

    Post your algorithm, then!

    There are many implementations of quicksort-like sorts, but when done well, quicksort is pretty damn fast. For instance, if you're measuring against a version that doesn't special case arrays of size 3, 4, and maybe even bigger, then it will run many times slower than a tuned version. You won't be able to beat quicksort (with proper pivot-picking) asymptotically, so there won't be any ground-breaking result here, but it's probably possible to beat its constant factors (which may be practically useful)... and there are probably many people here who'd be willing to look at the algorithm if you posted it.
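
    For context, this is the kind of tuning being referred to -- a hedged sketch, not any particular library's implementation: median-of-three pivot selection plus an insertion-sort cutoff for small subarrays (the cutoff of 16 is a typical guess, not a tuned constant).

    /* Tuned quicksort sketch: median-of-three pivot, insertion sort below a
     * small cutoff.  Still O(n log n) expected, but with a much smaller
     * constant factor than a naive textbook quicksort.
     * Call as tuned_quicksort(a, 0, n - 1). */
    #define CUTOFF 16   /* typical small-array threshold; tune per platform */

    static void iswap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    static void insertion(int *a, int lo, int hi)
    {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i], j = i;
            while (j > lo && a[j - 1] > key) { a[j] = a[j - 1]; j--; }
            a[j] = key;
        }
    }

    void tuned_quicksort(int *a, int lo, int hi)
    {
        while (hi - lo + 1 > CUTOFF) {
            int mid = lo + (hi - lo) / 2;
            /* median-of-three: order a[lo], a[mid], a[hi] among themselves */
            if (a[mid] < a[lo]) iswap(&a[mid], &a[lo]);
            if (a[hi]  < a[lo]) iswap(&a[hi],  &a[lo]);
            if (a[hi]  < a[mid]) iswap(&a[hi], &a[mid]);
            int pivot = a[mid];

            int i = lo, j = hi;
            while (i <= j) {                 /* Hoare-style partition */
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j) { iswap(&a[i], &a[j]); i++; j--; }
            }
            tuned_quicksort(a, lo, j);       /* recurse on one half ...   */
            lo = i;                          /* ... and loop on the other */
        }
        insertion(a, lo, hi);
    }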

  • *sigh* (Score:3, Informative)

    by SomeGuyFromCA ( 197979 ) on Tuesday October 22, 2002 @11:09AM (#4504423) Journal
    #1, Google and see if someone else has already worked on the algorithm. "Nothing new under the sun".

    #2, if that gives you no results, post the code or at least pseudo-code so we can comment. Not "I have a new miracle development that could be revolutionary; please comment on it with no clue as to exactly what it is or how it works."

    #3, talk to a CS professor. You should know plenty of them.

    And the obligatory link - http://www.cs.ubc.ca/spider/harrison/Java/sorting-demo.html [cs.ubc.ca].

    • That's a poorly designed animation page.

      I used to have a set of pages up (currently dead) that launched 6 different implementations with a single click... and the animations had 50, 100, and 250 lines, not isolated sorts.

      With 50 points, you can see that quicksort is faster, but you might think the simplicity of some of the other sorts makes them preferable.

      But with just 250 points, you have no doubts about the relative performance of the various algorithms.
  • amazing (Score:5, Funny)

    by larry bagina ( 561269 ) on Tuesday October 22, 2002 @11:27AM (#4504597) Journal
    What's the deal with Ask Slashdot lately?

    "I'm a 2nd year cs student, and I discovered an unknown searching algorithm. I won't provide any details, though"

    "I'm not that good at math, but I just invented a new unbreakable multipad encryption. I won't provide any details, though"

    "During my lunch break, I used a couple coat hangers, some aluminum foil, a 3 hole punch, a spare xeon-argon laser, and 32oz of diet pepsi to create my own home-brewed cold fusion reactor."

    Can you identify the PhysicsGenius troll from the Ask Slashdot question?

    And people complain about what gets through the US Patent Office!
    • Yeah, I have to agree that these Ask Slashdot questions are getting worse and worse. In a comment I made to a previous Ask Slashdot [slashdot.org], I suggested that the forum should be reserved for questions that stimulate a large amount of discussion on issues that do not have any obvious answers. I really don't understand what criteria the slashdot editors use to determine whether to post an Ask Slashdot submission or not. Every so often there is a really good Ask Slashdot question so I don't want to filter this out from my preferences. But part of me gets annoyed with people posting crap Ask Slashdot questions even if I have the sense not to read them.

      I've seen journalists and students asking questions here along the lines of "please do my investigation for me" and this gets accepted. Then we have these people who think they've developed some new ingenious algorithm but haven't bothered to have ANYONE look at their work. I realize my post is starting to ramble at this point so I think I'll end with one request that I pray the slashdot editors will read and consider: please tighten up the criteria for what gets accepted as an Ask Slashdot question. So many of these questions are simply the result of laziness on the part of the submitter. My earlier post (that I linked to above) is one suggestion for what Ask Slashdot questions should be like. I welcome other people's ideas as well.

      GMD

    • So *that's* what I've been doing wrong all these years!
    • HEY that is EVIL PhysicsGenius Troll to you buddy!

      smeat!
    • Dear Slashdot,

      I just finished Kindergarten, where we learned our ABC's, and I've invented an alphabet that can be read 7 times as fast, with only one third the effort. Why didn't some PhD come up with this before? Oh well. I don't remember what my question was, but just post this, ok?
      • I just finished Kindergarten, where we learned our ABC's, and I've invented an alphabet that can be read 7 times as fast, with only one third the effort. Why didn't some PhD come up with this before?

        There are alphabets that are better than Latin in some respect. Take Tengwar [geocities.com] for instance. The script is designed along sound phonological principles (no pun intended): voiced consonants, fricatives, and nasals have a predictable change in shape from the basic voiceless stops (p, t, c, q). It's been adapted to at least English, Sindarin, Quenya [geocities.com], Polish [elvish.org], Lojban [tuxedo.org], Esperanto [tuxedo.org], and Toki Pona [angelfire.com]. On the other hand, some people have reported increased dyslexia when using Tengwar [zompist.com].

  • "I have tried searching the net and can't find anything." It appears the new algorithm has been sufficiently tested.
  • If you want to prove that you've had the algorithm longer, discovered it, etc., print it out and have it notarized. You can go to a notary for $5-10 (US).

    First, as others said: profile it. What is its asymptotic upper bound (O-notation)? Read "Introduction to Algorithms", written by Cormen, Leiserson, and Rivest. Despite the name, reading that book is not a simple task ;)

    Oh, and go to one of your professors for verification.
  • I usually use ORDER BY like in this example:

    SELECT * FROM Articles ORDER BY ArticleId

    It works every time. Don't need any log() or sin() or any of that fancy stuff.
  • You already said it is not in-place.

    Another category I know of is:

    Is it stable [everything2.com] or unstable?
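
    For a concrete (made-up) illustration of what stability means: sorting the pairs (2,"a"), (1,"b"), (2,"c") by their numeric key, a stable sort must return (1,"b"), (2,"a"), (2,"c") -- the two records with key 2 keep their original relative order -- while an unstable sort is free to return (1,"b"), (2,"c"), (2,"a") instead. C's qsort() makes no stability guarantee; mergesort-based routines typically do.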

  • If you've come up with a truly asymptotically better algorithm, then why don't you talk with someone at your CS department? I'm sure someone would like to take a quick look; at the very least it shows that their students actually want to learn and do study/think/research on their own.

    Have you analyzed your algorithm yourself? Is it asymptotically equivalent to something like quicksort, but it just has a tighter inner-loop? Does it have a better average running time on random input? Does it work always (or is it flawed/probabilistic)? I think those are the questions you need to have answered.
