Yes, that's me alright.
I didn't change my mind - I said that in RPL, binary insertion is as good as it gets: few timeconsuming comparisions, and fast stack exchanges.
In Saturn assembly, there's no match for the median-of-three Quicksort I used. It will sort just about anything you can throw at it, save units, and at a speed comparable to the best type-dedicated programs. And it's stable, by the way, using the very technique you described.
I have a much more recent version (only a few months ago) lying around somewhere.. I think this one still has a bug, and won't do binary integers. I'll see if I can upload a version to hpcalc.org.
Cheers, Werner
About 15 years ago I designed a "Modified CombSort Algorithm" by significantly enhancing the CombSort algorithm which itself is an enhancement of the slow Bubble Sort. You can find my article
here.
Namir
I implemented all 3 methods for testing, since the answer wasn't clear after all the discussions.
I tested by sorting the same lists with 2, 4, 6, ... 1998 elements (tested on the PC, I might test on calc later just to get a feeling for the real timings). In the tests, I added significant overhead to the comparison, to make it similar to our use case.
Results were:
Fully random case:
Binary insertion: 2.01 sec
Shell sort: 2.12 sec
Quicksort: 1.83 sec
Reversed list:
Binary insertion: 0.56 sec
Shell sort: 1.67 sec
Quicksort: 2.28 sec
90% sorted, 10% random:
Binary insertion: 1.91 sec
Shell sort: 2.02 sec
Quicksort: 2.17 sec
So in the end, binary sort beats shell sort (point for Werner, good call there).
The fully random test is a best-case for quicksort, but it only outruns binary insertion by 10%. Binary insertion shines on fully reversed lists, and both shell sort and binary sort can beat quicksort with almost sorted or reversed lists.
So in the end, I think binary insertion is the method we'll use, since it's consistently fast, versus a quicksort that can fall down.
For the record:
I tuned each algorithm as follows:
- Shell sort: Nothing that could be improved.
- Binary sort: Once the position is found, memory is moved as a block using memmove(), rather than moving item by item in a loop. This increased the speed by a small percent, since memmove may use optimized CPU instructions (it may use SIMD on x86, and on ARM it uses STM/LDM). Still need to investigate what's the ideal threshold to switch from item-by-item loop to a memmove call due to the overhead of making the call.
- Quicksort: Pivot: Tried using the center element and the median of 3. Didn't see any difference in speed on the cases I was evaluating. In the end I left the median of 3.
Recursion: Non-recursive with a local stack for up to 2^32 elements.
Fall-back: For small lists falls back to plain insertion, I tuned the threshold until I found optimum value, which was between 6 and 8.
EDIT: Sort was made stable for all methods using method described in a post above.
Claudio
In a situation where the bulk of the work is in the comparisons, binary insertion would take constant time, so I don't understand why it would be so much faster for a reversed list?
Werner
(06-21-2014 02:32 PM)Werner Wrote: [ -> ]In a situation where the bulk of the work is in the comparisons, binary insertion would take constant time, so I don't understand why it would be so much faster for a reversed list?
Werner
Well, the way I implemented it is like this:
You are in the middle of sorting, so you have several elements already sorted, and you are evaluating element An:
A0 ... An-1 , An , ... Unsorted ...
The algorithm takes An, and it compares it with A0 (left of the interval), and if it's less, then it inserts the element at the beginning. If not, it compares with An-1, and if it's bigger, then you leave it there. Only if it's less you start doing the bisection.
So I did the left comparison first, this means on a reversed list, it simply does one comparison, and inserts the element at the beginning of the list. So it's reversing the list with N comparisons.
I could've done the An-1 comparison first, then it would've required 2N comparisons to do the reverse-list case.
That should explain why so fast.
On the other hand, a list that's already sorted would needlessly do 2N comparisons, so for the final implementation I reversed it, now it does the An-1 comparison first.
Claudio
(06-20-2014 02:12 PM)Claudio L. Wrote: [ -> ]EDIT: Sort was made stable for all methods using method described in a post above.
What is the cost of making unstable sort stable?
I assume comparisons now cost doubled.
(L[i] < L[j]) → (L[i] < L[j] or (L[i]==L[j] and i<j))
Swapping elements may also cost more.
Index had to carry along, to break ties if needed.
swap(a[i], a[j]) → swap({a[i],i}, {a[j],j})
Another way to look at this, with index attached, all elements are different.
Sort stability issue is now moot.