implicity: Quicksort

23.8.05

Quicksort

i may have mentioned that i came up with my own variant of Quicksort. i usually use it when i'm sorting a bunch of stuff into alphabetical order by first letter only, so there are lots of items starting with the same letter which can be in any order in relation to each other, and the frequency of the different letters is kind of randomly distributed.

Quicksort is a very good way of sorting that starts to become much faster than simpler, more 'intuitive' ways above about 20 items. I usually use it when I have at least thirty or so files. it works like this:

choose a file at random, say the one on top

then sort all the others into two piles; the ones that go before it or after it.

put the first pile on the left, and the second pile on the right

then you can add the file you chose at random to either pile, it doesn't matter, although putting it on the smaller pile is a good choice.

now pick up the pile on the left and start again doing exactly the same thing

put the two piles you split that one into back down with the first one on the left, both to the left of the other pile

you'll notice your stacks of files are in order, in the sense that all the files in a stack to the left will come before anything in a stack further to the right, but within each individual pile, nothing will be ordered

keep on doing the same thing to each pile, splitting it into a stack that comes before the top file {or some other randomly chosen one} and a pile that comes after, then putting the two stacks back next to each other in the same place in your row.

eventually, you'll be left with stacks which only contain one file. since the stacks themselves are in order, you have sorted all your files. just combine all your one-element stacks, starting of course from the left [earlier] side.

[OR if you're slightly impatient like me, when you have a pile with only a few files in it, you can sort it 'by hand' by any quick method you choose, since further subdividing say, four files into different piles isn't necessarily the quickest way to do it. then combine your sorted stacks.]

as i mentioned earlier, i often use this to sort things only by first letter, so i simply stop as soon as i have a pile only containing one letter of the alphabet.

the reason this can be so efficient is that the thing you spend most of your time doing is just choosing one of 2 stacks to add the file in your hand to. so if you start by picking mr keown's file at random {lucky you}, all you have to do with all the rest is decide whether they come before or after 'keown' in alphabetical order. unless the name starts with 'k', this is very quick and simple. even if it does start with 'k', it's not too hard to choose the right stack. then, since you {roughly} halve the size of your stacks each time, they get small pretty fast.

the drawback is as follows: mr able's file {or mr zanzibar's} might be on top, or be the one you choose at random some other way. then you're going to spend a lot of time flicking through a bunch of files, and all you're going to get out of it is having 'able' {and possibly 'aardvark'} put to one side, and a massive, unsorted stack of everything which comes after able on the other side.

there are a few ways round this. one solution is to choose 3 files, and sort them. then put the first one on the left, the last one on the right, and use the middle one as your 'splitting point'. this at least guarantees that you won't choose the first or last file to split the rest with.

another way is to try and guess what letter {or number, or whatever} might come in the middle. if you chose keown at random, you were definitely luckier than if you chose able, since the optimum is to split the stack into two that are as equal in size as possible. there's usually a similar number of a-k's and l-m's. however, this method relies on you making a good choice, not just at this early and easy stage, but also later, when you've already got a few smaller stacks, and you have to guess where the middle point of 'roberts' to 'williams' might be. halfway through s? halfway through t?

my own variation goes as follows. take the top two files {or two random ones}. put the earlier one on the left, and the later one on the right. suppose you've chosen F and T. F goes on the left, and T on the right. then as you work through the rest, put anything starting A-F {or 'before F'} on the left, and anything starting T-Z {or 'after T'} on the right. until you get to something that starts with a letter G-S, say I. put this on the smaller stack, say it's the 'after T' one, and change that stack to the 'after I' stack. you now have somewhere to put A-F and I-Z, so continue until you reach a G or a H, and put that on the stack that's now smaller. say A-F is smaller, so you put G there and change it to A-G. eventually your 'before' and 'after' letters will meet up in the middle, and you'll just be dividing into A-H and I-Z or something like that.

with your smaller stacks, you do exactly the same thing. you'll have two piles something like R-T and W-Z, but this doesn't matter. you only need to remember 'before T' and 'after W' and it wont matter whether the 'before T' stack starts with A or some other letter.

this method seems to offer a good way of roughly finding the midpoint, even when you have quite a random distribution of letters, since it adapts itself to the distribution as you go along, adding more letters to the smaller stack. until the two stacks meet up in the middle, the algorithm is still adjusting itself to make them as equal as possible. all it requires is for you to have a good way of comparing the size of your stacks [i do - my thumb and forefinger], and to remember 2 'pointers' to Quicksort's one {'before F' and 'after T' instead of 'before and after K'}. the drawback, similar to above, is of course when your first 2 files are A and B or something like that, so your 2 stacks become A and B-Z. this means it's not very good for files that are already sorted, or almost sorted.

i haven't looked into the subject much but to my knowledge it is original, although i would be surprised if someone hadn't come up with it, or something remarkable similar before.

# posted by weierstrass @ 20:17

Comments: Post a Comment

<< Home

implicity

23.8.05

Quicksort

About Me

life

the internet

& everything

archives