After my past attempt at writing a concurrent Qsort algorithm, I had a few more thoughts about how to change the architecture.
Instead of creating new channels to pass to every recursively-called goroutine and then blocking on them until they return, I wanted to create only a single channel for communication and find a way to return from goroutines immediately. The solution I came up with is to have each goroutine send back the number of elements it has placed in their final sorted position when it returns. In the base case, when the list is 0 or 1 elements long, this leaves fairly simple code.
This still creates too many goroutines and uses too much memory. So, taking a page from most quicksort implementations, let's use a selection sort for the last few elements. Empirically I found a threshold of about 75 to work well. Our overall architecture doesn't have to change; we just add the selection sort call in the termination condition of our qsort2 function.
Finally, when using GOMAXPROCS=2 I got a better running time for the concurrent version than for the non-concurrent Qsort from my last post. This isn't quite a fair comparison though, because I wasn't using the selectionSort trick in Qsort. Adding it to both Qsort and our previous Cqsort brings all three implementations to about 7s to sort 1 million elements. So still no improvement over a sequential implementation, despite using both of my cores at 100% in the concurrent cases; memory and communication overhead still slow things down too much. I'm hoping to get my hands on a computer with 4 cores to see whether the concurrent version sees a speedup when the core count doubles again, or whether communication overhead is still too high.
Again, all of the code and some benchmarks are in a GitHub repository at https://github.com/azundo/cqsort.git.