I am amazed by the many discussions about the existence of a linear-time, in-place sorting algorithm and its variants; see e.g. is-this-implementation-of-bucket-sort-considered-in-place, is-counting-sort-in-place-stable-or-not, sorting-in-linear-time-and-in-place, can-a-counting-sort-be-considered-in-place-when-the-problem-constrains-the-numbe, fastest-in-place-sorting-algorithm-for-epochtime, in-place-and-out-place-sorting-meaning, fast-stable-almost-in-place-radix-and-merge-sorts, sorting-in-place-stable-in-linear-time.
I am even more amazed by the lack of a clear and definitive answer, due in part to the fact that "in-place" is loosely defined.
I would therefore like to give it one more try, aiming at being more precise and rigorous, with your help.
Sorry if this is too redundant; if so, please do not hesitate to tell me where.
Problem statement.
Input: $n$ non-negative integers with values lower than $k$, with $k\le n$.
Output: the $n$ integers in non-decreasing order.
Proposal.
This is the closest thing to a linear-time, in-place sorting algorithm that I could find. It relies on a variant of counting sort that scans the original array and counts occurrences of each integer in place, within the array itself. To do so, it saves the content of each cell it overwrites into the current cell, and marks cells that contain a counter by storing a negative integer: a count of $c$ is encoded as $-c-1$. Then, it prints the result without storing it.
Here is a Python implementation, with comments explaining the details:
    import random
    import sys

    random.seed()

    # create input
    n = 30
    k = 4
    l = [random.randrange(0, k) for i in range(n)]

    # for debug: print original list and its native sort
    print(" ".join([str(x) for x in l]))
    print(" ".join([str(x) for x in sorted(l)]))

    # "in-place" counting sort
    i = 0
    while i < n:
        x = l[i]                     # the value to count
        if l[x] < 0:                 # there already is a counter for it
            l[x] -= 1                # (negatively) increment it
            l[i] = -1                # set current cell to 0, encoded as -1
        else:                        # first occurrence of x
            l[i] = l[x]              # store the value in l[x] before erasing it
            l[x] = -2                # new counter in l[x], init to 1, encoded as -2
        while i < n and l[i] < 0:    # go to next non-counter cell
            i += 1

    # print result
    # l[i] = -x-1 means there are x occurrences of i
    for i in range(k):
        for j in range(-l[i] - 1):
            sys.stdout.write("%d " % i)
    sys.stdout.write("\n")
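To make the script above easier to test, here is a minimal sketch of the same loop packaged as a generator. The name `sorted_stream` is my own and purely illustrative; it assumes, as in the problem statement, that all values lie in $[0,k)$ and that $k\le n$. It overwrites its argument with counters and yields the values one by one instead of printing them.

```python
import random


def sorted_stream(l, k):
    """Yield the values of l in non-decreasing order.

    Assumes 0 <= l[i] < k for every i, and k <= len(l).
    The list l is overwritten with (negatively encoded) counters.
    """
    n = len(l)
    i = 0
    while i < n:
        x = l[i]                    # the value to count
        if l[x] < 0:                # a counter for x already exists
            l[x] -= 1               # one more occurrence of x
            l[i] = -1               # cell i becomes a zero counter
        else:                       # first occurrence of x
            l[i] = l[x]             # save the displaced value in cell i
            l[x] = -2               # counter for x, initialised to 1
        while i < n and l[i] < 0:   # skip cells that hold counters
            i += 1
    for v in range(k):
        for _ in range(-l[v] - 1):  # decode the counter of v
            yield v


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(0, 4) for _ in range(30)]
    expected = sorted(data)         # computed before data is overwritten
    assert list(sorted_stream(data, 4)) == expected
    print("matches sorted()")
```

Note that, being a generator, it only starts counting when the first value is requested.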
Remarks.
This algorithm seems to be in-place, since it seems to only use a constant amount of memory in addition to its input.
However, it is not really in-place, because:
- It uses negative numbers to encode counter information within the input array, which actually costs $n$ bits.
- Its output is just printed, not stored (not even in the original array) and so it is not available for further computations.
Questions.
A first set of questions is about possible improvements:
- Is it possible to save even more space? For instance, only $k$ bits are really needed, as there are $k$ counters. Anything else?
- Is it possible to write the sorted values back into the input array, or can we prove that this is impossible?
- Is it possible to extend it to more general inputs? Clearly, any other integer interval of length at most $k$ is possible. Anything else?
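Regarding the first point, here is a tentative sketch of what I mean by "only $k$ bits": the sign trick is replaced by a separate $k$-bit mask that records which of the first $k$ cells currently hold a counter. The names `count_with_bitmask` and `is_counter` are hypothetical, and using a Python integer as the mask is just a convenient stand-in for $k$ bits of extra memory. Counters are then stored as plain non-negative integers.

```python
import random


def count_with_bitmask(l, k):
    """Count occurrences in place, using a k-bit mask instead of sign bits.

    Assumes 0 <= l[i] < k for every i, and k <= len(l).
    Afterwards l[v] is the number of occurrences of v, for each v < k.
    """
    n = len(l)
    is_counter = 0                       # bit v set <=> l[v] holds the count of v
    i = 0
    while i < n:
        if i < k and (is_counter >> i) & 1:
            i += 1                       # cell i already holds a counter
            continue
        x = l[i]                         # the value to count
        if (is_counter >> x) & 1:        # a counter for x already exists
            l[x] += 1
            if i < k:                    # cell i becomes a zero counter
                l[i] = 0
                is_counter |= 1 << i
            i += 1
        else:                            # first occurrence of x
            l[i] = l[x]                  # save the displaced value (no-op if x == i)
            l[x] = 1
            is_counter |= 1 << x         # cell i is re-examined on the next pass


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(0, 4) for _ in range(30)]
    expected = [data.count(v) for v in range(4)]
    count_with_bitmask(data, 4)
    assert data[:4] == expected
    print(data[:4])
```

Cells of index at least $k$ are left with arbitrary leftover values, since they never need to be recognised as counters.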
A second set of questions is about possible uses:
- Are there cases where one wants to print the result, without storing it? It seems related to streaming algorithms.
- Does this algorithm help in finding better solutions to other problems, such as computing the median or the cumulative distribution, or removing duplicate values?
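To illustrate the last question, here is a rough sketch of how the counters left in the first $k$ cells could be read to obtain the median and the distinct values. The helper names (`count_in_place`, `count_of`, `lower_median`, `distinct_values`) are mine and only for illustration, and "median" is taken here to mean the element at index $\lfloor (n-1)/2\rfloor$ of the sorted sequence.

```python
def count_in_place(l, k):
    """Same counting loop as above; a count c is stored in l[v] as -c-1.

    Assumes 0 <= l[i] < k for every i, and k <= len(l).
    """
    n = len(l)
    i = 0
    while i < n:
        x = l[i]
        if l[x] < 0:                # counter for x exists: increment it
            l[x] -= 1
            l[i] = -1
        else:                       # first occurrence of x
            l[i] = l[x]
            l[x] = -2
        while i < n and l[i] < 0:   # skip cells that hold counters
            i += 1


def count_of(l, v):
    """Decode the counter stored in cell v."""
    return -l[v] - 1


def lower_median(l, k):
    """Value at index (n-1)//2 of the sorted sequence, via cumulative counts."""
    target = (len(l) - 1) // 2
    seen = 0
    for v in range(k):
        seen += count_of(l, v)      # number of input values <= v
        if seen > target:
            return v


def distinct_values(l, k):
    """Values that occur at least once, in increasing order."""
    return [v for v in range(k) if count_of(l, v) > 0]


if __name__ == "__main__":
    data = [3, 1, 2, 1, 0, 3, 3, 1]
    expected_median = sorted(data)[(len(data) - 1) // 2]
    expected_distinct = sorted(set(data))
    count_in_place(data, 4)
    assert lower_median(data, 4) == expected_median
    assert distinct_values(data, 4) == expected_distinct
```

Both helpers run in $O(k)$ time once the counting pass is done, and neither needs the sorted sequence to be materialised.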