For many purposes in Calc, it is pretty easy to define data ranges with sort specifications (sort order, headers, etc.), and to redo a sort with a couple of clicks and/or keystrokes, and of course for many purposes it's better to use a database. But it might also be useful sometimes to have an array function automatically sort a range. I have examples, and others may as well. I think it's probably preferable to the tricks in Sorting and Filtering data with formulas.
I have created a function with "prototype"
Code: Select all
any[][] SortArray(any [][] inRange, any[][] fieldspecs, int HasHeaders, int ByColumns)
inRange (required): a cell range or array to be sorted. Could be an in-line array or an array returned by another function.
fieldspecs (required): 1 to 4 columns giving the sort specifiers. Can be a cell range, in-line array, etc.
The first column of fieldspecs specifies the columns (or rows sorting by columns, left to right). Columns (or rows) are numbered base 0 from left to right (top to bottom). If a field number is <0 or greater than the number of columns (rows) - 1 of inrange, it will be ignored, and if an otherwise valid field number is specified more than once, only the first instance will be used.
The second column specifies sort order for each field: 0 for ascending, 1 for descending. This is optional and defaults to ascending.
The third column specifies whether to sort numeric data after or before text data: 0 for text first, 1 for numbers first. This is also optional and defaults to text first (maybe backward). Note that this doesn't apply to numeric text, which will be treated by the text collation rules.
The fourth column is 0 for case sensitive, 1 for ignore case. Optional and defaults to case sensitive.
HasHeaders (optional): 0 to sort all of inRange, 1 to treat first row or column as field headers. Default 0.
ByColumns: (optional): 0 to sort top to bottom, 1 for left to right. Default 0.
Hopefully this picture will clarify things. E9 contains the formula (using Ctrl+Shift+Enter)
Code: Select all
=SORTARRAY(A9:C29;A1:D3;1)
Code: Select all
=SORTARRAY(A9:C29;I1:L3;1)
Code: Select all
import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysortarray.ArraySort import XSortArray
# SortArray Calc Add-in implementation.
global xCollator #UNO Collator for string comparisons
global NumbersFirst #0 if numbers sorted after text, 1 if numbers before
class SortArrayImpl( unohelper.Base, XSortArray ):
def __init__( self, ctx ):
self.ctx = ctx
def SortArray(self, inRange, fieldspecs, HasHeaders, ByColumns, docProps):
global xCollator
global NumbersFirst
aLoc = docProps.getPropertyValue("CharLocale") #Use locale of document for collation (could) be a problem with multilingual spreadsheets.
xCollator = self.ctx.getServiceManager().createInstance("com.sun.star.i18n.Collator")
fieldspecs = CheckFields(fieldspecs,len(inRange[0])) #Checks fields for validity
l = len(fieldspecs) #Number of fields to sort by
s = inRange #Array to sort
if l == 0:
return s #Just echo back inRange if no valid sort fields provided.
if not ByColumns:
if not HasHeaders:
for i in range(l):
idx = l - i - 1 #Fields are processed in reverse order, taking advantage of sort stability.
xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3]) #fieldspecs[idx][3] == 0 for case sensitive sort, 1 to ignore case.
NumbersFirst = fieldspecs[idx][2]
#fieldspecs[idx][0] is field number, fieldspecs[idx][1] == 0 for ascending, 1 for descending sort order.
#See disussion in post about key and comparison functions.
s = sorted(s,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
else:
h = s[0] #Header line
s = s[1:] #inRange excluding header
for i in range(l):
idx = l - i - 1
xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
NumbersFirst = fieldspecs[idx][2]
s = sorted(s,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
sout = []
sout.append(tuple(h))
s = sout + s #Put header and sorted array back together.
else:
if not HasHeaders:
t = zip(*s) #Transpose range to sort by columns (left to right).
for i in range(l):
idx = l - i - 1
xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
NumbersFirst = fieldspecs[idx][2]
t = sorted(t,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
s = zip(*t) #Reverse transpose after sort.
else:
t = zip(*s)
h = t[0]
t = t[1:]
for i in range(l):
idx = l - i - 1
xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
NumbersFirst = fieldspecs[idx][2]
t = sorted(t,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
tout = []
tout.append(tuple(h))
tout = tout + t
s = zip(*tout)
return(tuple(s))
#Comparison function. i is field index, a and b are entire rows.
#Note type checking. Is there a better way?
def fncmp(i,a,b):
global xCollator
global NumbersFirst
atype = type(a[i]).__name__
btype = type(b[i]).__name__
if atype == "unicode" and btype == "unicode":
return xCollator.compareString(a[i],b[i])
elif atype == "float" and btype == "float":
return numcmp(a[i],b[i])
elif atype == "float":
return -1 if NumbersFirst else 1
else:
return 1 if NumbersFirst else -1
#Compare two numerics
def numcmp(a,b):
if a < b:
return -1
elif a > b:
return 1
else:
return 0
#See discussion in post about cmp2key.
def cmp2key(i,mycmp):
'Convert a cmp= function into a key= function'
class K(object):
def __init__(self, obj, *args):
self.obj = obj
def __lt__(self, other):
return mycmp(i,self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(i,self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(i,self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(i,self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(i,self.obj, other.obj) >= 0
def __ne__(self, other):
return mycmp(i,self.obj, other.obj) != 0
return K
#Checks fieldspecs validity, even though the final result is pretty compact
#I actually spent more time on this than the actual sort function :)
def CheckFields(inArray,maxfields):
outArray = []
l = len(inArray)
l0 = len(inArray[0])
fieldnumbersin = [getfieldnumber(inArray[i][0],maxfields) for i in range(l)]
testfields = [tuple([fieldnumbersin[i]] + [getfieldvalue(inArray[i][j]) if j < l0 else 0 for j in range(1,4)]) for i in range(l)]
for i in range(len(testfields)):
if testfields[i][0] != -1:
outArray.append(tuple(testfields[i]))
return outArray
def getfieldnumber(n,maxfields):
nType = type(n).__name__
if nType == "float":
return int(n) if 0 <= n < maxfields else -1
elif nType == "int":
return n if 0 <= n < maxfields else -1
else:
return -1
def getfieldvalue(n):
nType = type(n).__name__
if nType == "float" or nType == "int":
return 0 if n == 0 else 1
else:
return 0
def createInstance( ctx ):
return SortArrayImpl( ctx )
g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation( \
createInstance,"com.pysortarray.ArraySort.python.SortArrayImpl",
("com.sun.star.sheet.AddIn",),)
Code: Select all
#include <com/sun/star/uno/XInterface.idl>
#include <com/sun/star/beans/XPropertySet.idl>
module com { module pysortarray { module ArraySort {
interface XSortArray
{
sequence< sequence< any > > SortArray( [in] sequence< sequence< any > > inRange, [in] sequence< sequence< any > > fieldspecs, [in] any HasHeaders, [in] any ByColumns, [in] ::com::sun::star::beans::XPropertySet docProps );
};
}; }; };
Finally, I have tested this pretty thoroughly. It should work well on moderately sized arrays, but when I test it on an array of 10,000 rows and 5 columns, it takes a significant fraction of a minute to recalculate, and as I indicated, this might be improved by using the Python collation somehow. Now, I actually first wrote this in c++, which version does the same data in a fraction of a second, but I have that for Windows only. I did the Python version to practice my Python and to come up with something operating system independent, so here it is.