[Python] Array Sort add-in function.

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Post Reply
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

[Python] Array Sort add-in function.

Post by Charlie Young »

There has been some desire for something like this, so here goes.

For many purposes in Calc, it is pretty easy to define data ranges with sort specifications (sort order, headers, etc.), and to redo a sort with a couple of clicks and/or keystrokes, and of course for many purposes it's better to use a database. But it might also be useful sometimes to have an array function automatically sort a range. I have examples, and others may as well. I think it's probably preferable to the tricks in Sorting and Filtering data with formulas.

I have created a function with "prototype"

Code: Select all

any[][] SortArray(any [][] inRange, any[][] fieldspecs, int HasHeaders, int ByColumns)
The arguments:

inRange (required): a cell range or array to be sorted. Could be an in-line array or an array returned by another function.

fieldspecs (required): 1 to 4 columns giving the sort specifiers. Can be a cell range, in-line array, etc.

The first column of fieldspecs specifies the columns (or rows sorting by columns, left to right). Columns (or rows) are numbered base 0 from left to right (top to bottom). If a field number is <0 or greater than the number of columns (rows) - 1 of inrange, it will be ignored, and if an otherwise valid field number is specified more than once, only the first instance will be used.

The second column specifies sort order for each field: 0 for ascending, 1 for descending. This is optional and defaults to ascending.

The third column specifies whether to sort numeric data after or before text data: 0 for text first, 1 for numbers first. This is also optional and defaults to text first (maybe backward). Note that this doesn't apply to numeric text, which will be treated by the text collation rules.

The fourth column is 0 for case sensitive, 1 for ignore case. Optional and defaults to case sensitive.

HasHeaders (optional): 0 to sort all of inRange, 1 to treat first row or column as field headers. Default 0.

ByColumns: (optional): 0 to sort top to bottom, 1 for left to right. Default 0.

Hopefully this picture will clarify things. E9 contains the formula (using Ctrl+Shift+Enter)

Code: Select all

=SORTARRAY(A9:C29;A1:D3;1)
and I9 is

Code: Select all

=SORTARRAY(A9:C29;I1:L3;1)
sortarray.jpg
When perusing the following code, it might be handy to reference the Python Sorting HowTo. Notice that Python is phasing out the use of comparison functions in favor of a key function, but since I wanted something that uses the OpenOffice collator, I didn't see any alternative to using a comparison function, and so I wrote my own cmp2key function. The cmp_to_key function wouldn't work for me since I needed a function specifying a field index as well as two items to compare. The default Python sort order is straight byte order, which gives some undesirable results, hence xCollator. Python does have its own collation mechanism which I haven't tried, and I suspect it's probably equivalent to the OO version.and I also suspect it might be faster, but see comments about speed below.

Code: Select all

import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysortarray.ArraySort import XSortArray

# SortArray Calc Add-in implementation.

global xCollator     #UNO  Collator for string comparisons
global NumbersFirst  #0 if numbers sorted after text, 1 if numbers before

class SortArrayImpl( unohelper.Base, XSortArray ):
    def __init__( self, ctx ):
        self.ctx = ctx
		
    def SortArray(self, inRange, fieldspecs, HasHeaders, ByColumns, docProps):
        global xCollator
        global NumbersFirst
        aLoc = docProps.getPropertyValue("CharLocale")  #Use locale of document for collation (could) be a problem with multilingual spreadsheets.
        xCollator = self.ctx.getServiceManager().createInstance("com.sun.star.i18n.Collator")
        fieldspecs = CheckFields(fieldspecs,len(inRange[0]))  #Checks fields for validity
        l = len(fieldspecs)  #Number of fields to sort by
        s = inRange #Array to sort
        if l == 0:
            return s  #Just echo back inRange if no valid sort fields provided.
        if not ByColumns:
            if not HasHeaders:
                for i in range(l):
                    idx = l - i - 1  #Fields are processed in reverse order, taking advantage of sort stability.
                    xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3]) #fieldspecs[idx][3] == 0 for case sensitive sort, 1 to ignore case.
                    NumbersFirst = fieldspecs[idx][2]
                    #fieldspecs[idx][0] is field number, fieldspecs[idx][1] == 0 for ascending, 1 for descending sort order.
                    #See disussion in post about key and comparison functions.
                    s = sorted(s,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
            else:
                h = s[0]  #Header line
                s = s[1:] #inRange excluding header
                for i in range(l):
                    idx = l - i - 1
                    xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
                    NumbersFirst = fieldspecs[idx][2]
                    s = sorted(s,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
                sout = []
                sout.append(tuple(h))
                s = sout + s #Put header and sorted array back together.
        else:
           if not HasHeaders:
                t = zip(*s) #Transpose range to sort by columns (left to right).
                for i in range(l):
                    idx = l - i - 1
                    xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
                    NumbersFirst = fieldspecs[idx][2]
                    t = sorted(t,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
                s = zip(*t) #Reverse transpose after sort.
           else:
                t = zip(*s)
                h = t[0]
                t = t[1:]
                for i in range(l):
                    idx = l - i - 1
                    xCollator.loadDefaultCollator(aLoc,fieldspecs[idx][3])
                    NumbersFirst = fieldspecs[idx][2]
                    t = sorted(t,key=cmp2key(fieldspecs[idx][0],fncmp),reverse = fieldspecs[idx][1])
                tout = []
                tout.append(tuple(h))
                tout = tout + t
                s = zip(*tout)
        return(tuple(s))			

#Comparison function.  i is field index, a and b are entire rows.
#Note type checking.  Is there a better way?
def fncmp(i,a,b):
    global xCollator
    global NumbersFirst
    atype = type(a[i]).__name__
    btype = type(b[i]).__name__
    if atype == "unicode" and btype == "unicode": 
        return xCollator.compareString(a[i],b[i])
    elif atype == "float" and btype == "float":
        return numcmp(a[i],b[i])
    elif atype == "float":
        return -1 if NumbersFirst else 1
    else:
	    return 1 if NumbersFirst else -1
        
#Compare two numerics
def numcmp(a,b):
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0
#See discussion in post about cmp2key.
def cmp2key(i,mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj) != 0
    return K

#Checks fieldspecs validity, even though the final result is pretty compact
#I actually spent more time on this than the actual sort function :)
def CheckFields(inArray,maxfields):
    outArray = []
    l = len(inArray)
    l0 = len(inArray[0])
    fieldnumbersin = [getfieldnumber(inArray[i][0],maxfields) for i in range(l)]
        
    testfields = [tuple([fieldnumbersin[i]] + [getfieldvalue(inArray[i][j]) if j < l0 else 0 for j in range(1,4)]) for i in range(l)]
    for i in range(len(testfields)):
        if testfields[i][0] != -1:
            outArray.append(tuple(testfields[i]))
			
    return outArray
	
def getfieldnumber(n,maxfields):
    nType = type(n).__name__
    if nType == "float":
        return int(n) if 0 <= n < maxfields else -1
    elif nType == "int":
        return n if 0 <= n < maxfields else -1
    else:
        return -1 

def getfieldvalue(n):
    nType = type(n).__name__
    if nType == "float" or nType == "int":
        return 0 if n == 0 else 1
    else:
        return 0 

def createInstance( ctx ):
    return SortArrayImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation( \
	createInstance,"com.pysortarray.ArraySort.python.SortArrayImpl",
		("com.sun.star.sheet.AddIn",),)

Here is the idl:

Code: Select all

#include <com/sun/star/uno/XInterface.idl>
#include <com/sun/star/beans/XPropertySet.idl>

module com { module pysortarray { module ArraySort {

    interface XSortArray
    {
       sequence< sequence< any > > SortArray( [in] sequence< sequence< any > > inRange, [in] sequence< sequence< any > > fieldspecs, [in] any HasHeaders, [in] any ByColumns, [in] ::com::sun::star::beans::XPropertySet docProps );
    };

}; }; };

The CalcAddin.xcu, description.xml, and manifest.xml are in the attached .zip file, which should install through the Extension Manager if renamed to .oxt.

Finally, I have tested this pretty thoroughly. It should work well on moderately sized arrays, but when I test it on an array of 10,000 rows and 5 columns, it takes a significant fraction of a minute to recalculate, and as I indicated, this might be improved by using the Python collation somehow. Now, I actually first wrote this in c++, which version does the same data in a fraction of a second, but I have that for Windows only. I did the Python version to practice my Python and to come up with something operating system independent, so here it is.
Attachments
XSortArray.zip
Zipped extension.
(4.05 KiB) Downloaded 970 times
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

Eight downloads and no complaints or suggestions, so someone is testing this or actually using it.

My major thought for possible improvement was to try the Python collation, but a bit of googling suggests that it may not be so hot, and using the native AOO XCollator really is the way to go.

I did find that I had left a stray optional argument from the c++ in the CalcAddin.xcu. I'm a bit surprised this didn't cause an installation problem. Probably didn't because it was on the end.

Here in the new version I'm posting I have fixed that.

I changed the name of the function from SORTARRAY to PYSORTED, to distinguish it from my c++ Windows version, which I call SORTED. The only change in the function's behavior is that I reversed the sense of the Numbers First field option (column 3 of fieldspecs) so that the default 0 puts numbers first, which I think is more natural.

I also tightened up the code a bit. Since the sorting loop occurred 4 times, I put in in a separate routine dosort.

These changes are rather trivial all-in-all, but I recommend uninstalling the SORTARRAY version and installing the PYSORT version attached here. The only needed changes in an existing spreadsheet are then to change the function name and the third column of fieldspecs,if that is actually significant in any given case.

For the record, here is the revised code, with one or two changes to variable names, and the revision with dosort.

Code: Select all

import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysorted import XPySorted

# SortArray Calc Add-in implementation.

global xCollator     #UNO  Collator for string comparisons
global NumbersFirst  #0 if numbers sorted before text, 1 if numbers after

class PySortedImpl( unohelper.Base, XPySorted ):
    def __init__( self, ctx ):
        self.ctx = ctx
		
    def PySorted(self, inRange, fieldspecs, HasHeaders, ByColumns, docProps):
        global xCollator
        global NumbersFirst
        aLoc = docProps.getPropertyValue("CharLocale")  #Use locale of document for collation. This could be a problem with multilingual spreadsheets.
        xCollator = self.ctx.getServiceManager().createInstance("com.sun.star.i18n.Collator")
        fieldspecs = CheckFields(fieldspecs,len(inRange[0]))  #Checks fields for validity
        nfields = len(fieldspecs)  #Number of fields to sort by
        s = inRange #Array to sort
        if nfields == 0:
            return s  #Just echo back inRange if no valid sort fields provided.
        if not ByColumns:
            if not HasHeaders:
                s = dosort(s,fieldspecs,nfields,aLoc)          
            else:
                h = s[0]  #Header line
                s = s[1:] #inRange excluding header
                s = dosort(s,fieldspecs,nfields,aLoc)
                sout = []
                sout.append(tuple(h))
                s = sout + s #Put header and sorted array back together.
        else:
           if not HasHeaders:
                t = zip(*s) #Transpose range to sort by columns (left to right).
                t = dosort(t,fieldspecs,nfields,aLoc)                
                s = zip(*t) #Reverse transpose after sort.
           else:
                t = zip(*s)
                h = t[0]
                t = t[1:]
                t = dosort(t,fieldspecs,nfields,aLoc)
                tout = []
                tout.append(tuple(h))
                tout = tout + t
                s = zip(*tout)
        return(tuple(s))			

def dosort(s,specs,nfields,aLoc):
    global xCollator
    global NumbersFirst
    for i in reversed(range(nfields)):
        xCollator.loadDefaultCollator(aLoc,specs[i][3])
        NumbersFirst = specs[i][2]
        s = sorted(s,key=cmp2key(specs[i][0],fncmp),reverse = specs[i][1])
    return s

#Comparison function.  i is field index, a and b are entire rows.
#Note type checking.  Is there a better way?
def fncmp(i,a,b):
    global xCollator
    global NumbersFirst
    atype = type(a[i]).__name__
    btype = type(b[i]).__name__
    if atype == "unicode" and btype == "unicode": 
        return xCollator.compareString(a[i],b[i])
    elif atype == "float" and btype == "float":
        return numcmp(a[i],b[i])
    elif atype == "float":
        return 1 if NumbersFirst else -1
    else:
	    return -1 if NumbersFirst else 1
        
#Compare two numerics
def numcmp(a,b):
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0
#See discussion in post about cmp2key.
def cmp2key(i,mycmp):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj) != 0
    return K

#Checks fieldspecs validity, even though the final result is pretty compact
#I actually spent more time on this than the actual sort function :)
def CheckFields(inArray,maxfields):
    outArray = []
    l = len(inArray)
    l0 = len(inArray[0])
    fieldnumbersin = [getfieldnumber(inArray[i][0],maxfields) for i in range(l)]
        
    testfields = [tuple([fieldnumbersin[i]] + [getfieldvalue(inArray[i][j]) if j < l0 else 0 for j in range(1,4)]) for i in range(l)]
    for i in range(len(testfields)):
        if testfields[i][0] != -1:
            outArray.append(tuple(testfields[i]))
			
    return outArray
	
def getfieldnumber(n,maxfields):
    nType = type(n).__name__
    if nType == "float":
        return int(n) if 0 <= n < maxfields else -1
    elif nType == "int":
        return n if 0 <= n < maxfields else -1
    else:
        return -1 

def getfieldvalue(n):
    nType = type(n).__name__
    if nType == "float" or nType == "int":
        return 0 if n == 0 else 1
    else:
        return 0 

def createInstance( ctx ):
    return PySortedImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation( \
	createInstance,"com.pysorted.python.PySortedImpl",
		("com.sun.star.sheet.AddIn",),)

Again, just change the extension from .zip to .oxt and install with Extension Manager or unopkg.
Attachments
XPySorted.zip
From SORTARRAY to PYSORTED
(3.76 KiB) Downloaded 916 times
Apache OpenOffice 4.1.1
Windows XP
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: [Python] Array Sort add-in function.

Post by karolus »

Hello Charlie

First many Thanks to you - because this is the first working Python-addin I've seen, which returns an Array - very nice!
Please allow me to do some changes on the Python-Part:( for now without using xCollator nor locale-specific Comparisoms ):

Code: Select all

import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysorted import XPySorted

# SortArray Calc Add-in implementation.



class PySortedImpl( unohelper.Base, XPySorted ):
    def __init__( self, ctx ):
        self.ctx = ctx

    default = ((0,0,0,0),)

    def PySorted(self, inRange, fieldspecs=default, HasHeaders=False , ByRows=False, docProps=None):
        inRange = list(inRange)
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]


        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)






def texttofront(rrange,field):
    i = 0
    while type(rrange[i][field])==float:
        i += 1
    return rrange[i:] + rrange[:i]


def main_sort( rrange, args):
    for arg in args[::-1]:
        try:
            field, rev, textfirst, ignore_case = map(int, arg )
            rrange.sort(key=lambda e: e[field].lower()
                    if ignore_case and type(e[field])==unicode
                    else e[field], reverse=rev)
            if textfirst:
                rrange = texttofront(rrange, field)
        except:
            pass
    return rrange


def createInstance( ctx ):
    return PySortedImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
	createInstance,"com.pysorted.python.PySortedImpl",
		("com.sun.star.sheet.AddIn",),)
It needs nothing to change outside 'PySorted.py' except Versionnumber in 'description.xml'

Karolus
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

karolus wrote:Hello Charlie

First many Thanks to you - because this is the first working Python-addin I've seen, which returns an Array - very nice!
Please allow me to do some changes on the Python-Part:( for now without using xCollator nor locale-specific Comparisoms ):

Code: Select all

import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysorted import XPySorted

# SortArray Calc Add-in implementation.



class PySortedImpl( unohelper.Base, XPySorted ):
    def __init__( self, ctx ):
        self.ctx = ctx

    default = ((0,0,0,0),)

    def PySorted(self, inRange, fieldspecs=default, HasHeaders=False , ByRows=False, docProps=None):
        inRange = list(inRange)
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]


        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)






def texttofront(rrange,field):
    i = 0
    while type(rrange[i][field])==float:
        i += 1
    return rrange[i:] + rrange[:i]


def main_sort( rrange, args):
    for arg in args[::-1]:
        try:
            field, rev, textfirst, ignore_case = map(int, arg )
            rrange.sort(key=lambda e: e[field].lower()
                    if ignore_case and type(e[field])==unicode
                    else e[field], reverse=rev)
            if textfirst:
                rrange = texttofront(rrange, field)
        except:
            pass
    return rrange


def createInstance( ctx ):
    return PySortedImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
	createInstance,"com.pysorted.python.PySortedImpl",
		("com.sun.star.sheet.AddIn",),)
It needs nothing to change outside 'PySorted.py' except Versionnumber in 'description.xml'

Karolus
Hi Karolus. I thought you'd probably have some ideas. Give me a little time to look this over though! Ill make a new function called KAROSORT or something and run it in parallel to see what's going on.
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

I think I'm ready. I did make a KAROSORT, and as expected, Karolus' Python improvements are helpful and instructive.

This thing really does need a collator though, or else, for example, "Z" comes before "a." It is blazingly fast using the default sort order, and I am still thinking about looking at Python's collation, but for now I'll stick with XCollator since that assures that the sort order matches Data > Sort.

I have made a hybrid version incorporating most of Karolus' enhancements, and I will discuss below the places where I didn't see how to use them just as given.

I took the hints and changed NumbersFirst to TextFirst, and ByColumns to ByRows. Though we could quibble about the latter, it isn't that important.

I got rid of the ugly global variables xCollator and TextFirst (old NumbersFirst), by passing them as arguments to the comparison function, which now looks like this, including cmp2key, which we still need here.

Code: Select all

def fncmp(i,a,b,xCollator,TextFirst):
    if type(a[i]) == unicode and type(b[i]) == unicode: 
        return xCollator.compareString(a[i],b[i])
    elif type(a[i]) == float and type(b[i]) == float:
        return numcmp(a[i],b[i])
    elif type(a[i]) == float:
        return 1 if TextFirst else -1
    else:
	    return -1 if TextFirst else 1
        
#Compare two numerics
def numcmp(a,b):
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0

def cmp2key(i,mycmp,xCollator,TextFirst):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) != 0
    return K
Since the comparison functions handles the TextFirst option, I'm not using Karolus' texttofront,, even though I think it's way cool (and it might even be handy later).

Taking another hint, I changed dosort to main_sort, and I'm using array.sort instead of sorted since the former is said to be more efficient if the original array isn't needed. I'm using Karolus' for loop technique, but I'm handling the fieldspecs (specs) a bit differently, as I will discuss further below.

Code: Select all

def main_sort(rrange,specs,xCollator,aLoc):
    for spec in specs[::-1]:
        field, rev, TextFirst, ignore_case = map(int, spec)
        xCollator.loadDefaultCollator(aLoc,ignore_case)
        rrange.sort(key=cmp2key(field,fncmp,xCollator,TextFirst),reverse = rev)
    return rrange
Karolus uses this technique of giving default argument values in the function definition.

Code: Select all

    default = ((0,0,0,0),)

    def PySorted(self, inRange, fieldspecs=default, HasHeaders=False , ByRows=False, docProps = None):
I find this doesn't quite work. Since HasHeaders and ByRows are optional arguments, if they are omitted from the function call they test as False anyway, so, for example

Code: Select all

if ByRows:
works as desired either way.

But, when going back to the idl and making fieldspecs optional by changing its type from sequence< sequence< any > > to plain any, I find that in a call like

Code: Select all

=PYSORTED(A13:E1012)
fieldspecs is not assigned default = ((0,0,0,0),), as intended. So instead I'm using

Code: Select all

def PySorted(self, inRange, fieldspecs, HasHeaders, ByRows, docProps):
and I deal with the optional fieldspecs with

Code: Select all

        if not fieldspecs:
            fieldspecs = ((0,0,0,0),)
Which seems to work as desired.

Now whereas Karolus' main_sort (below) uses try/except to catch out-of-range field values, and this is mostly viable, it doesn't handle the possibility that otherwise valid field numbers are valid. So if the same field number is specified both first and third the result will be as if it had only been third. I stated above that the first occurrence would be used, and there is certainly no point in sorting on the same field twice, so I'm using my field validity checker, CheckFields. It wasn't working as advertised before to weed out the duplicates either, but I think I have that fixed.

Code: Select all

def main_sort( rrange, args):
    for arg in args[::-1]:
        try:
            field, rev, textfirst, ignore_case = map(int, arg )
            rrange.sort(key=lambda e: e[field].lower()
                    if ignore_case and type(e[field])==unicode
                    else e[field], reverse=rev)
            if textfirst:
                rrange = texttofront(rrange, field)
        except:
            pass
    return rrange
Note that I have adopted Karolus' method for checking the data types throughout.

I changed the version number in description.xml to version 2, and here is the revised idl

Code: Select all

#include <com/sun/star/uno/XInterface.idl>
#include <com/sun/star/beans/XPropertySet.idl>

module com { module pysorted {

    interface XPySorted
    {
       sequence< sequence< any > > PySorted( [in] sequence< sequence< any > > inRange, [in] any fieldspecs, [in] any HasHeaders, [in] any ByRows, [in] ::com::sun::star::beans::XPropertySet docProps );
    };

}; };
And here is the new code

Code: Select all

import uno
import unohelper
from com.sun.star.lang import Locale
from com.pysorted import XPySorted

# PySorted Calc Add-in implementation.

class PySortedImpl( unohelper.Base, XPySorted ):
    def __init__( self, ctx ):
        self.ctx = ctx
        
    def PySorted(self, inRange, fieldspecs, HasHeaders, ByRows, docProps):
        aLoc = docProps.getPropertyValue("CharLocale")  #Use locale of document for collation. This could be a problem with multilingual spreadsheets.
        xCollator = self.ctx.getServiceManager().createInstance("com.sun.star.i18n.Collator")
        if not fieldspecs:
            fieldspecs = ((0,0,0,0),)
        fieldspecs = CheckFields(fieldspecs,len(inRange[0]))  #Checks fields for validity
        inRange = list(inRange)
        if len(fieldspecs) == 0:
            return tuple(inRange)  #Just echo back inRange if no valid sort fields provided.
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]

        sortrange = main_sort(inRange,fieldspecs,xCollator,aLoc)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)

def main_sort(rrange,specs,xCollator,aLoc):
    for spec in specs[::-1]:
        field, rev, TextFirst, ignore_case = map(int, spec)
        xCollator.loadDefaultCollator(aLoc,ignore_case)
        rrange.sort(key=cmp2key(field,fncmp,xCollator,TextFirst),reverse = rev)
    return rrange
    
#Comparison function.  i is field index, a and b are entire rows.
def fncmp(i,a,b,xCollator,TextFirst):
    if type(a[i]) == unicode and type(b[i]) == unicode: 
        return xCollator.compareString(a[i],b[i])
    elif type(a[i]) == float and type(b[i]) == float:
        return numcmp(a[i],b[i])
    elif type(a[i]) == float:
        return 1 if TextFirst else -1
    else:
	    return -1 if TextFirst else 1
        
#Compare two numerics
def numcmp(a,b):
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0

def cmp2key(i,mycmp,xCollator,TextFirst):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst) != 0
    return K


#Checks fieldspecs validity.
def CheckFields(inArray,maxfields):
    outArray = []
    speclen = len(inArray)
    speclen0 = len(inArray[0])
    
    fieldnumbersin = [getfieldnumber(inArray[i][0],maxfields) for i in range(speclen)]
    fieldnumbersout = []

    for fieldnumber in fieldnumbersin:
        if fieldnumber not in fieldnumbersout:
            fieldnumbersout.append(fieldnumber)
        else:
            fieldnumbersout.append(-1)
             
    testfields = [tuple([fieldnumbersout[i]] + [getfieldvalue(inArray[i][j]) if j < speclen0 else 0 for j in range(1,4)]) for i in range(speclen)]
    
    for i in range(len(testfields)):
        if testfields[i][0] != -1:
            outArray.append(tuple(testfields[i]))
    	
    return outArray

def getfieldnumber(n,maxfields):
    if type(n) == float:
        return int(n) if 0 <= n < maxfields else -1
    elif type(n) == int:
        return n if 0 <= n < maxfields else -1
    else:
        return -1 

def getfieldvalue(n):
    if type(n) == float or type(n) == int:
        return 0 if n == 0 else 1
    else:
        return 0 

def createInstance( ctx ):
    return PySortedImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation( \
	createInstance,"com.pysorted.python.PySortedImpl",
		("com.sun.star.sheet.AddIn",),)

There is a gotcha however. Since the field types have changed, the old version must be removed completely before the new version is installed or the Extension Manager will throw an error and not do the update. It should normally suffice to remove the old, then restart Office, install the new version, then restart again. I had so many old versions sitting in the uno_packages cache under the program folders (where the extension installations for "all users" go), that I had to go in and manually clean them out.

With that warning, here is version 2:
Attachments
XPySorted.zip
Version 2
(3.47 KiB) Downloaded 819 times
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

The fun never stops. I tried the Python collator with, I think, positive results.

Code: Select all

import locale
and in the class definition, do

Code: Select all

locale.setlocale(locale.LC_ALL, "")
And now, adopting Karolus' try/except trick for the field validation, but adding a gimmick to handle duplicated field values

Code: Select all

def main_sort( rrange, args):
    usedfields = []
    for arg in args[::-1]:
        try:
            field, rev, textfirst, ignore_case = map(int, arg )
            if field not in usedfields:
                usedfields.append(field)
                rrange.sort(key=lambda e: locale.strxfrm(e[field].lower())
                    if ignore_case and type(e[field])==unicode
                    else locale.strxfrm(e[field]) if type(e[field])==unicode else e[field], reverse=rev)
                if textfirst:
                    rrange = texttofront(rrange, field)
        except:
            pass
    return rrange
So this avoids the need for cmp2key.

One quirk though, about texttofront, what is the correct result of using TextFirst with a reversed field?

Here is the full code, though I won't attach an .oxt just yet. Anyone comfy with the SDK can easily make their own.

Note that I'm now calling the function PYSORT.

Code: Select all

import uno
import unohelper
import locale
from com.pysort import XPySort

# SortArray Calc Add-in implementation.

class PySortImpl( unohelper.Base, XPySort ):
    def __init__( self, ctx ):
        self.ctx = ctx

    locale.setlocale(locale.LC_ALL, "")

    def PySort(self, inRange, fieldspecs, HasHeaders, ByRows, docProps):
        inRange = list(inRange)
        out = []
        if not fieldspecs:
            fieldspecs = ((0,0,0,0),)
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]

        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)

def texttofront(rrange,field):
    i = 0
    while type(rrange[i][field])==float:
        i += 1
    return rrange[i:] + rrange[:i]

def main_sort( rrange, args):
    usedfields = []
    for arg in args[::-1]:
        try:
            field, rev, textfirst, ignore_case = map(int, arg )
            if field not in usedfields:
                usedfields.append(field)
                rrange.sort(key=lambda e: locale.strxfrm(e[field].lower())
                    if ignore_case and type(e[field])==unicode
                    else locale.strxfrm(e[field]) if type(e[field])==unicode else e[field], reverse=rev)
                if textfirst:
                    rrange = texttofront(rrange, field)
        except:
            pass
    return rrange

def createInstance( ctx ):
    return PySortImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
   createInstance,"com.pysort.python.PySortImpl",
      ("com.sun.star.sheet.AddIn",),)
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

Charlie Young wrote: One quirk though, about texttofront, what is the correct result of using TextFirst with a reversed field?
I think the most natural thing for a reversed field, of course, is for textfirst = False to put the numbers (reversed) at the bottom, that is, just have the entries fully reversed.

As I understand Karolus' texttofront function then, we need to make it so it reads from the bottom of a reversed field, so we can pass rev to it and read it that way if rev is True. I have it working that way, I think, but I suspect it could by Pythonized further by judicious use of list slices. Note that I'm also reverting to using CheckFields on the fieldspecs because 1) The try/except approach will invalidate a duplicated field in backwards order (OK, this isn't that important since duplicate fields shouldn't be specified anyway). 2) My instinct is to use error trapping as a last resort.

Code: Select all

def texttofront(rrange,field,rev):
    if not rev:
        i = 0
        while type(rrange[i][field]) == float:
            i += 1
    else:
        i = len(rrange)
        while type(rrange[i - 1][field]) == float:
            i -= 1
    return rrange[i:] + rrange[:i]      
	
def main_sort( rrange, specs):
    for spec in specs[::-1]:
        field, rev, textfirst, ignore_case = map(int, spec )
        rrange.sort(key=lambda e: locale.strxfrm(e[field].lower())
                    if ignore_case and type(e[field])==unicode
                    else locale.strxfrm(e[field]) if type(e[field])==unicode else e[field], reverse=rev)
        if textfirst:
            rrange = texttofront(rrange, field, rev)
    return rrange
I really think we're onto something with this Python collation though. It's way faster than the XCollator, and I've tried pretty extensively to get different orderings , but I always wind up with the same thing. Of course I've only tried US English, so I guess it can't be that extensive.
Apache OpenOffice 4.1.1
Windows XP
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: [Python] Array Sort add-in function.

Post by karolus »

Hi
I see we come closer..
But there is still one Problem, Calc pass <empty cells> as cells with value 0.0 and sort them in front of Text
( more precise the Sortorder is: negative numbers < 0 < positiv Numbers < <Empty> < "text" )
In my Opinion we need the <Emptys> at the End, like the behavior via →Data→Sort

I have now an own branch with NameSpaces different from yours, so you can test both at the Time.
The Arguments for Casesensitive, Textfirst, Reversed and (new)NaturalSort*** are implementet as Flags:
  • s : casesensitive
    t : Textfirst
    n : NaturalSort
    r : reversed
they can be set in second Fieldspecfield in all possible Kombinations.

***Natural Sort sorts Text mixed up with Group[s] of Digits in 'Human-Expected-Order'

Here is the Pythoncode: (as you can see - I freely borrow your Ideas - )

Code: Select all

import unohelper
import locale
import re
from functools import partial
from com.capylibre.sortaddin import XSortArray

# SortArray Calc Add-in implementation.

numrex = re.compile(r'[1-9]\d*')

class NaturalSort( unohelper.Base, XSortArray ):
    def __init__( self, ctx ):
        self.ctx = ctx
        locale.setlocale(locale.LC_ALL, "")



    def sortieren(self, inRange, fieldspecs , HasHeaders=False , ByRows=False):
        """
        Functionname : SORTIEREN
        Arguments:
        InRange : The Range to Sort
        fieldspecs: List of 2 Colums with Fieldnumber left and any
        Kombinations of Flags s, r, t, n on the Right Side
        HasHeaders: if True skip First Row/Column as header
        ByRows : if True sorts left to Right, otherwise Top to Botton
        """
        inRange = list(inRange)
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]


        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)




def main_sort( rrange, args):
    """
    """
    used = []
    for arg in args[::-1]:
        try:
            field, flags = int(arg[0]), arg[1]
            if field not in used:
                used.append(field)
                rrange.sort(key=partial( naturalkey, field, flags ),
                        reverse=('r' in flags ) )
        except Exception as error:
            print '%s'%error
    return rrange

def naturalkey( field, flags, key):
    key = key[field]
    if isinstance(key, unicode):
        if not 's' in flags:  # s >> case_s_ensitive
            key = key.lower()
        if 'n' in flags: # n >> NaturalSort
            key = numrex.sub(lambda m:'%03d%s' %(len(m.group()), m.group()), key)
        return locale.strxfrm(key.encode('utf8')) # strxfrm can't deal with unicode so encode to 'utf8' first
    else: #  Calc pass floats !
        if 't' in flags: # t: >> Textfirst
            return ('zzzzzzz', key)  # tze  zzzTrick ;-)
        return key




def createInstance( ctx ):
    return NaturalSort( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
    createInstance,"com.capylibre.sortaddin.python.NaturalSort",("com.sun.star.sheet.AddIn",),)
 
See Attached zip:
Attachments
ksort.zip
(3.86 KiB) Downloaded 883 times
Last edited by RoryOF on Fri Nov 02, 2012 12:59 am, edited 1 time in total.
Reason: corrected typo to help clarity
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: [Python] Array Sort add-in function.

Post by karolus »

Hallo

Update:

Code: Select all

import unohelper
import locale
import re
from functools import partial
from operator import methodcaller
from com.capylibre.sortaddin import XSortArray


# SortArray Calc Add-in implementation.

NUMBER_RE = re.compile(r'[1-9]\d*')

class NaturalSort( unohelper.Base, XSortArray ):
    def __init__( self, ctx ):
        self.ctx = ctx
        locale.setlocale(locale.LC_ALL, "")



    def sortieren(self, inRange, fieldspecs , HasHeaders=False , ByRows=False):
        """
        Functionname : SORTIEREN
        Arguments:
        InRange : The Range to Sort
        fieldspecs: List of 2 Columns, with Fieldnumber left ,and any
                Combination of Flags s, r, t, n on the Right Side
        HasHeaders: if True skip First Row/Column as header
        ByRows : if True sorts left to Right, otherwise Top to Bottom
        """
        inRange = list(inRange)
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]


        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)




def make_key_func(index, flags):

    def key_func(item):
        key = item[index]
        if isinstance(key, unicode):
            for function in functions:
                key = function(key)
            return (0, key)
        else:
            return (number_sort_key, key)

    number_sort_key = -1 if 't' in flags else 1
    functions = list()
    if 's' not in flags:
        functions.append(methodcaller('lower'))
    if 'n' in flags:
        functions.append(
            partial(
                NUMBER_RE.sub, lambda m: '%03d%s' % (len(m.group()), m.group())
            )
        )
    functions.append(methodcaller('encode','utf8'))
    functions.append(locale.strxfrm)

    return key_func


def main_sort(items, sort_specs):
    for index, flags in reversed(sort_specs):
        index = int(index)
        items.sort(key=make_key_func(index, flags or 'n'), reverse='r' in flags)
    return items



def createInstance( ctx ):
    return NaturalSort( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
    createInstance,"com.capylibre.sortaddin.python.NaturalSort",("com.sun.star.sheet.AddIn",),)
 
with some Help from BlackJack

Karolus
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

karolus wrote:Hallo

Update:

Karolus
This seems to reverse the meaning of t?
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

The "t" issue aside, I have been looking at the problem of always moving the blanks to the end. I don't believe that the blank cells are being interpreted as zeros, I think it is that they are just sorting below non-null strings and above numbers. I find that if an element of inRange is of type == unicode and length 0, we may regard it as blank and give it special treatment.

Make a simple function

Code: Select all

def blankval(v):
    return 1 if type(v) == unicode and len(v) == 0 else 0
I haven't figured out how to use this with strxfrm, so I'm using a comparison function with strcoll.

This is perhaps a bit of a retrograde step, but it at least shows this is possible.

I have also adopted the letter flags, which I agree should be easier for the user (I assume that is your thinking as well). I have adapted my CheckFields bit to turn the flags into the lists of zeros and ones as before, so what I have in main_sort for the specs looks the same.

Code: Select all

def main_sort( rrange, specs):
    for spec in specs[::-1]:
        field, rev, textfirst, ignore_case = map(int, spec )
        rrange.sort(key=pycmp2key(field,pycmp,textfirst,ignore_case,rev),reverse = rev)
        if textfirst:
            rrange = texttofront(rrange, field, rev)
    return rrange
In the comparison function pycmp, the key thing to note is the use of blankval to force the blanks to always compare high, even if reverse is in effect.

Code: Select all

def pycmp(i,a,b,TextFirst,ignore_case,rev):
    acomp = a[i].lower() if ignore_case and type(a[i]) == unicode else a[i]
    bcomp = b[i].lower() if ignore_case and type(b[i]) == unicode else b[i]
    #Edit 11/2  -- need to check for both items blank and therefore equal  or else stability is lost.
    if blankval(acomp) and blankval(bcomp)):
        return 0
    elif blankval(acomp): 
        return 1 if not rev else -1
    elif blankval(bcomp):
        return -1 if not rev else 1
    elif type(acomp) == unicode and type(bcomp) == unicode: 
        return locale.strcoll(acomp,bcomp)
    elif type(acomp) == float and type(bcomp) == float:
        return numcmp(a[i],b[i])
    elif type(acomp) == float:
        return 1 if TextFirst else -1
    else:
	    return -1 if TextFirst else 1

We also need a comp-to-key function, of course.

Code: Select all

def pycmp2key(i,mycmp,TextFirst,ignore_case,rev):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,rev) != 0
    return K

incidentally, I did see the bug reports about strxfrm, but when I didn't run into any problems when testing, I forgot about them.

The "natural sort" option also is nice, though I admit I haven't quite figured it out yet. I don't see where Data > Sort handles anything like that.

As I mentioned at first, a do have a c++ version of this for Windows, and I'm trying to keep it in sync with our version here. Right now I'm getting it going with the letter flags. The natural sort might be a real challenge for that. I haven't tried regex with c++ yet, though maybe another approach would work as well. Like I said, I haven't really fully grasped the concept yet.
 Edit: 11/2 Added necessary check for both a and b blank in pycomp. 
Last edited by Charlie Young on Fri Nov 02, 2012 7:35 am, edited 2 times in total.
Apache OpenOffice 4.1.1
Windows XP
User avatar
karolus
Volunteer
Posts: 1158
Joined: Sat Jul 02, 2011 9:47 am

Re: [Python] Array Sort add-in function.

Post by karolus »

Hallo
Charlie Young wrote:The "t" issue aside,
Sorry, the 't' issue was my fault its right now, see Code below.
Charlie Young wrote: I have been looking at the problem of always moving the blanks to the end. I don't believe that the blank cells are being interpreted as zeros, I think it is that they are just sorting below non-null strings and above numbers. I find that if an element of inRange is of type == unicode and length 0, we may regard it as blank and give it special treatment.
Good catch!
The solution is simple in the keyfunction:

Code: Select all

import unohelper
import locale
import re
from functools import partial
from operator import methodcaller
from com.capylibre.sortaddin import XSortArray


# SortArray Calc Add-in implementation.

NUMBER_RE = re.compile(r'[1-9]\d*')


class NaturalSort( unohelper.Base, XSortArray):
    def __init__( self, ctx ):
        self.ctx = ctx
        locale.setlocale(locale.LC_ALL, "")



    def sortieren(self, inRange, fieldspecs , HasHeaders=False , ByRows=False):
        """
        Function_name : SORTIEREN
        Arguments:
        InRange : The Range to Sort
        fieldspecs: List of 2 Columns with Fieldnumber left ,and any
                Combination of Flags s, r, t, n on the Right Side
        HasHeaders: if True skip First Row/Column as header
        ByRows : if True sorts left to Right, otherwise Top to Bottom
        """
        inRange = list(inRange)
        out = []
        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]


        sortrange = main_sort(inRange,fieldspecs)
        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)




def make_key_func(index, flags):

    def key_func(item):
        key = item[index]
        if isinstance(key, unicode):
            for function in functions:
                key = function(key)
            return (2 if len(key)==0 else 0 , key) # here do we put the <empties> to the end
        else:
            return (number_sort_key, key)

    number_sort_key = 1 if 't' in flags else -1 # Here is the corrected 't' -behavior
    functions = list()
    if 's' not in flags:
        functions.append(methodcaller('lower'))
    if 'n' in flags:
        functions.append(
            partial(
                NUMBER_RE.sub, lambda m: '%03d%s' % (len(m.group()), m.group())
            )
        )
    functions.append(methodcaller('encode','utf8'))
    functions.append(locale.strxfrm)
    return key_func


def main_sort(items, sort_specs):
    for index, flags in reversed(sort_specs):
        index = int(index)
        items.sort(key=make_key_func(index, flags or 'n'), reverse='r' in flags)
    return items



def createInstance( ctx ):
    return NaturalSort( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
    createInstance,"com.capylibre.sortaddin.python.NaturalSort",("com.sun.star.sheet.AddIn",),)

 
It's hard too say, but

Code: Select all

php[/color]] -tags works well.  :? 



[quote]I haven't figured out how to use this with strxfrm,  so I'm using a comparison function with strcoll.[/quote]
Look the 'return's of key-function above, its quite simple.



[quote="Charlie"]The "natural sort" option also is nice, though I admit I haven't quite figured it out yet.   I don't see where Data > Sort handles anything like that.[/quote]
Libreoffice have this Option in →Data→Sort..
AOO4, Libreoffice 6.1 on Rasbian OS (on ARM)
Libreoffice 7.4 on Debian 12 (Bookworm) (on RaspberryPI4)
Libreoffice 7.6 flatpak on Debian 12 (Bookworm) (on RaspberryPI4)
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

karolus wrote: The solution is simple in the keyfunction:

Code: Select all

            return (2 if len(key)==0 else 0 , key) # here do we put the <empties> to the end
 


Look the 'return's of key-function above, its quite simple.
I find, with a regular ascending, numbers first sort, that the blanks are between the numbers and text, not at the end.
karolus wrote:
Charlie wrote:The "natural sort" option also is nice, though I admit I haven't quite figured it out yet. I don't see where Data > Sort handles anything like that.
Libreoffice have this Option in →Data→Sort..
Good enough reason to include it.
Apache OpenOffice 4.1.1
Windows XP
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Python] Array Sort add-in function.

Post by Charlie Young »

The latest. I do think we're getting somewhere.

My newest version is called PYSORT, incorporating much of Karolus' work, with differences to be discussed.

I have added a new optional parameter called CollationType, also to be discussed.

Code: Select all

def PySort(self, inRange, fieldspecs, HasHeaders, ByRows, CollationType, docProps)
InRange is as before.

fieldspecs has adopted the single letter flags in column 2, though I have changed it a bit:

I changed "s" for sensitive, to "c" because I wanted the default to be case sensitive, whereas we had it so that "s" had to be specified to achieve this. I was going to use "i" for insensitive, but the autocorrect wants to change a solitary "i" to "I," and though I made it so that the flags themselves are case-insensitive, I found this irritating, as others might.

I have included the "natural sort" "n" flag which works with all the collation options except 3. Karolus' SORTIEREN was making this the default option when no other flags where specified, but I removed this behavior since it might cause unexpected results.

"r" is still reverse sort, and "t" is still text first.

I also made it so that if a single number is specified, it sorts on that number's field (if it's in range), with all the other defaults.

HasHeaders and ByRows are as before.

The new CollationType works as follows:

0 - default, uses a comparison function with Python's locale.strcoll(), I made this the default instead of strxfrm for reasons also to be discussed.

1 - locale.strxfrm.

2 - uses the Office XCollator.

3 - This has no collation, it just uses the straight byte order ascii sort. This is fast, but otherwise fairly useless, but since we had the code I figured I might as well include it.

The reason for using strcoll instead of strxfrm is that strxfrm is broken, requiring the utf8 encoding. I have been testing this with a gobbledygook of randomly generated strings, numbers, and blanks, and when I include unicode characters above 255 in the strings, strxfrm fails on a case-insensitive sort for some reason. If, on the other hand, I remove the utf8 encoding, strxfrm fails completely if the higher-code characters are included.

docProps, in case anyone is wondering, are the document properties automatically passed to the function, and are not specified by the user.

A word on "natural sort" for those who don't know what this is: it modifies the sort keys, though not the displayed results, so that a list like

File1.txt
File2.txt
File3.txt
File4.txt
File5.txt
File6.txt
File7.txt
File8.txt
File9.txt
File10.txt
File11.txt
File12.txt
File13.txt

sorts in that order instead of,

File1.txt
File10.txt
File11.txt
File12.txt
File13.txt
File2.txt
File3.txt
File4.txt
File5.txt
File6.txt
File7.txt
File8.txt
File9.txt


This is really only a special case of problems that humans have with computer generated lists, but I'll put further comments aside.

I dealt with the problem strxfrm was having with moving the blanks to the end by adding a function inspired by Karolus' texttofront:

Code: Select all

def blankstoend(rrange,field):
    i = firstblank = 0
    rlen = len(rrange)
    while i < rlen and not blankval(rrange[i][field]):
        i += 1
    firstblank = i
    while i < rlen and blankval(rrange[i][field]):
        i += 1
    lastblank = i
    if firstblank < rlen:
        return rrange[:firstblank] + rrange[lastblank:] + rrange[firstblank:lastblank]
    else:
        return rrange
I'm playing with some ideas for some additional options, but for now here is the full code and the zipped .oxt.

Code: Select all

import uno
import unohelper
import locale
import re
from com.sun.star.lang import Locale
from com.pysort import XPySort
from operator import methodcaller
from functools import partial

# PySort Calc Add-in implementation.
NUMBER_RE = re.compile(r'[1-9]\d*')

class PySortImpl( unohelper.Base, XPySort ):
    def __init__( self, ctx ):
        self.ctx = ctx

    def PySort(self, inRange, fieldspecs, HasHeaders, ByRows, CollationType, docProps):
        inRange = list(inRange)
        out = []
        if not fieldspecs:
            fieldspecs = ((0,""),)
        elif type(fieldspecs) == float:
            fieldspecs = ((fieldspecs,""),)
        
        specs = CheckFields(fieldspecs,len(inRange[0]))  #Checks fields for validity

        TypeCollation = 0	
        if type(CollationType) == float:
            TypeCollation = int(CollationType) if 0 <= CollationType <=  3 else 0		

        if ByRows:
            inRange = zip(*inRange)
        if HasHeaders:
            out.append(inRange[0])
            inRange = inRange[1:]
        
        if TypeCollation == 0:
            locale.setlocale(locale.LC_ALL, "")
            sortrange = pyCollator_sort(inRange,specs)
        elif TypeCollation == 1:
            locale.setlocale(locale.LC_ALL, "")
            sortrange = strxfrm_sort(inRange,specs)
        elif TypeCollation == 2:
            aLoc = docProps.getPropertyValue("CharLocale")  #Use locale of document for collation. This could be a problem with multilingual spreadsheets.
            xCollator = self.ctx.getServiceManager().createInstance("com.sun.star.i18n.Collator")
            sortrange = xCollator_sort(inRange,specs,xCollator,aLoc)
        else:
            sortrange = byte_sort(inRange,specs)

        out.extend(sortrange)

        if ByRows:
            return tuple(zip(*out))

        return tuple(out)

def blankval(v):
    return 1 if type(v) == unicode and len(v) == 0 else 0
	
def pyCollator_sort( rrange, specs):
    for spec in specs[::-1]:
        field, rev, textfirst, ignore_case, isnatural = map(int, spec )
        k = partial(pycmp2key,i = field,TextFirst = textfirst, ignore_case = ignore_case, isnatural = isnatural, rev = rev)
        rrange.sort(key = k(pycmp),reverse = rev)
    return rrange

def pycmp(i,a,b,TextFirst,ignore_case, isnatural, rev):
    #acomp = a[i].lower().encode("utf8") if ignore_case and type(a[i]) == unicode else (a[i].encode("utf8") if type(a[i]) == unicode else a[i])
    #bcomp = b[i].lower().encode("utf8") if ignore_case and type(b[i]) == unicode else (b[i].encode("utf8") if type(b[i]) == unicode else b[i])
    acomp = a[i].lower() if ignore_case and type(a[i]) == unicode else (a[i] if type(a[i]) == unicode else a[i])
    bcomp = b[i].lower() if ignore_case and type(b[i]) == unicode else (b[i] if type(b[i]) == unicode else b[i])
    acomp = naturalkey(acomp) if isnatural else acomp
    bcomp = naturalkey(bcomp) if isnatural else bcomp

    if blankval(acomp) and blankval(bcomp):
	    return 0
    elif blankval(acomp): 
        return 1 if not rev else -1
    elif blankval(bcomp):
        return -1 if not rev else 1
    elif type(acomp) == unicode and type(bcomp) == unicode: 
        return locale.strcoll(acomp,bcomp)
    elif type(acomp) == float and type(bcomp) == float:
        return numcmp(acomp,bcomp)
    elif type(acomp) == float:
        return 1 if TextFirst else -1
    else:
	    return -1 if TextFirst else 1

def pycmp2key(mycmp,i,TextFirst,ignore_case,isnatural,rev):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj, TextFirst,ignore_case,isnatural,rev) != 0
    return K

def blankstoend(rrange,field):
    i = firstblank = 0
    rlen = len(rrange)
    while i < rlen and not blankval(rrange[i][field]):
        i += 1
    firstblank = i
    while i < rlen and blankval(rrange[i][field]):
        i += 1
    lastblank = i
    if firstblank < rlen:
        return rrange[:firstblank] + rrange[lastblank:] + rrange[firstblank:lastblank]
    else:
        return rrange

def make_key_func(spec):

    def key_func(item):
        key = item[field]
        if isinstance(key, unicode):
            for function in functions:
                key = function(key)
            return (1 if len(key)==0 else 0 , key) # here do we put the <empties> to the end
        else:
            return (number_sort_key,key)
    field, rev, TextFirst, ignore_case, isnatural = map(int, spec)
    number_sort_key = 1 if TextFirst else -1 # Here is the corrected 't' -behavior
    functions = list()
    if ignore_case:
        functions.append(methodcaller('lower'))
    if isnatural:
        functions.append(
            partial(
                NUMBER_RE.sub, lambda m: '%03d%s' % (len(m.group()), m.group())
            )
        )
    functions.append(methodcaller('encode','utf8'))
    functions.append(locale.strxfrm)
    return key_func


def strxfrm_sort(items, specs):
    for spec in reversed(specs):
        field, rev, TextFirst, ignore_case, isnatural = map(int, spec)
        items.sort(key=make_key_func(spec), reverse = rev)
        items = blankstoend(items,field)
    return items

def xCollator_sort(rrange,specs,xCollator,aLoc):
    for spec in specs[::-1]:
        field, rev, TextFirst, ignore_case, isnatural = map(int, spec)
        xCollator.loadDefaultCollator(aLoc,ignore_case)
        k = partial(cmp2key,i = field,xCollator = xCollator,TextFirst = TextFirst, isnatural = isnatural, rev = rev)
        rrange.sort(key=k(fncmp),reverse = rev)
    return rrange
    
#Comparison function.  i is field index, a and b are entire rows.
def fncmp(i,a,b,xCollator,TextFirst,isnatural,rev):
    acomp = naturalkey(a[i]) if isnatural and type(a[i]) == unicode else (a[i] if type(a[i]) == unicode else a[i])
    bcomp = naturalkey(b[i]) if isnatural and type(b[i]) == unicode else (b[i] if type(b[i]) == unicode else b[i])
    
    if blankval(acomp) and blankval(bcomp):
        return 0
    elif blankval(acomp): 
        return 1 if not rev else -1
    elif blankval(bcomp):
        return -1 if not rev else 1
    elif type(acomp) == unicode and type(bcomp) == unicode: 
        return xCollator.compareString(acomp,bcomp)
    elif type(acomp) == float and type(bcomp) == float:
        return numcmp(acomp,bcomp)
    elif type(acomp) == float:
        return 1 if TextFirst else -1
    else:
	    return -1 if TextFirst else 1

def naturalkey(arg):
    if type(arg) == unicode:
        nkey = partial(NUMBER_RE.sub, lambda m: '%03d%s' % (len(m.group()), m.group()))
        return nkey(arg)
    else:
        return arg

#Compare two numerics
def numcmp(a,b):
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0

def cmp2key(mycmp,i,xCollator,TextFirst,isnatural,rev):
    #Convert a cmp= function into a key= function
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) < 0
        def __gt__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) > 0
        def __eq__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) == 0
        def __le__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) <= 0
        def __ge__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) >= 0
        def __ne__(self, other):
            return mycmp(i,self.obj, other.obj, xCollator, TextFirst, isnatural, rev) != 0
    return K

def texttofront(rrange,field,rev):
    if not rev:
        i = 0
        while type(rrange[i][field]) == float:
            i += 1
    else:
        i = len(rrange)
        while type(rrange[i - 1][field]) == float:
            i -= 1
    return rrange[i:] + rrange[:i]      

def byte_sort( rrange, specs):
    for spec in specs[::-1]:
        field, rev, textfirst, ignore_case, isnatural = map(int, spec )
        rrange.sort(key=lambda e: e[field].lower()
                    if ignore_case and type(e[field])==unicode
                    else e[field], reverse=rev)
        if textfirst:
            rrange = texttofront(rrange, field, rev)
    return rrange
	
def CheckFields(inArray,maxfields):
    outArray = []
    speclen = len(inArray)
    speclen0 = len(inArray[0])
    
    fieldnumbersin = [getfieldnumber(inArray[i][0],maxfields) for i in range(speclen)]
    fieldnumbersout = []

    for fieldnumber in fieldnumbersin:
        if fieldnumber not in fieldnumbersout:
            fieldnumbersout.append(fieldnumber)
        else:
            fieldnumbersout.append(-1)
             
    testfields = [tuple([fieldnumbersout[i]] + [getfieldvalue(inArray[i][1],j) if speclen0 > 1 else 0 for j in range(0,4)]) for i in range(speclen)]
    
    for i in range(len(testfields)):
        if testfields[i][0] != -1:
            outArray.append(tuple(testfields[i]))
    	
    return outArray

def getfieldnumber(n,maxfields):
    if type(n) == float:
        return int(n) if 0 <= n < maxfields else -1
    elif type(n) == int:
        return n if 0 <= n < maxfields else -1
    else:
        return -1 

def getfieldvalue(s,i):
    flags = ["r","t","c","n"]
    if type(s) == unicode:
        return 1 if flags[i] in s.lower() else 0
    else:
        return 0 

def createInstance( ctx ):
    return PySortImpl( ctx )

g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(
   createInstance,"com.pysort.python.PySortImpl",
      ("com.sun.star.sheet.AddIn",),)
Attachments
XPySort.zip
PYSORT Version 1
(4.65 KiB) Downloaded 1092 times
Apache OpenOffice 4.1.1
Windows XP
mikekaganski
Posts: 12
Joined: Mon Oct 30, 2017 12:39 pm

Re: [Python] Array Sort add-in function.

Post by mikekaganski »

This fixes the code to be compatible with Python 3 (used in LibreOffice since version 4.0):

Code: Select all

 # SortArray Calc Add-in implementation.
 NUMBER_RE = re.compile(r'[1-9]\d*')
 
+# unicode from Python 2 is str in Python 3
+try:
+    unicode
+except NameError:
+    unicode = str
+
 class PySortImpl( unohelper.Base, XPySort ):
     def __init__( self, ctx ):
         self.ctx = ctx
LibreOffice 7.6 on Windows 10
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Python] Array Sort add-in function.

Post by Villeroy »

mikekaganski wrote:This fixes the code to be compatible with Python 3 (used in LibreOffice since version 4.0):
No. There is something more than that. Still returns #VALUE! while working with AOO.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Python] Array Sort add-in function.

Post by Villeroy »

The thing that seems to work at a first glance is to replace all words "unicode" with "str". Don't ask me why. I'm tired of this extension shit.
"First glance" means =PYSORT(A1:A10) with numbers in A1:A10.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
mikekaganski
Posts: 12
Joined: Mon Oct 30, 2017 12:39 pm

Re: [Python] Array Sort add-in function.

Post by mikekaganski »

Villeroy wrote:
mikekaganski wrote:This fixes the code to be compatible with Python 3 (used in LibreOffice since version 4.0):
No. There is something more than that. Still returns #VALUE! while working with AOO.
Hmm... I suppose you have made some mistake while applying the diff from my post (and I must admit that the diff formatted with [code] tag was still deformed by the forum to strip leading whitespaces, adding to confusion.)

I have tested the attached file with both LibreOffice 6.3.0.2 (and master) x64 and AOO 4.1.6 on Windows 10, and it WFM.
Villeroy wrote:The thing that seems to work at a first glance is to replace all words "unicode" with "str". Don't ask me why. I'm tired of this extension shit.
"First glance" means =PYSORT(A1:A10) with numbers in A1:A10.
"unicode" is a built-in type in Python 2, but not in Python 3. I have discussed that in the AskLibO question.
Attachments
XPySort.oxt
A patched extension working with both LO 6.3.0.2 and AOO 4.1.6
(4.67 KiB) Downloaded 741 times
LibreOffice 7.6 on Windows 10
Post Reply