Natural sort for OpenOffice Calc

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Locked
User avatar
MrProgrammer
Moderator
Posts: 5264
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Natural sort for OpenOffice Calc

Post by MrProgrammer »

Several topics on the forum show that people want to sort data consisting of a name followed by an integer. For example: Unit4, Unit5, Unit9, Unit15, Unit1, Unit13, Unit2. They want this to sort based on the integer: Unit1, Unit2, Unit4, Unit5, Unit9, Unit13, Unit15. But OpenOffice Calc doesn't understand this method. It sorts the data alphabetically: Unit1, Unit13, Unit15, Unit2, Unit4, Unit5, Unit9. Unit13 and Unit15 sort before Unit2 because "1" comes before "2".

LibreOffice Calc does support this with the option Enable natural sort. However as with many parts of the LibreOffice and OpenOffice documentation, details about how the feature works are lacking. There's no explanation on what natural sort means. A single example is provided. It presents the data A1, A2, A3, …, A19, A20, A21. The end result isn't shown! The documentation just says it's not A1, A19, A2, A20, A21, …, A3.

From the example we can guess that natural sort will sort a letter followed by an integer. And it seems likely that the letter can actually be a word like Unit. But does natural sort support signed integers like -5 or +5? And what about trailing signs like 5-? Can the integer have leading or trailing spaces? What about numbers with decimal fractions like 3.14 or 3,14 or numbers in exponential notation like 1E7? Are any date formats supported? So many questions; no answers. I decline to research and document LibreOffice's natural sort feature.

I have written a StarBasic function, NSKEY, for Calc to provide a natural sort feature for OpenOffice. It supports a text string prefix followed by an unsigned integer. The integer can have leading or trailing spaces. I think this capability for NSKEY will allow it to provide a quick solution in many situations. Suppose the data is in column A. The function can be used in column B to right-justify the integer following the string and pad it with zeros to a specified length. For example Unit1 → Unit001, Unit2 → Unit002, …, Unit10 → Unit010, Unit11 → Unit011. Selecting columns A and B, then sorting by B, puts column A in natural sort order.
[Tutorial] How do I specify the formula for a column?

You will need to install the function in your OpenOffice system by following the instructions in [Tutorial] How to install a code snippet. The comments before the function explain the parameters and what result is returned. The attachment below gives examples of using NSKEY.
NSKEY.ods
Examples for NSKEY function
(18.08 KiB) Downloaded 64 times

Rem Generate keys for natural sort, string with integer (0-padded on left)
Rem      K: The data to convert to a natural sort key
Rem         Should be a text string followed by an unsigned decimal integer
Rem         If an array is passed, all but the first element are ignored
Rem         An empty string followed by an integer is acceptable
Rem         Leading and trailing spaces are ignored
Rem         Spaces between the string and the integer are ignored
Rem         If the integer after the string is missing, it is taken to be zero
Rem         Examples: NSKEY("Item12";5) is "Item00012"   Left pad with zeros
Rem                   NSKEY("Item 9";5) is "Item00009"   Spaces ignored
Rem                   NSKEY("5"     ;3) is "005"         String is null
Rem                   NSKEY("    "  ;2) is "00"          String null, integer 0
Rem                   NSKEY("Foo  " ;3) is "Foo000"      No integer, so 0
Rem                   NSKEY(" Foo"  ;3) is "Foo000"      No integer, so 0
Rem                   NSKEY("5.61"  ;4) is "5.0061"      String is "5."
Rem                   NSKEY("-25"   ;3) is "-025";       String is "-"
Rem                   NSKEY("Foo+2" ;2) is "Foo+02";     String is "Foo+"
Rem         If K is empty, the string is null and the integer is zero
Rem         If K is a number, its absolute value is truncated to an integer
Rem         Numbers > 999999999999997 are taken to be 999999999999997
Rem         Example:  NSKEY(-5.6;5)     is "00005";      String is null
Rem      L: The length for the zero-padded integer
Rem         If an array is passed, all but the first element are ignored
Rem         Lengths less than 1 are treated as 1
Rem         If K is a number, lengths larger than 15 are treated as 15
Rem         If K is a string, lengths larger than 99 are treated as 99
Rem         If the integer is more than L digits, question marks are returned
Rem         Example:  NSKEY("Item314";2) is "Item??"     314 is three digits
Rem      The function always returns a string and never returns an array
Rem      The length of the result = L + length of K's string
Rem V1R1M0 Written 2025-03-01
Function NSKEY(KK As Variant, LL as Variant) As String
Const Var_String = 8 : Const Var_Array = 8192            ' VARTYPE values
Const BigNum = 999999999999997                           ' Largest 15-digit integer
Dim K As Variant : Dim L As Variant                      ' Non-array variables
Dim S As String  : Dim I As String                       ' String and integer
Dim C As String  : Dim P As Long                         ' Character/position in string
K = KK : If VARTYPE(KK) >= Var_Array Then K = KK(1,1)    ' If array, use first element
L = LL : If VARTYPE(LL) >= Var_Array Then L = LL(1,1)    ' If array, use first element
If L < 1  Then L = 1                                     ' Minimum length is 1
S = "" : I = ""                                          ' String and integer defaults
If VARTYPE(K) = Var_String Then                          ' If a string
   If L > 99 Then L = 99                                 ' Maximum length is 99
   K = TRIM(K)                                           ' Ignore leading/trailing spaces
   For P = LEN(K) to 1 Step -1                           ' Get integer loop
      C = MID(K,P,1)                                     ' Get character
      If C < "0" Or C > "9" Then Exit For                ' End of integer
      I = C & I                                          ' Add character at front                
   Next P                                                ' End of integer loop
   S = RTRIM(LEFT(K,P))                                  ' String w/o training spaces
Else                                                     ' If a number
   If L > 15 Then L = 15                                 ' Maximum length is 15
   K = FIX(ABS(K))                                       ' Discard sign and fraction
   If K > BigNum Then K = BigNum                         ' Largest 15-digit integer
   I = FORMAT(K,"0")                                     ' Format integer
End If                                                   ' End number/string test
Select Case True                                         ' Check integer length
   Case LEN(I) > L : I = STRING(L,"?")                   ' Question marks if too long
   Case LEN(I) < L : I = RIGHT(STRING(L,"0")&I,L)        ' Pad with zeros if too short
End Select                                               ' End length check
NSKEY = S & I                                            ' Return string and integer
End Function                                             ' End NSKEY

The Code Snippets forum is not for asking questions. Direct questions to the Macros and UNO API forum.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.5, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
Locked