[Calc] Split string into an array of single words

Creating a macro - Writing a Script - Using the API (OpenOffice Basic, Python, BeanShell, JavaScript)
Post Reply
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

[Calc] Split string into an array of single words

Post by jari0471 »

Split string into an array of single words (similar to separation of "arguments" in shell command line)

Code: Select all

Sub test
  Dim p As String
  p1="aa bb  cc   dd"
  a1=Split(p1," ")
  Print a1(0)
  Print a1(1)
  Print a1(2)
  Print a1(3)
End Sub
Displayed is "aa", "bb", " " and "cc".
Displayed should be "aa", "bb", "cc" and "dd".

There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1.
Is there another command or different option for this command that can do separation of words (arguments)?
If a large file is read to calc sheet, it will take a long time if the splitting function is slow. Thus fast (and simple) operation is requested.
Last edited by jari0471 on Fri Aug 13, 2010 6:59 am, edited 1 time in total.
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: Split string into an array of single words

Post by Charlie Young »

I get aa bb cc dd running the code copied exactly from above, but where are the strings coming from? If they are in, or could be gotten into spreadsheet cells, you could use Data > Text to Columns.
Apache OpenOffice 4.1.1
Windows XP
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

Re: Split string into an array of single words

Post by jari0471 »

The forum seemed to remove additional "spaces" between the quotes.
Last edited by jari0471 on Fri Aug 13, 2010 7:09 am, edited 1 time in total.
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

Re: Split string into an array of single words

Post by jari0471 »

There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1 (the forum posting seems to remove the additional spaces automatically).
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
User avatar
RoryOF
Moderator
Posts: 34610
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Split string into an array of single words

Post by RoryOF »

jari0471 wrote:There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1 (the forum posting seems to remove the additional spaces automatically).
In that case, and in all cases where the layout/spacing is important, you should post using the "Code" option in the Full Editor window. This option is not available in the Quick Reply window - you have to switch to Full Editor.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
FJCC
Moderator
Posts: 9270
Joined: Sat Nov 08, 2008 8:08 pm
Location: Colorado, USA

Re: Split string into an array of single words

Post by FJCC »

Here is a not-very-elegant solution for the example given

Code: Select all

Sub test
p1="aa bb  cc   dd"
a1=Split(p1," ")
count = 0
Dim a2(ubound(a1))
For Each Word in a1
	If Word <> "" then
		a2(count) = Word
		count = count+1
	end if
next Word
Redim Preserve a2(count)
Print a2(0)
Print a2(1)
Print a2(2)
Print a2(3)
End Sub
OpenOffice 4.1 on Windows 10 and Linux Mint
If your question is answered, please go to your first post, select the Edit button, and add [Solved] to the beginning of the title.
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

Re: Split string into an array of single words

Post by jari0471 »

The solution you give can count the number of "words", but can be come "painfully" slow do to the for-loop, when there is 1000's of strings to work with... (Thus I try find internal function, java -code or any other solution that gives very fast speed also).
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
User avatar
RoryOF
Moderator
Posts: 34610
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Split string into an array of single words

Post by RoryOF »

If you take your strings into a word processor, such as Writer, its Find and Replace can quickly simplify multiple spaces to single spaces, then change them to new lines or other delimiters, so that the file can be reopened in Calc using the CSV import mechanism.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

Re: Split string into an array of single words

Post by jari0471 »

The idea using writer is possible. However the text is read from a file "iFile" with Line Input (that gives single string for a line).
The original file(s) should not be modified, since they are data files.

Code: Select all

  
  iFile = Freefile
  Open sPath+"/"+sFile For Input As iFile

  While (not eof(iFile))

    Line Input #iFile,sValue
    sValue=Trim(sValue)
    If sValue<>"" Then
Maybe there is another way reading the files that can do the splitting the words.
Some of the lines (documentation, etc. other data) in the original file are not intended to be used, thus the read-routine is complicated.
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
User avatar
Hagar Delest
Moderator
Posts: 32649
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Split string into an array of single words

Post by Hagar Delest »

jari0471 wrote:The forum seemed to remove additional "spaces" between the quotes.
Use BBCode (I've edited your first post).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
User avatar
RoryOF
Moderator
Posts: 34610
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: [Calc] Split string into an array of single words

Post by RoryOF »

The idea using writer is possible. However the text is read from a file "iFile" with Line Input (that gives single string for a line).
The original file(s) should not be modified, since they are data files
In that case, you should be working on copies of the files, not on the originals. The Writer path then seems most practical.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
RPG
Volunteer
Posts: 2250
Joined: Tue Apr 14, 2009 7:15 pm
Location: Netherlands

Re: [Calc] Split string into an array of single words

Post by RPG »

Hello

Maybe you can use the function replace. It is not document but I believe it is normal function. I'm not sure if it is only in the go-oo version or in all version of OOo.
As far I knew it can have six parameters
a) the string
b) The new part
c) the old part
d) where to start
e) How often to change a part
f) Case sensitive

Code: Select all

Sub test
  Dim p As String
  p1="aa bb    cc   dd"
  p1=replace(p1,"  "," ")
  a1=Split(p1," ")
  msgbox  a1(0) & chr(13) & _
   a1(1) & chr(13) & _
   a1(2) & chr(13) & _
   a1(3)
End Sub

Romke
LibreOffice 7.1.4.2 on openSUSE Leap 15.2
B Marcelly
Volunteer
Posts: 1160
Joined: Mon Oct 08, 2007 1:26 am
Location: France, Paris area

Re: [Calc] Split string into an array of single words

Post by B Marcelly »

Hi,
I took this as a recreation challenge :P

Here is a general solution, it works with words separated by any length of space.
It tries to replace a long series of spaces by one space, then tries with a shorter series, etc.
See how powerful is split/join 8-)

Code: Select all

Dim phrase As String
phrase = "hello OOoBasic programmers, this   is       a  solution"

Dim nbSpaces As Long, v As Variant

nbSpaces = 5  ' initial value, greater or equal to 2
Do While nbSpaces > 1
  v = split(phrase, space(nbSpaces))
  if UBound(v()) = 0  then
    nbSpaces = nbSpaces -1
  else
    phrase = join(v, " ")
  end if
Loop
v = split(phrase, " ")
' *** display of vector v() ***
MsgBox(join(v, chr(13)), 0, "List of words")
The initial value of nbSpaces can optimize the number of loops, if you know what length of spaces you may usually have. But even a start value of 2 can do the job.
Bernard

OpenOffice.org 1.1.5 / Apache OpenOffice 4.1.1 / LibreOffice 5.0.5
MS-Windows 7 Home SP1
User avatar
Charlie Young
Volunteer
Posts: 1559
Joined: Fri May 14, 2010 1:07 am

Re: [Calc] Split string into an array of single words

Post by Charlie Young »

OK, here's my attempt, which doesn't use the Basic Split function. I, um, borrowed Bernard's trick for displaying the results.

Code: Select all

Sub TestStrSplit
	Dim a
	Dim s As String
	Dim i As Integer
	
	s = "hello OOoBasic programmers, this   is       another  solution"

	a = StrSplit(s," ")
	MsgBox(join(a,chr(13))
End Sub

Function StrSplit(s As String, SplitChar As String)
	Dim i As Integer
	Dim n As Integer
	Dim m As String
	Dim t As String
	Dim a(0) As String
	Dim InWord As Boolean
		
	t =""
	InWord = False
	i = 1
	n = 0
		
	do while i	<=	len(s)
		m = mid(s,i,1)
		if m <> SplitChar then
			InWord = True
			t = t & m
		else
			if InWord then
				Redim Preserve a(n) As String
				a(n) = t
				n = n + 1
				t = ""
				InWord = False
			endif
		endif
		i = i + 1
	loop
	if InWord then
		Redim Preserve a(n) As String
		a(n) = t
	endif
	
	StrSplit = a
End Function
Apache OpenOffice 4.1.1
Windows XP
jari0471
Posts: 8
Joined: Fri Aug 13, 2010 4:21 am

Re: [Calc] Split string into an array of single words

Post by jari0471 »

Since the original data is in text file (and used line input to place it to a string line by line),
it could be also possible to read the original text file word by word and no split is required at all.
But I could not find how to do this either.
Open Office 3.2.1
Red Hat Enterprise Linux Client release 5.4 (Tikanga)
eddyparkinson
Posts: 10
Joined: Thu Mar 09, 2017 5:08 am

Re: [Calc] Split string into an array of single words

Post by eddyparkinson »

Bug fix - handle blank cells

Code: Select all

REM  *****  BASIC  *****
Sub TestStrSplit_clone
   Dim a
   Dim s As String
   Dim i As Integer
   
   s = "hello OOoBasic programmers, this   is       another  solution"

   a = StrSplit(s," ")
   MsgBox(join(a,chr(13))
   a = StrSplit(Nothing," ")
   MsgBox(join(a,chr(13))
End Sub

Function StrSpliter(s, SplitChar As String)
  StrSpliter = "a"
End Function


Function StrSplit(SplitThis, SplitChar As String)
   Dim s As String

   if IsNull(SplitThis) or IsEmpty(SplitThis) then s = ""  else s =  SplitThis
   Dim i As Integer
   Dim n As Integer
   Dim m As String
   Dim t As String
   Dim a(0) As String
   Dim InWord As Boolean
      
   t =""
   InWord = False
   i = 1
   n = 0
      
   do while i   <=   len(s)
      m = mid(s,i,1)
      if m <> SplitChar then
         InWord = True
         t = t & m
      else
         if InWord then
            Redim Preserve a(n) As String
            a(n) = t
            n = n + 1
            t = ""
            InWord = False
         endif
      endif
      i = i + 1
   loop
   if InWord then
      Redim Preserve a(n) As String
      a(n) = t
   endif
   
   StrSplit = a
End Function
Open Office 4.1.2 - Windows 10
RPG
Volunteer
Posts: 2250
Joined: Tue Apr 14, 2009 7:15 pm
Location: Netherlands

Re: [Calc] Split string into an array of single words

Post by RPG »

Hello

It seems to me that in the last post is not understand that B Marcelly and Charlie Young did play a little with some commands. I think it is also good to make a real short example of how to solve the problem.

In starBASIC there are two commands JOIN and SPLIT who are real powerful as also B Marcelly pointed out. Most of the time those two commands are used with a single character to join or split values. But it is also possible to use more characters then one character, B Marcelly did use it in his example in the split part.

I do prefer the command replace. In the time when this thread started the replace command was not in all version of OOo. It is now in all new version of OpenOffice. I am not sure if it works in all cases but I have test it as much I can. It is also only an example to learn to use REPLACE and SPLIT.

It is me not clear what people want do with the StrSplit function. I want only place here how to use replace and split. When you knew the function replace, split, join then it is more easy to make short and easy code.

Code: Select all

Function StrSplit(SplitThis, SplitChar As String)
dim tempory  ' we define a tempory variable
' Be aware that we use the variabele first for the string value.
' We set it as an empty string
tempory="" ' Make sure we have an empty string.
if typename(SplitThis)="String" then
' The parameter is a string in this case we need a string	
	tempory=replace(SplitThis,"  ", " ") ' erase first all double spaces
' Be aware that that tempory is change from a single variabele to an array variabele
	tempory =SPLIT(tempory,SplitChar) ' Split string to array
end if
StrSplit=tempory ' Give back to the function an empty string or an array as the original function was
End Function
Romke
LibreOffice 7.1.4.2 on openSUSE Leap 15.2
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

At the end (after 9 years) the below is the latest solution:

Code: Select all

Sub Test_string
  Print parse_string("a b c")(2)
  Print parse_string("")(0)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr(0) as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim ssvr(n-1) as String
  End If
  For j=0 To n-1
    ssvr(j)=ssv(j)
  Next j
  parse_string=ssvr
  
End Function


It looks like the "parse_string" command as this is missing in the openoffice macro commands.
Last edited by RusselB on Wed Nov 13, 2019 12:08 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

Below is possibly the right method.
To test if string is empty it looks like requiring using ubound&lbound.
Not sure if there is any test command for empty arrays, such as "If v=[] Then ... End If"

Code: Select all

Sub Test_string
  Print parse_string("a b c")(2)
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim ssvr(n-1) as String
  End If
  For j=0 To n-1
    ssvr(j)=ssv(j)
  Next j
  parse_string=ssvr
  
End Function
Last edited by RusselB on Wed Nov 13, 2019 12:09 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

Using "Redim preserve" at the last part will then help avoid to copy the vector contents into another vector:

Code: Select all

Sub Test_string
  Print parse_string("a b c")(2)
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
End Sub

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        Print "String has no closing quotation!"
        parse_string="@closing_quotation_missing"
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          Print "After closing quotation at column "+Str(i)+" string separator is required!"
          parse_string="@string_sepator_missing"
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            Print "Quotation character at column "+Str(i)+" not allowed in middle of string!"
            parse_string="@wrong_quotation_type"
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  
End Function
Last edited by RusselB on Wed Nov 13, 2019 12:09 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

I have tried to check more and thinking if function error return should occur with global variables (to return full error descriptions).
This case if openoffice had global global variable for error return for each command I could use those same variable here. But this may require more checking...
Also if the Mid()-function index starts at index=1 rather than at index=0 can be problem as well as that redim can not make empty strings.

Code: Select all

Sub Test_string
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM erf : Flag TRUE/FALSE (error occur or not)
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global ers as String
Global erm as String

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=0 : n=0
  Do While i<lst
    Do While i<lst
      f=0
      REM The Mid()-command requires i+1 argument when i starts from zero. Can this be problem?
      If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i=lst Then
      Exit Do
    End If
    If Mid(stri,i+1,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<lst
        If Mid(stri,i+1,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<lst Then
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i+1,1)
      i=i+1
      Do While i<lst
        f=0
        If Mid(stri,i+1,1)=" " Or Mid(stri,i+1,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i+1,1)="'" Then
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i+1,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>-1) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erm=""
  ers=""
  erf=FALSE
End Function
Last edited by RusselB on Wed Nov 13, 2019 12:10 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

The error handling in the user functions would be probably using different variables that as "erl" & "err" by openoffice own functions:

Code: Select all

Sub Test_String
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=1 : n=0
  Do While i<=lst
    Do While i<=lst
      f=0
      If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
        f=1
        i=i+1
      End If
      If f=0 Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        f=0
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
      n=n+1
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        f=0
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          f=1
        End If
        If f=0 Then
          If Mid(stri,i,1)="'" Then
            erc="parse_string()"
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
        If f=1 Then
          Exit Do
        End If
      Loop
      n=n+1
    End If
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function
Last edited by RusselB on Wed Nov 13, 2019 12:12 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

It looks like the "split string" could work as below:

Code: Select all

Sub Test_String
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array

  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String

  i=1 : n=0
  Do While i<=lst
    Do While (Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9))
      i=i+1
      If i>lst Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="'" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="'" Then
          f=1 : i=i+1
          Exit Do
        Else
          ssv(n)=ssv(n)+Mid(stri,i,1)
        End If
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        If Not(Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9)) Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          i=i+1
          Exit Do
        End If
        If Mid(stri,i,1)="'" Then
          erc="parse_string()"
          erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
          ers="@wrong_quotation_placement"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
    End If
    n=n+1
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function

Not sure of the function error handling and also it looks like the "Redim" can not make empty array...
Last edited by RusselB on Wed Nov 13, 2019 12:12 am, edited 1 time in total.
Reason: Code tags added
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
JeJe
Volunteer
Posts: 2777
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Calc] Split string into an array of single words

Post by JeJe »

Getting rid of suplus spaces in the initial post can be done easily. Turning a string into a byte array *should* make things faster than using mid...

Code: Select all



Sub test
	Dim p As String
	p1="aa bb  cc   dd"

	dim b() as byte, c as long,start as boolean
	b() = p1: c =-2

	for i = 0 to ubound(b) step 2
		if b(i) = 32 and b(i+1) = 0 then
			if start = true then
				c= c+2
				b(c) =b(i)
				b(c+1) =0
				start = false
			end if
		else
			c= c+2
			b(c) =b(i)
			b(c+1) =b(i+1)
			start = true
		end if
	next
	redim preserve b(c)
	p1= b

	a1=Split(p1," ")
	Print a1(0)
	Print a1(1)
	Print a1(2)
	Print a1(3)

End Sub


Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
jarkky
Posts: 14
Joined: Sat Oct 12, 2019 5:18 pm

Re: [Calc] Split string into an array of single words

Post by jarkky »

That looks possible, but I extended to remove the "tabular" or "\t" also:

Code: Select all

Sub Test_String
  Print parse_string("""a b c"" ""\"""" c")(1)
  Print parse_string("a b c")(2)
  Print erf
  ev=parse_string(" ")
  Print lbound(ev),":",ubound(ev)
  Print erf
  ev=parse_string("a b 'c")
  Print erf
  If erf=True Then
    Print ers
  End If
End Sub

REM Function error return is defined at the global variable level.
REM This could be possible method for all the functions.
REM The user function can not stop into the error automatically or detect error line number.
REM Errors must be checked from the erf-flag. erf is operated by all commands, while
REM erc, ers and erm are filled in the function only in case erf=TRUE
REM If erc, ers and erm would be arrays, then all the errors in the program could be logged
REM into memory or a file for later inspection. Also I can not fill the "erl" and "err"
REM variables for user defined functions. That is if I have my own error message outside
REM the openoffice errors the program will not halt for that case.
REM Error handling should be available for the user defined functions also...
REM erf : Flag TRUE/FALSE (error occurred in last command or not)
REM erc : Command that produced the error
REM ers : Error name as string
REM erm : Error message as string
Global erf as Boolean
Global erc as String
Global ers as String
Global erm as String
REM The above parameters could be initialized globally, but how?

Function parse_string(stri as String) as Array
  If Not (TypeName(stri)="String") Then
    erc="parse_string()"
    erm="Input argument can be only string type!"
    ers="@wrong_argument_type"
    erf=TRUE
    parse_string=""
    Print erm
    Exit Function    
  End If
  lst=Len(stri)
  Dim ssv(30) as String
  Dim ssvr() as String
  i=1 : n=0
  Do While i<=lst
    Do While (Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9))
      i=i+1
      If i>lst Then
        Exit Do
      End If
    Loop
    If i>lst Then
      Exit Do
    End If
    If Mid(stri,i,1)="""" Then
      f=0 : i=i+1
      ssv(n)=""
      Do While i<=lst
        If Mid(stri,i,1)="""" Then
          If Mid(stri,i-1,1)="\" Then
            ssv(n)=Left(ssv(n),Len(ssv(n))-1)
          Else
            f=1 : i=i+1
            Exit Do
          End If
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
      If f=0 Then
        erc="parse_string()"
        erm="String has no closing quotation!"
        ers="@closing_quotation_missing"
        erf=TRUE
        parse_string=ssvr
        Print erm
        Exit Function
      End If
      If i<=lst Then
        If Not(Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9)) Then
          erc="parse_string()"
          erm="After closing quotation at column "+Str(i)+" string separator is required!"
          ers="@string_sepator_missing"
          erf=TRUE
          parse_string=ssvr
          Print erm
          Exit Function
        End If
        i=i+1
      End If
    Else
      ssv(n)=Mid(stri,i,1)
      i=i+1
      Do While i<=lst
        If Mid(stri,i,1)=" " Or Mid(stri,i,1)=Chr$(9) Then
          i=i+1
          Exit Do
        End If
        If Mid(stri,i,1)="""" Then
          If Mid(stri,i-1,1)="\" Then
            ssv(n)=Left(ssv(n),Len(ssv(n))-1)
          Else
            erc="parse_string()"
            erm="Quotation character at column "+Str(i)+" not allowed in middle of string!"
            ers="@wrong_quotation_placement"
            erf=TRUE
            parse_string=ssvr
            Print erm
            Exit Function
          End If
        End If
        ssv(n)=ssv(n)+Mid(stri,i,1)
        i=i+1
      Loop
    End If
    n=n+1
  Loop
  REM redim does not seem to operate with negative value (to make empty string)!
  If (n>0) Then
    Redim preserve ssv(n-1) as String
    parse_string=ssv
  Else
    parse_string=ssvr
  End If
  erf=FALSE
End Function

Here also even input arguments are defined as "String" -type the User should check that they come as "string"-type into the function...
 Edit: Please use the CODE tags around program code for readability and easy copying 
OpenOffice 3.1 on CentOS Linux release 8.0.1905 (Core)
KaduLeite
Posts: 5
Joined: Fri Apr 06, 2018 8:35 pm

Re: [Calc] Split string into an array of single words

Post by KaduLeite »

jari0471 wrote:Split string into an array of single words (similar to separation of "arguments" in shell command line)

Code: Select all

Sub test
  Dim p As String
  p1="aa bb  cc   dd"
  a1=Split(p1," ")
  Print a1(0)
  Print a1(1)
  Print a1(2)
  Print a1(3)
End Sub
Displayed is "aa", "bb", " " and "cc".
Displayed should be "aa", "bb", "cc" and "dd".

There is 2 spaces between "bb" and "cc" and 3 spaces between "cc" and "dd" in p1.
Is there another command or different option for this command that can do separation of words (arguments)?
If a large file is read to calc sheet, it will take a long time if the splitting function is slow. Thus fast (and simple) operation is requested.

As an exercise, I think this solve the problem with a smaller and simpler code.
To ignore other elements, like points or commas, just add them to the test with AND.

Code: Select all

Function SpaceSplit(p1 as String) as Array
  Dim i as Integer, a1(), a2()
  a1=Split(Trim(p1)," ")
  For i = 0 to Ubound(a1) 'For each element in a1
    If a1(i)<>"" Then 'If element is different to ""
    	Redim Preserve a2(UBound(a2)+1) 'Increases the size of a2 in 1 element
    	a2(UBound(a2))=a1(i) 'Add the non "" element to a2
    End If
  Next
  SpaceSplit = a2 ' Function returns a2, an array with each word
End Function


sub test
	dim s1()
	s1 = spacesplit("aa      bb   cc    dd    ee")
end sub
Kadu Leite
LibreOffice 5.2 on Windows 10 / LibreOffice 6.2 on Ubuntu 19.10
JeJe
Volunteer
Posts: 2777
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Calc] Split string into an array of single words

Post by JeJe »

Redim Preserve is an expensive operation to have in a loop... that doesn't matter with just a few items of course...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
KaduLeite
Posts: 5
Joined: Fri Apr 06, 2018 8:35 pm

Re: [Calc] Split string into an array of single words

Post by KaduLeite »

JeJe wrote:Redim Preserve is an expensive operation to have in a loop... that doesn't matter with just a few items of course...
Je Je,

You're absolutely right.
What about that?

Code: Select all

Function SpaceSplit(p1 as String) as Array
  Dim i as Integer, j as Integer, a1()
  a1=Split(Trim(p1)," ")
  Dim a2(Ubound(a1))
  For i = 0 to Ubound(a1)
    If a1(i)<>"" Then
    	a2(j)=a1(i)
    	j = j+1
    End If
  Next
  Redim Preserve a2(j-1)
  SpaceSplit = a2
End Function
It uses the same logic, but writes to an array with the same size as a1 and just when a2 is OK, it redim preserve with the correct size.

Kadu Leite
LibreOffice 5.2 on Windows 10 / LibreOffice 6.2 on Ubuntu 19.10
JeJe
Volunteer
Posts: 2777
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Calc] Split string into an array of single words

Post by JeJe »

Yeah, its much better to redim outside the loop once at the end.

If anyone's interested in these things... OOBasic is very similar to Microsoft's now obselete Visual Basic 6 (in the non-office suite aspects)... and there is an old but very good (and still existing!) site which looked at the fastest ways to code various functions in Visual Basic...

http://xbeat.net/vbspeed/
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Post Reply