Cursor word empty line

Postby BubikolRamios » Tue Mar 06, 2018 4:23 pm

The macro below should read the document word by word, but it
does not message out 'de',
messages out '.' multiple times where it occurs only once,
does not get past the empty line to 'Beispiel:'.

Suggestions?



sample text:
Exkursionsflora der Alpen
und angrenzender Gebiete
© Dr. Thomas Götz, Singen (letzte Änderung: 18.04.2017)
Weitere Infos zum Projekt:
http: //www .tkgoetz .homepage .t-online. de/alpenflorahome.html Fehler und Anmerkungen bitte an: tk.goetz@t-online .de

Beispiel:



Code:
Dim oDoc As Object
Dim Proceed As Boolean
Dim Cursor As Object

oDoc = ThisComponent
Cursor = oDoc.Text.createTextCursor()
Cursor.gotoStart (false)'start of doc
Cursor.gotoEndOfWord(True)

do

MsgBox Cursor.String
Cursor.gotoNextWord (false)
Proceed = Cursor.gotoEndOfWord(True)

Loop While Proceed
OpenOffice 4.1.5 / Win 7

Postby FJCC » Tue Mar 06, 2018 4:56 pm

The unusual behavior I see is caused by the spaces in the middle of
www .tkgoetz .homepage .t-online. de
There is a space before .tkgoetz and .homepage and .t-online and de. The search gets confused and returns the . character. There is a similar space in
tk.goetz@t-online .de
before the .de. If these spaces are removed the search works as expected, as far as I can see. Apparently, the code for gotoNextWord() does not know what to do if the next word starts with a period.
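One way to check this diagnosis on a copy of the document is to strip the spaces in front of periods with a regular-expression replace and then rerun the word loop. This is only a minimal sketch: it assumes the stray spaces are plain spaces directly before a ".", the sub name is made up for illustration, and it will also change any legitimate " ." in the text. It does not touch a space that follows a period, as in "t-online. de".

```basic
Sub RemoveSpacesBeforePeriods
   ' Hypothetical helper, not from the thread: run it on a copy of the document.
   Dim oDoc As Object
   Dim oRD As Object
   Dim nReplaced As Long

   oDoc = ThisComponent
   oRD = oDoc.createReplaceDescriptor()
   oRD.SearchRegularExpression = True
   oRD.SearchString = " +\."    ' one or more spaces followed by a literal period
   oRD.ReplaceString = "."      ' keep only the period
   nReplaced = oDoc.replaceAll(oRD)
   MsgBox nReplaced & " replacement(s) made"
End Sub
```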
AOO 3.4 or 4.1 on MS Windows XP ( before 2013-08-03) or Windows 7

Postby RoryOF » Tue Mar 06, 2018 5:00 pm

The API says
gotoNextWord
boolean
gotoNextWord( [in] boolean bExpand );

Description
moves the cursor to the next word.

Note: the function returning true does not necessarily mean that the cursor is located at the next word, or any word at all! This may happen for example if it travels over empty paragraphs.
Returns
true if the cursor was moved. It returns false if the cursor cannot advance further.


The OP will have to examine the value at the location and discard it if it is a ".", then continue with another gotoNextWord()
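A minimal sketch of that check, assuming the only unwanted value is a lone "." and that the return value of gotoNextWord() is what ends the loop. The sub and variable names are made up for illustration, and what the cursor actually selects around punctuation may differ between versions.

```basic
Sub ListWordsSkippingPeriods
   Dim oCursor As Object
   Dim sWord As String
   Dim bMoved As Boolean

   oCursor = ThisComponent.Text.createTextCursor()
   oCursor.gotoStart(False)
   Do
      ' Try to select up to the end of the current word.
      ' On False the cursor has not moved, so there is nothing to report.
      If oCursor.gotoEndOfWord(True) Then
         sWord = oCursor.String
         ' Examine the value and discard it if it is only a period.
         If sWord <> "." Then MsgBox sWord
      End If
      oCursor.collapseToEnd()
      ' True only means the cursor moved, not that it now sits on a word.
      bMoved = oCursor.gotoNextWord(False)
   Loop While bMoved
End Sub
```

Because the loop condition comes from gotoNextWord() rather than gotoEndOfWord(), an empty paragraph should not end the loop early.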
Apache OpenOffice 4.1.5 on Xubuntu 18.04 (mostly 64 bit version) and infrequently on Win2K/XP

Postby JeJe » Tue Mar 06, 2018 9:51 pm

OpenOffice counts a lot of punctuation as words... it's terrible and completely unreliable as far as what a word is. You can see this by looking at the word counts it gives. LibreOffice does a much better job.
Openoffice 4.1.2
Windows 8

Postby BubikolRamios » Wed Mar 07, 2018 9:47 am

Even skipping '.' does not get past the empty line. Sample text:
a b

c

will not message out 'c'.

Looking at this: https://wiki.openoffice.org/wiki/Documentation/BASIC_Guide/Editing_Text_Documents#Navigating_within_a_Text
Since isEndOfParagraph() returns true at each line end (wrong, I think), I figured I could use gotoNextParagraph when going to the next word fails.


This fails with: "sub-procedure or function procedure not defined" (it compiles anyway, as far as I can see).


Code:
oDoc = ThisComponent
'oText = oDoc.getText()
Cursor = oDoc.Text.createTextCursor()
Cursor.gotoStart (false)'start of doc
Cursor.gotoEndOfWord(True)
Proceed = true
do


if Cursor.String = "." then
'do nothing, skip
else
MsgBox Cursor.String
end if
'if Cursor.isEndOfParagraph () then
'Proceed = Cursor.gotoNextParagraph (false)
'Proceed = Cursor.gotoEndOfWord(True)
'else
Cursor.gotoNextWord (false)
if not Cursor.gotoEndOfWord(True) then
Proceed = gotoNextParagraph(false)
end if

'end if




Loop While Proceed

Postby BubikolRamios » Wed Mar 07, 2018 10:15 am

Managed to get past the empty line with this:
Code:
Cursor.gotoNextWord (false)
if not Cursor.gotoEndOfWord(True) then
  Proceed = Cursor.gotoNextParagraph(true)
  Cursor.gotoEndOfWord(True)
end if


Postby BubikolRamios » Wed Mar 07, 2018 10:37 pm

also, sample text:
Zier-


Code:
Cursor.gotoStart (false)'start of doc
Cursor.gotoEndOfWord(True)
MsgBox Cursor.String'--> empty string



If one removes '-' from 'Zier-'
it works ok.


Is this a bug ?

Postby JeJe » Wed Mar 07, 2018 10:57 pm

In your first code, change the loop to:

Code:
do
MsgBox Cursor.String
proceed = Cursor.gotoNextWord (false)
Cursor.gotoEndOfWord(True)
Loop While Proceed
End Sub

Postby UnklDonald418 » Fri Mar 09, 2018 1:31 am

In two of his documents Andrew Pitonyak mentions that gotoNextWord() has had bugs going as far back as OO version 1.1, but he didn't elaborate.
A quick check on Bugzilla didn't turn up anything.
For what it's worth, here is what I came up with:

Code:
REM  *****  BASIC  *****

Sub Main
Dim oDoc As Object
Dim Proceed As Boolean
Dim Cursor As Object
Dim words as string

   oDoc = ThisComponent
   Cursor = oDoc.Text.createTextCursor()
   Cursor.gotoStart (false)'start of doc
   Cursor.gotoEndOfWord(True)
   words = ""
do
   str1 = Cursor.String
REM goToNextWord() mis-handles words beginning with "."
REM  this bit corrects that behavior
   if str1 = "." then   
     Cursor.gotoPreviousWord(False)
     Cursor.goRight(2, False)
     Cursor.gotoEndOfWord(False)
     Cursor.gotoStartOfWord(True)
     str1 = "." & Cursor.String
   endif
REM now continue on building list
   words = words & CHR$(10) & str1
   Proceed =  Cursor.gotoNextWord (False)
REM put a marker in the list when starting a new paragraph
     If Cursor.IsStartOfParagraph Then   'isStartOfParagraph
        words = words & CHR$(10) & "**** New Paragraph ****"
   End If
REM check if all the words are on the list   
   If Cursor.gotoEndOfWord(True) = False then
     Proceed = False
   end if
Loop While Proceed
  MsgBox words, 0,  "Word List"
End Sub


Edit: Apparently I need new glasses. When I looked at my results this morning I noticed that the code above mishandles a stand-alone ".".
Apache OpenOffice 4.1.5 & LibreOffice 6.0.4.2 - Windows 10 Professional

Postby JeJe » Fri Mar 09, 2018 11:42 pm

gotoEndOfWord doesn't move the cursor when there are two paragraph marks. That's the design of the function: the cursor remains in its original position because it is not at the end of a word when there are two paragraph marks or two line feed marks.

gotoEndOfWord
boolean
gotoEndOfWord( [in] boolean bExpand );

Description
moves the cursor to the end of the current word.
Returns
true if the cursor is now at the end of a word, false otherwise. If false was returned the cursor will remain at its original position.


https://www.openoffice.org/api/docs/com ... oEndOfWord

You need to use gotoNextWord and remove the punctuation etc.

Code:
do
MsgBox Cursor.String
cursor.collapsetoend
proceed=Cursor.gotoNextWord (true)
Loop While Proceed

Postby JeJe » Sat Mar 10, 2018 2:46 am

gotoNextWord gets thrown by periods as well, and I've played around with XBreakIterator, which doesn't seem to work properly either.
Paragraph text seems to be returned okay, so the alternative is to write your own code for each paragraph.

Code:
Sub EnumerateParagraphs
   REM Author: Andrew Pitonyak
   Dim oParEnum 'Enumerator used to enumerate the paragraphs
   Dim oPar 'The enumerated paragraph
   REM Enumerate the paragraphs.
   REM Tables are enumerated along with paragraphs
   oParEnum = ThisComponent.getText().createEnumeration()
   Do While oParEnum.hasMoreElements()
      oPar = oParEnum.nextElement()
      REM This avoids the tables. Add an else statement if you want to
      REM process the tables.
      If oPar.supportsService("com.sun.star.text.Paragraph") Then
         'MsgBox oPar.getString(), 0, "I found a paragraph"
         getwordsinpara(oPar.getstring)

      ElseIf oPar.supportsService("com.sun.star.text.TextTable") Then
         'Print "I found a TextTable"
      Else
         'Print "What did I find?"
      End If
   Loop
End Sub


sub getwordsinPara(txt as string) 'author - me
   dim punct as string,i as long,c as string,wd as string,lenwd as long


   punct = " .;:!?(){}[]\/<>,*@" & Chr(34) & Chr(10)

   for i = 1 to len(txt)
      c = Mid(txt, i, 1)
      If InStr(1, punct, c) <> 0 Then
         if wd <>"" then

'            if isnumeric(wd) =false then

               msgbox wd
               '
'            end if
         end if
      
         wd = ""
         lenwd=0
      Else
         lenwd = lenwd+1
         wd =  wd & c
      End If
   next


         if wd <>"" then

'            if isnumeric(wd) =false then

               msgbox wd
               '
'            end if
         end if


end sub


Postby Lupp » Sat Mar 10, 2018 6:06 pm

What did I miss completely?
Aren't words what is left-delimited by a word boundary and starting with a word character then? RegEx search should find that. Stubbornly included spaces (fighting spam?) you can't eliminate without either natural or (that's a joke:) artificial intelligence. And anyway: Who did ever define the counting of "words" for a technical construct like an URL following its very specific syntax?

[Image: F_R_ForCountingWords.png]

If I do the F&R with "Replace All" as shown above, I get the words counted, either for all the searchable text, or for the current selection if the option is enabled. As far as I can judge the results are good. The sample text starting with "Exkursionsflora" gives me 35 words that way, and that's exactly what I would count using NI. There is, of course, the single "t" of "t-online" counted as a word, but that's the fault of a company using a silly name ("fake syntax"?).

A ViewCursor or a TextCursor told to "gotoEndOfWord" may not apply the same criteria, since it may be expected to work in a wrap-oriented way or the like. I don't know enough. The results are obviously different, however, and the statistics done for the document properties are not reliable. (Not in AOO and also not in LibO.)

Editing:
To avoid misunderstandings about what I meant, I attach a bit of code, too:
Code:
Sub test()
Print wordCount()
End Sub

Function wordCount(Optional pSearchable)
If IsMissing(pSearchable) Then pSearchable = ThisComponent
doc0 = pSearchable
tRD = doc0.CreateReplaceDescriptor
tRD.SearchRegularExpression = True
tRD.SearchString = "\b\w+\b"
found = doc0.FindAll(tRD)
wordCount = found.Count
For j = 0 To found.Count - 1
  oneWord = found(j).String
  Print oneWord ' Now every found word is displayed one by one
Next j
End Function
On Windows 10: LibreOffice 6.1 and older versions, PortableOpenOffice 4.1.5 and older, StarOffice 5.2

Postby JeJe » Sat Mar 10, 2018 9:14 pm

Lupp - your function for

Mark’s and Sammy’s.

gives only one word: the "and".

Postby JeJe » Sun Mar 11, 2018 2:50 pm

A slightly improved version of my code above. It adds a few more punctuation symbols and keeps a web address as one word. It doesn't handle some things; there are several choices to make when you decide what your words are.

Code:
Sub EnumerateParagraphs
   REM Author: Andrew Pitonyak
   Dim oParEnum 'Enumerator used to enumerate the paragraphs
   Dim oPar 'The enumerated paragraph
   REM Enumerate the paragraphs.
   REM Tables are enumerated along with paragraphs
   oParEnum = ThisComponent.getText().createEnumeration()
   Do While oParEnum.hasMoreElements()
      oPar = oParEnum.nextElement()
      REM This avoids the tables. Add an else statement if you want to
      REM process the tables.
      If oPar.supportsService("com.sun.star.text.Paragraph") Then
         'MsgBox oPar.getString(), 0, "I found a paragraph"
         getwordsinpara(oPar.getstring)

      ElseIf oPar.supportsService("com.sun.star.text.TextTable") Then
         'Print "I found a TextTable"
      Else
         'Print "What did I find?"
      End If
   Loop
End Sub


sub getwordsinPara(txt as string)
   dim punct as string,i as long,c as string,wd as string,lenwd as long
   dim lentxt as long,webaddress as boolean, breakpos as long,hyphencount as long

   lentxt = len(txt)
   punct = " ;:!?(){}[]<>,*‘’“”—–…‹›«»•†‡§||#_" & Chr(34) & Chr(10) & chr(9)
   '".'-_@\/" handle separately ?

   for i = 1 to lentxt 'go through each letter
      c = Mid(txt, i, 1)

      select case c
      case "."
         IF WD = "" THEN
            breakpos = i
         ELSE
            if i < lentxt then
               IF InStr(1, punct, Mid(txt, i+1, 1)) <> 0 Then breakpos =i
            else
               breakpos =i
            end if
         END IF
         'apostrophe and single curly quote confusion - NOT handled yet
         '      case "'" 'not in punctuation list so always treating as part of word
         '      case "‘" in punctuation list so always treated as breakchar
         '      case "’" in punctuation list so always treated as breakchar
         '      case "_" 'underline in punctuation list so always treated as breakchar

      case "-" 'hyphen options when to treat as break char
         hyphencount = hyphencount +1

      CASE ":"
         IF wd <> "http" then breakpos = i

      case "/"
         if instr(1,wd,"http:") <>1 and instr(1,wd,"www.")<>1 then
            breakpos = i
         else
            webaddress = true
         end if
      case else
         If InStr(1, punct, c) <> 0 Then breakpos =i
      end select

      if breakpos<>0 then
         gosub handleword
         breakpos =0
         wd = ""
         lenwd=0
         webaddress = false
         hyphencount = 0
      Else
         lenwd = lenwd+1
         wd =  wd & c
      End If
   next

   gosub handleword
   exit sub


handleword:

   if wd <> "" then
      'handle hyphenated words: with 1 hyphen treat as one word, else split if not a date
      if webaddress = false and hyphencount > 1 and isdate(wd) = false then
         wds = split(wd, "-")
         for j = 0 to ubound(wds)
            'show each part, not the whole word; skip empty parts
            'option to exclude numbers
            '            if isnumeric(wds(j)) = false then
            if wds(j) <> "" then msgbox wds(j)
            '            end if
         next
      else
         'option to exclude numbers
         '         if isnumeric(wd) = false then
         msgbox wd
         '         end if
      end if
   end if

   return

end sub

Postby Lupp » Sun Mar 11, 2018 3:33 pm

I would assume there is no way to list and count words from any document as long as there isn't a clear definition of what a word is.
I also don't feel capable at all to judge how many and which languages might be treated the same way. But I would judge that the appropriate means for an approach is the usage of regular expressions. This for the definition of words and for the parsing of documents as well.
Any syntactical analysis based on scanning text objects character by character or using TextCursor objects has to do much the same as a RegEx engine, but based on much less expertise, e.g. concerning Unicode. In addition, it will lack a reliable specification unless the author re-invents something like RegEx.

Existing RegEx engines are powerful and efficient, and this surely holds for the ICU engine integrated into AOO (and LibO, too). And if a more special text requires a more special treatment, it should be much easier to rework the RegEx string in one place (even if it becomes rather complicated) than to rework an already coded program in many places.

You may run the code contained in the attached example as a demonstration of what I mean in more detail.
Attachment: aoo92697SplitInWords_1.odt (28.5 KiB)

Postby JeJe » Sun Mar 11, 2018 4:06 pm

Lupp - yes, regular expressions are very powerful. Good demo. In it...

‘Mark and Steven’s party’ said Jenny.


Gives Steven as a word. That's the apostrophe/single quote confusion problem, which arises when people use the single quote as a typographically prettier apostrophe.

ex-partner


Is counted as two separate words when it's really one word. Other hyphenations will be two words though; it's not simple.

It's perhaps not so much that we can't agree on a word count. There are some choices about how we count hyphenated words, word contractions and so on, but having made those decisions, we should agree on how many words there are.


“Sam and Gary,” said Jenny.


Is counted by OpenOffice as 6 words. LibreOffice does a much better job, correctly counting five. OO is particularly bad.

Postby JeJe » Sun Mar 11, 2018 4:48 pm

Regarding hyphenation word counts - this article's interesting:

https://wordribbon.tips.net/T009228_Ign ... ounts.html

Counting all hyphenations as one word.


That would make this example that I got from here,

https://www.quickanddirtytips.com/educa ... -modifiers

a count of three words. Which people might disagree with.

a what-you-might-have-been-wondering-about topic

Postby Lupp » Sun Mar 11, 2018 5:08 pm

I actually cannot imagine a solution to the "hyphenation problem" based on pure syntax. Or may I say, it's impossible? And it changes with time. When I was still a pupil I was taught to use "type writer" for what we call "Schreibmaschine" in German. Then it was "type-writer" for some time, if I remember correctly, and nowadays everybody seems to use "typewriter". This reasonable convention was reached no sooner than we came to need a trip to the museum to see one of these things.

In German we have the absurd case that a prefix needs to be separated and postponed, though not being a regular word with a distinct
meaning. An example?
Well: A: "Lasst uns anfangen!" B: "Ich habe schon angefangen, aber fange du endlich an!"
(A: "Let's start!" B: "I have already started, but you should finally start!" or similar)

I'm not a linguist, and I don't even have a term for the "infixed perfect-prefix 'ge'" or whatever it is.
On the other hand I'm 73 now and fortunately I never had actual need of definitely listing and counting "words".

The misuse of a single quote as a "pretty apostrophe" is easily solved in RegEx by "('|’)" or even "('|’|‘)".
But national and regional or group-creating stubborn specifics will surely soon be introduced to present us with new funny nonsense once in a while.
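As a rough illustration of the "('|’)" idea, here is a sketch that extends the \w+ search so that an apostrophe of either kind between word characters does not split the word. The function name and the exact pattern are assumptions made for this example, not something taken from the attached demo.

```basic
Function countWordsWithApostrophes(Optional pSearchable) As Long
   ' Hypothetical variant of a regex word count; the pattern is an assumption.
   Dim oSD As Object, oFound As Object, j As Long
   If IsMissing(pSearchable) Then pSearchable = ThisComponent
   oSD = pSearchable.createSearchDescriptor()
   oSD.SearchRegularExpression = True
   ' A run of word characters, optionally continued by an apostrophe
   ' (straight ' or curly ’) followed by more word characters.
   oSD.SearchString = "\w+(('|’)\w+)*"
   oFound = pSearchable.findAll(oSD)
   For j = 0 To oFound.getCount() - 1
      Print oFound.getByIndex(j).String   ' show each match one by one
   Next j
   countWordsWithApostrophes = oFound.getCount()
End Function
```

With this pattern a closing quote mark after a word is left out of the match, because the apostrophe is only accepted when more word characters follow it. Something like "Steven’s" should then come out as a single match.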

(Editing:)
Lupp wrote:I actually cannot imagine a solution to the "hyphenation problem" based on pure syntax.

JeJe wrote:Counting all hyphenations as one word.

Yes, I thought of this one, but...
JeJe wrote:Which people might disagree with.

... was anticipated.
I would agree, however. And the example
JeJe wrote:a what-you-might-have-been-wondering-about topic
is exactly the one I would like to apply the agreement to.

Postby JeJe » Mon Mar 12, 2018 12:48 am

Here's another version of my count code in an attached Writer document, with a dialog showing the word list.
It should be easy to adapt for anyone else's count code.

(Just replace the word "EnumerateParagraphs" in the
"LoadWordsNew" sub in the "WCM" module with the name of your function which starts the adding of words.
And add the words by calling addWord(YourWord) from within your function.)
Attachment: Word Count.odt (17.64 KiB)