OO Writer - Identifying Multi Replicated Paragraphs

Col
Posts: 5
Joined: Tue Jun 02, 2020 6:46 pm

OO Writer - Identifying Multi Replicated Paragraphs

Post by Col »

Follow-on from the post: OO Writer - Identifying Duplicated Paragraphs

Apologies, folks, for wanting a second bite at this. The responses solved the question I posed but, with hindsight, I did not anticipate that several different paragraphs might have been replicated. As it stands, the macro as modified by FJCC identifies (in red text) the repeats of a replicated paragraph, wherever they are located within a single file and however many times the paragraph has been replicated. What I now need to handle is the case where additional paragraphs have also been replicated, say five different paragraphs. I don't even know whether this is achievable in the OO macro language, but I would be grateful if some kind-hearted soul would recode the macro to show each replicated paragraph in a different text colour.

Thank you


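Sub DeleteDuplicateParagraphs
' Despite its name, this macro deletes nothing: it colours later
' copies of each non-empty paragraph red.
oDoc = ThisComponent
enum = oDoc.Text.createEnumeration
While enum.hasMoreElements
  thisPara = enum.nextElement
  s = thisPara.getString
  c = c + 1 ' 1-based position of thisPara in the document
  If Len(s) > 0 Then
    Check(s, c, oDoc)
  EndIf
Wend
End Sub

Sub Check(s, c, oDoc)
' Colours red every paragraph after position c whose text equals s.
' The first occurrence itself is left unchanged.
enum1 = oDoc.Text.createEnumeration
' Skip ahead. cc starts at 0 and the test is c >= cc, so this consumes
' c + 1 elements: the current paragraph and the one after it.
While enum1.hasMoreElements and c >= cc
  enum1.nextElement
  cc = cc + 1
Wend
' Compare every remaining paragraph with s.
While enum1.hasMoreElements
  nextPara = enum1.nextElement
  ss = nextPara.getString
  If ss = s Then
    nextPara.CharColor = RGB(255,0,0)
  EndIf
Wend
End Sub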


Open Office 4.1.7
Windows 7
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by JeJe »

While running a short test of your original code, I noticed that adjacent repetitions weren't picked up. This was found:

para1
different para
para1

but this wasn't:

para1
para1
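The cause appears to be the skip loop in Check: with cc starting at 0 and the test c >= cc, it consumes c + 1 paragraphs, so the paragraph right after the current one is never compared. Below is a minimal Python model of that loop (pure logic, no UNO; the function name later_copies is hypothetical) that contrasts the original skip with one that skips exactly c paragraphs. It also suggests why copies can still appear to be found when an empty paragraph separates them, assuming the blank lines in a document are empty paragraphs.

```python
def later_copies(paras, buggy=False):
    """Model the macro's Check loop over every paragraph.

    paras: paragraph strings in document order.
    buggy=True reproduces the 'c >= cc' skip (c + 1 elements consumed);
    buggy=False skips exactly c elements (up to and including the current one).
    Returns the sorted 0-based indices the macro would colour.
    """
    coloured = set()
    for c, s in enumerate(paras, start=1):  # c is 1-based, as in the macro
        if not s:
            continue  # the macro ignores empty paragraphs
        skip = c + 1 if buggy else c
        for j in range(skip, len(paras)):
            if paras[j] == s:
                coloured.add(j)
    return sorted(coloured)


print(later_copies(["para1", "para1"], buggy=True))  # adjacent copy missed
print(later_copies(["para1", "para1"]))              # adjacent copy found
```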
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by JeJe »

If I understand you... you mean to find groups of repeated paragraphs, so para1, para2, para3 together here:

para1
para2
para3
other para
another para
para1
para2
para3


That's a much more complicated problem.
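To give a feel for what that harder version involves, here is a brute-force Python sketch (pure logic, outside Writer; repeated_blocks and min_len are hypothetical names) that compares every pair of start positions and extends each match for as long as the following paragraphs keep matching. A match is reported only at its first paragraph, so para1..para3 in the example above comes out as a single block of length 3. Empty paragraphs are not treated specially here, which a real macro would probably need to do.

```python
def repeated_blocks(paras, min_len=2):
    """Find runs of consecutive paragraphs that occur more than once.

    Returns (i, j, length) with i < j and paras[i:i+length] == paras[j:j+length],
    reported only where the run cannot be extended backwards.
    O(n^2) start pairs, so best suited to modest documents.
    """
    n = len(paras)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip pairs that merely continue a match starting earlier.
            if i > 0 and paras[i - 1] == paras[j - 1]:
                continue
            length = 0
            while j + length < n and paras[i + length] == paras[j + length]:
                length += 1
            if length >= min_len:
                found.append((i, j, length))
    return found


doc = ["para1", "para2", "para3", "other para", "another para",
       "para1", "para2", "para3"]
print(repeated_blocks(doc))
```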
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Col
Posts: 5
Joined: Tue Jun 02, 2020 6:46 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by Col »

Hi, JeJe. Yes, I have to admit there is a weakness in that it doesn't mark the "original" paragraph, but it does identify successive adjacent copies (see the sample below). I had to add the colour to the text because I couldn't find out how to provide an attachment! If the "original" paragraph was part of the intended text, that is OK, but if it was purely a "thought or suggestion" I had added as a prompt, one instance would be left undetected, which is not good.

Sample Text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam facilisis, mauris sit amet gravida commodo, diam massa malesuada purus, et lacinia magna libero at eros. Maecenas fermentum leo quis lacus tincidunt, sed pretium ligula fermentum. Donec vel tellus sed magna dignissim elementum. Sed a magna eros. Phasellus sed semper massa, vitae luctus odio. Etiam convallis est eget aliquet gravida. Nam a venenatis turpis. Ut massa leo, porta id sapien eget, iaculis euismod est. Vivamus placerat consequat lectus sed aliquet. Suspendisse potenti.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Phasellus ac massa mattis, gravida erat sed, dignissim leo. Duis in maximus diam. Donec ut laoreet quam. Sed finibus id neque et vestibulum. Interdum et malesuada fames ac ante ipsum primis in faucibus. Praesent a efficitur libero. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Sed porttitor auctor enim, tincidunt faucibus mauris viverra eu. Ut ex nisi, convallis quis dolor nec, porttitor tempor neque.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Ut dapibus vitae nisl nec dictum. Phasellus sagittis, magna id semper tempus, sem lectus posuere odio, dapibus commodo sem ipsum et augue. Ut vel libero quam. Etiam nec interdum sapien. Vestibulum convallis vitae dui non ultrices. Curabitur arcu massa, tincidunt at egestas quis, venenatis eget mi. Vivamus vehicula vitae nisi vel iaculis. Fusce porttitor elit non nulla varius semper. Fusce elementum turpis dignissim nulla laoreet, nec porttitor ante laoreet. Nulla tempor sit amet lectus sed posuere.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Nulla at enim eu nibh gravida ultrices ut vel eros. Cras sit amet nisi tempor, euismod risus at, mollis purus. Aenean quis turpis orci. Ut pharetra turpis ex, non imperdiet mi porttitor in. Nam sollicitudin neque ac libero facilisis ornare. Quisque accumsan semper justo, vel eleifend nibh venenatis et. Quisque semper tempor ante in blandit. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Morbi ex augue, pellentesque eu vestibulum sed, aliquet id massa. Proin suscipit vestibulum erat in molestie.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.

Sed nibh orci, feugiat id accumsan vel, consectetur vitae ligula. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam pharetra ut nisi eu tincidunt. Nullam semper justo non nibh semper, in molestie purus suscipit. Proin eu varius nunc. Proin rhoncus nunc id arcu posuere maximus. Sed vel tincidunt diam, vitae imperdiet enim. Fusce pretium arcu eu lorem fermentum lacinia. Nulla facilisi. Curabitur cursus vestibulum lacus feugiat tristique. Donec tortor sapien, eleifend at tellus sed, laoreet fringilla nunc. Nam ullamcorper odio non mollis fringilla. Ut fringilla sit amet orci ut aliquet. Donec a malesuada orci, sit amet aliquam urna. Aliquam cursus sollicitudin odio id accumsan.
Open Office 4.1.7
Windows 7
Col
Posts: 5
Joined: Tue Jun 02, 2020 6:46 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by Col »

Hi, JeJe, I think you've got my meaning. An extreme case would be:

Para 1 Unique
Para 2 Unique
Para 3 Unique
Para 1 Replicated in Red
Para 4 Unique
Para 1 Replicated in Red
Para 5 Unique
Para 6 Unique
Para 7 Unique
Para 6 Replicated in Blue
Para 1 Replicated in Red
Para 8 Unique
Para 9 Unique
Para 10 Unique
Para 9 Replicated in Green
Para 11 Unique
Para 6 Replicated in Blue
Para 12 Unique
Para 9 Replicated in Green
Para 1 Replicated in Red

I agree - a daunting task!
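As a sketch of the colouring rule in the example above, the following Python (pure logic; colour_replicated and the three-colour palette are hypothetical) groups paragraph indices by exact text in a single pass and hands out colours in the order in which each replicated text first appears. Applied to the example, paragraph 1 gets Red, paragraph 6 Blue and paragraph 9 Green. Unlike the example, it also colours the first occurrence, since leaving the "original" unmarked was raised earlier as a weakness; dropping the first index of each group would restore the example's behaviour.

```python
from collections import defaultdict

PALETTE = ["Red", "Blue", "Green"]  # hypothetical palette; extend as needed


def colour_replicated(paras, palette=PALETTE):
    """Map the index of every replicated paragraph to a colour.

    All occurrences of a text that appears more than once share one colour;
    colours follow the order of first appearance. Raises ValueError when
    there are more replicated texts than colours.
    """
    positions = defaultdict(list)  # text -> indices, in first-seen order
    for i, text in enumerate(paras):
        if text:  # ignore empty paragraphs
            positions[text].append(i)
    groups = [idx for idx in positions.values() if len(idx) > 1]
    if len(groups) > len(palette):
        raise ValueError("out of colours")
    result = {}
    for colour, indices in zip(palette, groups):
        for i in indices:
            result[i] = colour
    return result


doc = "P1 P2 P3 P1 P4 P1 P5 P6 P7 P6 P1 P8 P9 P10 P9 P11 P6 P12 P9 P1".split()
print(colour_replicated(doc))
```

In a macro, each index in the result would correspond to setting CharColor on that paragraph.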
Open Office 4.1.7
Windows 7
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by RoryOF »

Checksum each paragraph and build a table of paragraph, checksum and colour. Automate inserting each paragraph as a section. If an earlier paragraph has the same checksum, set a background colour from the table and tick Hide in the section's properties.
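The table part of this suggestion can be sketched in Python as below. zlib.crc32 stands in for "a checksum" (any hash would do), and checksum_rows and the palette are hypothetical names; the section insertion and the Hide setting are not modelled. Because two different texts can share a checksum, a hit is confirmed by comparing the strings before a colour is reused. The first occurrence stays uncoloured and each text receives its colour on its first repeat.

```python
import zlib


def checksum_rows(paras, palette):
    """Return one (checksum, colour) row per paragraph, in document order.

    colour is None for a text not seen before; a repeat gets the colour
    allotted to its text on that text's first repeat. Colours are reused
    cyclically if the palette runs out.
    """
    seen = {}  # checksum -> list of [text, colour or None]
    rows = []
    allotted = 0
    for text in paras:
        crc = zlib.crc32(text.encode("utf-8"))
        candidates = seen.setdefault(crc, [])
        # Equal checksums only suggest equal text, so confirm with the string.
        entry = next((e for e in candidates if e[0] == text), None)
        if entry is None:
            candidates.append([text, None])  # first occurrence: no colour
            rows.append((crc, None))
            continue
        if entry[1] is None:  # first repeat of this text: allot a colour
            entry[1] = palette[allotted % len(palette)]
            allotted += 1
        rows.append((crc, entry[1]))
    return rows


rows = checksum_rows(["a", "b", "a", "c", "b", "a"], ["yellow", "cyan"])
print([colour for _, colour in rows])
```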
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by JeJe »

I misunderstood - I thought you meant processing chunks of text containing more than one paragraph.

If it's still just one paragraph and all you want is a different color for each different find:

Note: this uses QBColor, which runs out after 13 colors once black and white are excluded.
The original text has to be all black (CharColor = 0).


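Dim coli As Long, col As Long ' current QBColor index and the colour it gives

Sub DeleteDuplicateParagraphs
' Despite the name, nothing is deleted: every group of identical
' paragraphs is coloured with its own QBColor.
coli = 1
col = QBColor(coli)
oDoc = ThisComponent
enum = oDoc.Text.createEnumeration
While enum.hasMoreElements
  thisPara = enum.nextElement
  s = thisPara.getString
  c = c + 1 ' 1-based position of thisPara
  ' Only non-empty paragraphs still in black (CharColor = 0) are tested;
  ' paragraphs already coloured are copies found earlier.
  If Len(s) > 0 And thisPara.CharColor = 0 Then
    If Check(s, c, oDoc) Then
      thisPara.CharColor = col ' colour the first occurrence too
      coli = coli + 1
      If coli = 7 Then coli = 8 ' avoid white
      If coli = 15 Then MsgBox "out of colors" ' 15 = last color and is white again
      col = QBColor(coli)
    End If
  End If
Wend
End Sub

Function Check(s, c, oDoc) As Boolean
' Colours every paragraph after position c whose text equals s with the
' current colour; returns True if at least one copy was found.
found = False
enum1 = oDoc.Text.createEnumeration
' Skip exactly c elements (up to and including the current paragraph).
' Testing cc < c instead of c >= cc means an adjacent copy is not skipped.
While enum1.hasMoreElements And cc < c
  enum1.nextElement
  cc = cc + 1
Wend
While enum1.hasMoreElements
  nextPara = enum1.nextElement
  If nextPara.getString = s Then
    nextPara.CharColor = col
    found = True
  End If
Wend
Check = found
End Function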
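How many groups can be coloured follows from the coli updates in the macro: start at 1, jump from 7 to 8 ("avoid white"), and give up at 15 ("white again"). A small Python replay of just that index logic (qbcolor_indices is a hypothetical name; no actual colours are computed) lists the indices handed out before the "out of colors" message.

```python
def qbcolor_indices():
    """Replay the macro's coli updates and list the QBColor indices used.

    Starts at 1, jumps from 7 to 8 and stops at 15, mirroring the
    'avoid white' and 'out of colors' lines in DeleteDuplicateParagraphs.
    """
    used = []
    coli = 1
    while coli != 15:
        used.append(coli)  # this index colours one group of duplicates
        coli += 1
        if coli == 7:
            coli = 8  # avoid white
    return used


print(qbcolor_indices(), len(qbcolor_indices()))
```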

Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Lupp
Volunteer
Posts: 3553
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by Lupp »

If a "daunting task" is reasonable and specified in a well-considered way, it may be seen as a challenge or even as a temptation. This one isn't yet.

What about paragraphs in a TextFrame? In cells of TextTables? Paragraphs containing shapes?
What about paragraphs differing only in additional whitespace?
What if they differ in letter case? Or a comma has been replaced with a semicolon?
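One common way to answer these equivalence questions is to compare normalised keys instead of raw strings. The Python sketch below shows three hypothetical levels of strictness (they are illustrations, not the StringIgnoreAdditionalWhiteSpace or StringRelax functions from the attached document). Note that string.punctuation only covers ASCII punctuation, and frames, table cells and shapes are not addressed at all.

```python
import string


def key_exact(s):
    """Strictest level: the paragraph text itself."""
    return s


def key_ignore_whitespace(s):
    """Collapse runs of whitespace to one space and trim the ends."""
    return " ".join(s.split())


def key_relaxed(s):
    """Also ignore letter case and ASCII punctuation (comma vs semicolon etc.)."""
    s = s.translate(str.maketrans("", "", string.punctuation))
    return " ".join(s.split()).casefold()


a, b = "Sed nibh, orci.", "sed  nibh; orci"
print(key_exact(a) == key_exact(b), key_relaxed(a) == key_relaxed(b))
```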

How should the colours marking the repetitions of one paragraph point back to that specific paragraph? Or why would that not matter? What about colours already present before running the solution?

How should the needed colours be chosen, especially if many are needed?

A solution for a rather complicated task that only works under lots of assumed (and not even explicitly stated) conditions is likely to be of little use and will probably be discarded after being applied three times. In this case it's slave work, and not at all a promising challenge.

In any case you will need to identify "equal" (equivalent) paragraphs. I would suggest you delegate this to a reusable function, and store the results to an appropriate array structure.

The equivalence itself should be decided by another dedicated function.
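That split can be sketched in Python as a reusable grouping function that takes the equivalence test as a parameter and returns an array of index groups; replicated paragraphs are then simply the groups with more than one index. group_equivalent is a hypothetical name, and the sketch assumes the predicate is a true equivalence (reflexive, symmetric, transitive), since each paragraph is only compared with the first member of each existing group.

```python
from typing import Callable, List


def group_equivalent(paras: List[str],
                     equivalent: Callable[[str, str], bool]) -> List[List[int]]:
    """Group paragraph indices by a pluggable equivalence predicate.

    Groups are returned in order of first appearance; each paragraph joins
    the first group whose first member it is equivalent to.
    """
    groups: List[List[int]] = []
    for i, text in enumerate(paras):
        for g in groups:
            if equivalent(paras[g[0]], text):
                g.append(i)
                break
        else:
            groups.append([i])  # no equivalent group yet: start a new one
    return groups


def same_ignoring_case(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


print(group_equivalent(["A", "b", "a", "B", "c"], same_ignoring_case))
```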

The overall task may then be designed depending on your detailed specifications to come.

You will find suggested code for the mentioned functions, and a simplified example of the overall task, in the attached document. Only the enhanced cases "StringIgnoreAdditionalWhiteSpace" and "StringRelax" require LibreOffice 6.2 or higher. Everything else will also run in AOO.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: OO Writer - Identifying Multi Replicated Paragraphs

Post by JeJe »

Lupp - I think the challenging one would be what I thought the OP meant: finding matching parts of the text that consist of any number of consecutive paragraphs. Not as hard as finding matching parts of the text that aren't whole paragraphs, of course...

Edit: I'm passing...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)