My problem is that I pasted several pages into the file and, God only knows how, I now have repeated information. I have been manually searching paragraph by paragraph with Ctrl-F, over and over. Is there a way for the file to check itself for repeated information, other than searching it one paragraph at a time?
Thanks in advance for your help.
Sincerely, Neal Ranzoni
Title Edited. A descriptive title for posts helps others who are searching for solutions and increases the chances of a reply (Hagar, Moderator).
If you're sure that the duplication is at the paragraph level (that is, whole paragraphs, or more, are duplicated, rather than just sentences or phrases), you might be able to do it with a Calc spreadsheet:
Copy/paste all the paragraphs into a Calc column
Copy the column and transpose it into one row
Fill in the entire grid with a formula that compares the paragraph in that row with the paragraph in that column.
Any matches that appear off the diagonal are your unwanted duplications.
This will be limited to the maximum number of columns that Calc can hold (1,024 in OpenOffice, as far as I know; recent LibreOffice versions allow 16,384).
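For anyone who would rather see the grid idea outside Calc, here is a minimal sketch in plain Python. It assumes the paragraphs are already in a list of strings (for example, the document saved as text and split into paragraphs); the function name off_diagonal_matches is made up for illustration. Because the grid is symmetric, only the cells above the diagonal are checked.

```python
def off_diagonal_matches(paragraphs):
    """Return (row, column) pairs where two different paragraphs are identical.

    Mirrors the spreadsheet grid: paragraph `row` is compared with every
    later paragraph `column`, so each matching pair is reported once.
    """
    matches = []
    for row, text in enumerate(paragraphs):
        if not text:
            continue  # empty paragraphs would all "match" each other
        for column in range(row + 1, len(paragraphs)):
            if text == paragraphs[column]:
                matches.append((row, column))
    return matches


sample = ["Intro", "Chapter one text", "Intro", "", "Chapter one text", ""]
print(off_diagonal_matches(sample))  # [(0, 2), (1, 4)]
```

Like the grid, this does a comparison for every pair of paragraphs, so it gets slow on very long documents, but it has no column limit.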
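Sub DeleteDuplicateParagraphs
    ' Walk every paragraph and empty any later paragraph with identical text.
    ' Duplicates are blanked, not removed, so empty paragraphs are left behind.
    oDoc = ThisComponent
    enum = oDoc.Text.createEnumeration
    c = 0
    While enum.hasMoreElements
        thisPara = enum.nextElement
        c = c + 1 ' 1-based position of thisPara in the enumeration
        ' Tables also show up in the enumeration and have no getString.
        If thisPara.supportsService("com.sun.star.text.Paragraph") Then
            s = thisPara.getString
            If Len(s) > 0 Then
                Check(s, c, oDoc)
            EndIf
        EndIf
    Wend
End Sub

Sub Check(s, c, oDoc)
    ' Empty every paragraph after position c whose text equals s.
    enum1 = oDoc.Text.createEnumeration
    cc = 0
    ' Skip the first c elements, up to and including the paragraph being checked.
    ' (Using c >= cc here would also skip the next paragraph and miss adjacent duplicates.)
    While enum1.hasMoreElements And c > cc
        enum1.nextElement
        cc = cc + 1
    Wend
    While enum1.hasMoreElements
        nextPara = enum1.nextElement
        If nextPara.supportsService("com.sun.star.text.Paragraph") Then
            If nextPara.getString = s Then
                nextPara.setString("")
            EndIf
        EndIf
    Wend
End Sub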
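context = XSCRIPTCONTEXT  # provided by the office scripting framework

def iterable(enumerable):
    """Let a UNO enumeration be used in a for loop."""
    enum = enumerable.createEnumeration()
    while enum.hasMoreElements():
        yield enum.nextElement()

def remove_duplicate_paragraphs():
    """Empty every paragraph whose text already appeared earlier."""
    doc = context.getDocument()
    text = doc.getText()
    seen = set()
    for paragraph in iterable(text):
        # Tables are enumerated too; only real paragraphs have getString().
        if not paragraph.supportsService("com.sun.star.text.Paragraph"):
            continue
        content = paragraph.getString()
        if content in seen:
            paragraph.setString("")
        else:
            seen.add(content)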
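The macro above only runs inside the office suite and changes the document straight away. As a dry run, the same "have I seen this text before" logic can be tried on plain strings first. This is only a sketch: the function name duplicate_indexes is made up, and collapsing whitespace before comparing is my own assumption (it catches copies that differ only in spacing), not something the macro does.

```python
def duplicate_indexes(paragraphs):
    """Return the indexes of paragraphs whose text appeared earlier in the list.

    Whitespace is collapsed before comparing (an assumption, see above),
    and empty paragraphs are never reported.
    """
    seen = set()
    duplicates = []
    for index, text in enumerate(paragraphs):
        key = " ".join(text.split())  # collapse runs of spaces, tabs, newlines
        if not key:
            continue
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


book = ["It was a dark night.", "", "The rain fell.", "It was  a dark night. ", ""]
print(duplicate_indexes(book))  # [3]
```

Nothing is deleted here; the list of indexes just shows which paragraphs the macro would empty (give or take the whitespace handling).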
Karo
Libreoffice 25.2… on Debian 13 (trixie) (on RaspberryPI5) Libreoffice 25.8… flatpak on Debian 13 (Bookworm) (on RaspberryPI5)
Here is the problem. I was writing a damn book and pasted all the parts in, but somehow I pasted anywhere from one to three extra copies of some of the parts. So I have been going paragraph by paragraph, searching for and deleting the extra copies. The issue is that it is over 200 pages and it got really old. I am close to either scrapping the damn book or starting it over.
Is the book broken into chapters? If not, insert chapter breaks at suitable intervals, so that you can deal only with a smallish section at a time (you can always remove the chapters later if they don't suit your text flow). The old-fashioned method might be best: print it out and go through the printout with a highlighter. Mark the repeats, then sit down at your computer and systematically delete them, handling one "chapter" at a time. I can think of no surer way, nor of any easier way.
Edit: For what it is worth, I give here a link to a method of finding such paragraphs in MS Word http://www.techandlife.com/2012/06/find ... soft-word/
I haven't tried this, and don't know if or how it works. It will almost certainly need modification for OpenOffice, which, as the textbook writers say, I leave as an exercise for those interested!
I would be the interested one, lol. I will give this a quick shot and then move on to the guy's post with the code in it. I am dreading starting from scratch, but the book is published and OMG if someone buys it before I get it fixed.
I will try both, and if I cannot fix this disaster I created I will just start over on Sunday, when I am finally back at my desk full time.
Thanks to everyone who is working hard not to LOL@me and is shooting me ideas.
"karolus" & "JohnV": Guys, this may be beyond my comprehension until I have time to sit down (Sunday), but I see a website/program waiting to be made based on this. I am sure this would be a marketable tool.
I am strapped for sleep and computer time this week, but I will be figuring out the code from both posts this weekend. If I can get anything working by this weekend (well, early next week), I will be back to post the good or bad news.
The Word article RoryOF linked to won't work in Writer: it depends on being able to search across paragraph breaks, which Writer still can't do.
Did you try the method I outlined? It worked in a few minutes with the document I tested with; the main restriction is that you have to have duplicated paragraphs, and not smaller bits.
acknak wrote: The Word article RoryOF linked to won't work in Writer: it depends on being able to search across paragraph breaks, which Writer still can't do.
AltSearch will search across paragraphs, as far as I remember, so a version of the Word code ought to be possible from some of our programming geniuses; I don't expect it would be blindingly fast, and I can already think of scenarios which might cause it to fail, such as highly repetitive lists. I gave the URL merely as an indication of _a_ method of tackling such a problem.
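On the repetitive-lists worry: one cautious approach is to only report duplicates and to ignore short paragraphs, so list items that are meant to repeat are left alone. The sketch below is plain Python on a list of strings, not Writer or AltSearch code; the name report_duplicates and the min_length cut-off of 40 characters are made-up assumptions to tune for the document at hand.

```python
from collections import defaultdict


def report_duplicates(paragraphs, min_length=40):
    """Map each repeated paragraph to the 1-based positions where it occurs.

    Paragraphs shorter than `min_length` characters (a hypothetical
    cut-off) are ignored, so short, legitimately repeated list items
    are not flagged. Nothing is modified.
    """
    positions = defaultdict(list)
    for number, text in enumerate(paragraphs, start=1):
        stripped = text.strip()
        if len(stripped) >= min_length:
            positions[stripped].append(number)
    return {text: where for text, where in positions.items() if len(where) > 1}


long_para = "This is a long paragraph that was pasted into the book twice."
doc = ["- milk", long_para, "- milk", "Something else entirely.", long_para]
for text, where in report_duplicates(doc).items():
    print(where, text[:30])
```

With the positions in hand, the repeats can be checked by eye before anything is deleted.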