Convert ASCII text files by deleting extra paragraph breaks

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Post Reply
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Convert ASCII text files by deleting extra paragraph breaks

Post by JohnV »

EDITED 7-4-12 simply to note that the macro appears, from a small test, to continue to work with ApacheOpenOffice's new implementation of regular expressions.

This is the same code stored at OOoForum.org. I worked too hard on it to risk it's potential loss. I think the maintainer of that forum will always act responsibly but he may die.

This macro removes the excess paragraph breaks from an ASCII editor type file and also works on text coped & pasted from the Web that has line breaks inserted like a message in these forums.

It also provides provides other options to indent each paragraph, reduce the spacing between paragraphs, change spaced indent to tabs, remove excess interior spacing, strip all indents and justification of he results. One routine in this macro is designed to reformat text that has been scanned and then OCRed directly into OOo. If your OCR program creates a left margin by inserting spaces before each line then you might find this handy.

Trying to reformat an ASCII file is an exercise in guesswork at best and you shouldn't expect perfect results. This macro has to make assumptions about what is truly the end of a paragraph, a title or part of a list.

Code: Select all

'Version 2.2   5-6-06  John Vigor
'Converts ASCII text files, or selected text within them, by stripping out excess
'paragraph breaks. Works with items copied & pasted from the Web that contain line
'breaks such as a message in these forums. 
'WARNING - Anything stored on the Clipboard will be overwritten. A copy of your
'original file will not be saved to the Clipboard if a file greater then 60K 
'characters or selected text of any size and you are responsible for otherwise
'backing it up. On the other hand, your original file will not be changed unless you
'save the macro results and overwrite the original but this assumes a saved file.
'Sample processing times on a 770 MHz machine in Pages/Seconds format: 
'  10/4, 20/7, 40/16, 80/34, 160/81 (1.35 Min.).
'Hint for long documents - TURN OFF AUTO-SPELLCHECKING. This will save time.
'You can control what items the program asks you about and what happens if you choose
'not to be asked about an item by editing the variables below. These variables are
'ignored if you run the macro on selected text so if you customize them and need to 
'be asked about items for a particular file then simply select the entire file with
'Ctrl+A before running the macro. 
'You should become familiar with how the program works before you attempt to customize
'it. It works differently if the file is less or more than 60K and when processing a
'selection.
'
Sub ASCII_Formatter_StartHere
'VARIABLES YOU CAN CHANGE.
AskShortParagraphs = True 'Show the query about keeping short paragraphs.
 'Default answer if AskShortParagraphs is False.
 KeepShortParagraphs = True 'HIGHLY recommended and the faster of the two methods. I
 'only maintain the other method because it's needed for one option.
'-- 
AskShortParaLength = True 'Show a chance to adjust the program's estimate of what should
'be considered a short paragraph that will kept.
ShortDef = 20 'The minimum number of characters short of the right margin that a line
'must end to be a short paragraph. The paragraph break at the end of any short paragraph
'will be maintained. 
'--
AskViewOptions = True 'Show the query about end of program Options.
 'Default, if AskViewOptions is False.
 GoToOptions = True 'If True the Option variables below will take control. They will
 'also control if AskViewOptions is True and you choose to go to the Options. 
'-- 
'The following values control what Options are displayed following initial file processing:
Show1stOptionSet = True 'Show the first set of Options, which are Indent All Paragraphs
'and Reduce Paragraph Spacing.
 'Defaults, if Show1stOptionSet is False.
 O1_1 = False 'Indent all paragraphs.
 O1_2 = False 'Reduce paragraph spacing.
  AskMaxParaSpacing = True 'Ask for the maximum number of blank "lines" between paragraphs.
   O1_2_1 = 1 'Default, if AskMaxParaSpacing is False.
Show2ndOptionSet = True 'Options are Remove Excess Interior Spaces, Change Spaced
'Indents to Tabs and Justify All Paragraphs.
 'Defaults, if Show2ndOptionset is False.
 O2_1 = False 'Remove excess interior spaces.
 O2_2 = False 'Change spaced indents to tabs.
 O2_3 = False 'Justify text.
'-- 
'The following control aspects of full file, selection and/or Option processing and 
'are not ignored if you process a selection:
ShowBackUpWarning = True 'Show warning that file not copied to Clipboard if over 60K. 
ShowFinished = True 'Show the finished message.
PageBite = 10 'Number of pages processed at a time for selected text or a file over 60K 
'characters. Seems pretty good but I haven't seriously tried to optimize it.
ViewBegin = True 'The cursor and your view are taken to the beginning of the document
'when macro ends. "False" will leave you at the end.
StripHyphens = True 'Assumes hyphens located at the right margin are editorial and do
'not seperate true hyphenated words like "half-dollar".   
CheckForcedLeftMargin = True 'A VERY quick routine if spaces imitating a left margin
'don't exist. Or it will delete such a margin if they do (known to work for my version of
'ScanSoft's OCR software - the 1st line of the file will contain only a series of spaces).
StripIndents = "0" 'This controls what happens during initial file processing. The default
'value is recommended. The default ("0") is to leave all tabbed or spaced paragraph indents,
'"1" will strip spaced indents, "2" will strip tabbed ones and "3" will strip both kinds.
'If you run the macro on selected text and don't choose to go to the Options then you
'will also be asked about this. One use for this is to deal with paragraphs offset, as
'opposed to just 1st line indented, from the left margin. Normal processing will leave
'all "lines" of such paragraphs indented and followed by paragraph breaks. Stripping
'out the indents will convert these to regular paragraphs which can be edited normally
'in Writer. Running the macro on selected text after normal processing is a personal
'favorite of mine because I often scan documents with offsets which I need to edit.
MarkIt = True 'Insert the "MarkWith" character as the last character of the file to
'indicate it has been previouly processed. Allows you to run the macro again and go
'directly to the other Options without wading through the full initial file processing
'again. I do not recommend changing as it controls how the program works on a 2nd run. 
MarkWith = Chr(160) 'A nonbreaking space in most fonts, which isn't seen on printing.
Override60K = False 'Setting this to True may significantly slow runtime on large files
'or selections but no seperate processing document will be used if you don't like this.
'.....................................................................................
'DO NOT EDIT BELOW HERE UNLESS YOU KNOW WHAT YOU ARE DOING.
lTime = Timer : RunTime = 0 : Skip = false : Over60K = false : ProcSel = false 
PrevProc = false : IsSelect = false : ASPL = AskShortParaLength : LastSection = False
thisDoc = thisComponent : oDoc = thisDoc 'oDoc may change.
thisVC = thisDoc.CurrentController.getViewCursor : oVC = thisVC 'oVC may change.
thisText = thisDoc.Text  
MarkSel = thisDoc.Text.createTextCursorByRange(thisVC)'Mark any selection.
thisFrame = thisDoc.CurrentController.getFrame()
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
FandR = thisDoc.createReplaceDescriptor() 'Find & Replace initial set up.
FandR.searchRegularExpression = true      'Use regular expressions.
If NOT thisVC.isCollapsed then IsSelect = true
If IsSelect then
  a$ = "Process selected text only?"& Chr(13) & Chr(13) &"Cancel to Quit." 
  RunTime = RunTime + (Timer - lTime)
  iAns = MsgBox(a$,3,"Text in this file has been selected.")
  lTime = Timer : If iAns = 2 then End
  If iAns = 6 then ProcSel = true
EndIf 
If ProcSel then 'Tests for over 60K.
 If Len(MarkSel.String) > 60000 OR Len(MarkSel.String) = 0 then Over60K = true 
 ELSEIf thisDoc.characterCount > 60000 then Over60K = true 
EndIF
If Override60K then Over60K = false
If NOT Over60K then LastSection = true
If Over60K AND NOT ProcSel then MarkSel.gotoStart(false) : MarkSel.gotoEnd(true)
thisTC = thisDoc.Text.createTextCursor : thisTC.gotoEnd(false)
thisTC.goLeft(1,true) 'Get the last character. If a binding space
 'then file was previously processed. Process or go to Options?
IF thisTC.String = MarkWith then 'Was file previouly processed?
 PrevProc = True 'Previously processed.
 Skip = SkipToOptions(Show1stOptionSet,Show2ndOptionSet)
EndIf
'++++++++++++++++++++++++++ All basic information gathered, now control program flow.
Select Case Skip 
 Case True  'Go directly to the options.
  Select Case ProcSel 'Do we need to deal with selected text? 
   Case False : BackUp(thisFrame,dispatcher)
         RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
   Case True : NoBackUpWarning(ShowBackUpWarning)
         SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
         RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
         FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC) 
  End Select
 Case False 'Going to do the regular file processing.
  Select Case ProcSel 'Do we need to deal selected text?
   Case True : NoBackUpWarning(ShowBackUpWarning)
         AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
         AskIndents() : If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
         SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip) 
         If Over60K then
           ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
          Else RunMainRoutines(FandR) 
           FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC) 
         EndIf
         oDoc = thisDoc
         AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
   Case False 'Normally this would be the 1st time entire file is processed.  
              'Won't ask about indents because most users won't care.
         IF Over60K then NoBackUpWarning(ShowBackUpWarning)     
         If NOT Over60K then BackUp(thisFrame,dispatcher)
         AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
         If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
         If Over60K then
           SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip) 
           ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
           oDoc = thisDoc 
          Else
           RunMainRoutines(FandR) 
         EndIf
         AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
  End Select                
End Select
'++++++++++++++++++++++++++++++
thisVC.gotoEnd(False) 
If NOT PrevProc then
 thisText.insertString(thisVC,MarkWith,False)
EndIf
EndMessage(ShowFinished,MarkIt)
If ViewBegin then thisVC.gotoStart(false)
End Sub 

Private lTime, KeepShortParagraphs, oVC, StripHyphens, LastSection
Private oDoc, RunTime, Over60K, thisFrame, thatFrame, thisVC, PrevProc
Private StripIndents, jstify, dispatcher, ShortPara, ProcSel
Private O1_1, O1_2, O1_2_1 ,O2_1, O2_2, O2_3, AskMaxParaSpacing, GoToOptions

Function SkipToOptions(Show1stOptionSet,Show2ndOptionSet)
a$ = "Do you want to skip directly to the end of program Options?"
b$ = Chr(13) & "Cancel to quit." : RunTime = RunTime + (Timer - lTime)
iAns = MsgBox(a$ & b$,3,"This file has been previously processed.")
lTime = Timer : If iAns = 2 then End
If iAns = 7 then 
  SkipToOptions() = false
 Else Show1stOptionSet = true : Show2ndOptionSet = true : SkipToOptions() = true
EndIf
End Function

Sub CopyOrCutIt(whichFrame,doWhich,dispatcher)
 dispatcher.executeDispatch(whichFrame,".uno:" & doWhich,"",0,Array())
End Sub

Sub PasteIt(whichFrame,dispatcher)
 dispatcher.executeDispatch(whichFrame,".uno:Paste","",0,Array())
End Sub

Sub BackUp(thisFrame,dispatcher) 'Back up file to clipboard.
 dispatcher.executeDispatch(thisFrame,".uno:SelectAll","",0,Array())
 CopyOrCutIt(thisFrame,"Copy",dispatcher)
End Sub

Sub NoBackupWarning(ShowBackUpWarning)
 If NOT ShowBackUpWarning then Exit Sub
 a$ = "Because you are processing selected text or the file contains more than 60K "
 a$ = a$ & "characters no backup copy of your file will be stored to the clipboard!"
 a$ = a$ & " Optional > Auto-spellcheck should be off." & Chr(13)
 b$= "DO YOU HAVE THIS FILE BACKED UP?  ('Yes' will continue - 'No' will abort.)"
 c$ = String(15," ") & "NO BACKUP!" : RunTime = RunTime + (Timer - lTime)
 iAns = MsgBox (a$ & b$,308,c$ & c$ & c$) : lTime = Timer
 If iAns = 7 then End 
End Sub

Sub SetUpSelection(thisFrame,thatFrame,doWhat,dispatcher,MarkSel,IsSelect,Skip)
 oDoc = StarDesktop.loadComponentFromURL("private:factory/swriter","_blank",0,Array())
 thatFrame = oDoc.CurrentController.Frame 'Get new doc frame.
 oVC = oDoc.CurrentController.getViewCursor 'Set oVC to new doc.
 If NOT Over60K OR Skip then
  thisVC.gotoRange(MarkSel.Start,false) : thisVC.gotoRange(MarkSel.End,true)
  CopyOrCutIt(thisFrame,"Copy", dispatcher) 'Then fill it 
  PasteIt(thatFrame,dispatcher)             'with the selection.
  BackUp(thisFrame,dispatcher) 'If not big selection or are doing Options 
 EndIf                         'then backup file to clipboard. 
End Sub

Sub FinishSelection(thatFrame,MarkSel,FandR,dispatcher,thisVC)  'Hagar: changed following error message in LO, original = (thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
 dispatcher.executeDispatch(thatFrame,".uno:SelectAll","",0,Array())
 CopyOrCutIt(thatFrame,"Cut",dispatcher)
 thisVC.gotoRange(MarkSel.Start,false)
 thisVC.gotoRange(MarkSel.End,true) 
 PasteIt(thisFrame,dispatcher)
 oDoc.dispose : If jstify then thisVC.setPropertyValue("ParaAdjust",2)
End Sub

Sub AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
 thisTC.gotoStart(false)
 While thisTC.isEndOfParagraph() 'Delete empty paragraphs at top of doc.
  thisTC.goRight(1,true) : thisTC.String = ""
 Wend
 If NOT AskShortParagraphs AND NOT KeepShortParagraphs AND NOT ProcSel then Exit Sub
 If NOT AskShortParagraphs AND NOT ProcSel then iAns = 6 : goto Continue
 a$ = "Generally highly recommended. This saves the formatting of lists,"
 b$ = " tables of contents," & Chr(13) & "indexes, etc. that are not indented."
 c$ = " This is also the fastest method."
 d$ = "KEEP SHORT PARAGRAPHS?"
 RunTime = RunTime + (Timer - lTime)
 iAns = Msgbox (a$ & b$ & c$,3,d$) : lTime = Timer
 If iAns = 2 then End
 If iAns = 7 then KeepShortParagraphs = false
 CONTINUE:
 If iAns = 6 then
  KeepShortParaGraphs = true : thisVC.gotoStart(false) : thisTC.gotoStart(false)
  If NOT PrevProc then
    Do 
     If NOT thisTC.isEndOfParagraph then
      thisTC.gotoEndOfParagraph(true) : cnt = cnt +1
      If Len(thisTC.String) > ParaLen then ParaLen = Len(thisTC.String)
     EndIf
    Loop While thisTC.gotoNextParagraph(false) and cnt < 50  
   Else Do 'Get max length of 1st 50 paragraphs which is probably the
         If NOT thisVC.isAtEndOfLine then 'original right margin.
          thisVC.gotoEndOfLine(true) : cnt = cnt + 1
          If (Len(thisVC.String) > ParaLen) then ParaLen = Len(thisVC.String)
         EndIF 
         Loop While thisVC.goRight(1,false) and cnt < 50
  EndIF
  ShortPara = ParaLen - ShortDef 'An arbitary guess of paragraph lengths for 
  If ASPL OR ProcSel then            'titles, lists, etc.
  a$ = "You can adjust the short paragraph length, currently set at " & ShortPara 
  b$ = ". This is " & ShortDef & " characters short of the estimated right margin of "
  c$ = "the original document based on a sampling of up to 50 lines. 'Cancel' will "
  c$ = c$ & "end the program."
  RunTime = RunTime + (Timer - lTime)
  ShortPara = InputBox(a$ & b$ & c$,"ADJUST SHORT PARAGRAPH LENGTH?",ShortPara)
  lTime = Timer : If ShortPara = "" then End
  ShortPara = Cint(ShortPara): If ShortPara < 1 then ShortPara = 1
  EndIf
 EndIf
End Sub

Sub AskIndents()
 a$ = "0 - Remove nothing." & Chr(13) &"1 - Remove spaced indents." & Chr(13)
 a$ = a$ & "2 - Remove tabbed indents.    3 - Both, or click Cancel to quit."
 Query: RunTime = RunTime + (Timer - lTime) 
 sAns = InputBox(a$,"Fix offset paragraphs? (0, 1, 2, 3 or Cancel to quit.)","0")
 lTime = Timer : If sAns = "" then End
 If instr("0123",sAns) = 0 then goto Query
 StripIndents = sAns 
 If Cint(StripIndents) > 0 AND KeepShortParagraphs then AskOffset()
End Sub

Sub AskOffset()
a$ = "If you want to convert an offset paragraph(s) to a normal one then the program "
a$ = a$ & "will not keep short paragraphs." & Chr(13) & "Is this what you want to do?" 
Runtime = Runtime + (Timer - lTime)
iAns = MsgBox(a$,4,"Convert offset paragraphs?") : lTime = Timer
If iAns = 6 then KeepShortParagraphs = false
End Sub

Sub ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
 'View cursor to entire selection range.
 thisVC.gotoRange(MarkSel.Start,false) : thisVC.gotoRange(MarkSel.End,True) 
 LastPage = thisVC.getPage 'Get last page of selection.
 oSelTC = thisDoc.Text.CreateTextCursorByRange(thisVC.Start)
 thisVC.collapseToStart : GetSel = thisVC.getPage : PageBite = PageBite - 1
 Do
  GetSel = GetSel + PageBite 'Get PageBite pages at a time
  If GetSel < LastPage then
    thisVC.jumpToPage(GetSel)'Only the view cursor can jump.
    oSelTC.gotoRange(thisVC,true)
    If Not oSelTC.isEndOfParagraph then
     Do 'Make sure the selection ends at a blank paragraph 
      oSelTC.gotoNextParagraph(true)
     Loop Until oSelTC.isEndOfParagraph
    EndIF
   Else LastSection = true : thisVC.gotoRange(MarkSel.End,false)
    oSelTC.gotoRange(thisVC,true)
  EndIf
  thisVC.gotoRange(oSelTC,true)'Only the view cursor can capture text to copy it.
  CopyOrCutIt(thisFrame,"Copy",dispatcher)
  PasteIt(thatFrame,dispatcher)
  RunMainRoutines(FandR)
  oVC.gotoStart(false): oVC.gotoEnd(true)'Get entire selection.
  CopyOrCutIt(thatFrame,"Cut",dispatcher) : PasteIt(thisFrame,dispatcher)
  oSelTC.collapseToEnd   
 Loop While NOT LastSection
 oDoc.dispose(true) : LastSection = false
End Sub

Sub RunMainRoutines(FandR)
CleanUp(FandR)
StripParagraphBreaks(FandR,StripIndents) 
End Sub

Sub ForcedLeftMargin(thisTC,FandR)'Delete a forced left margin appearing
thisTC.gotoStart(false) : thisTC.gotoEndOfParagraph(true)'in some OCRed text files.
Margin$ = String(Len(thisTC.String)," ")
If Len(Margin$) > 0 then
 FandR.setSearchString("^" & Margin$)
 Find = oDoc.findFirst(FandR)
 Do While Not IsNull(Find)'Can't use "replaceAll" because
  Find.String = ""'multiple a$s in one line will be deleted. 
  If Find.gotoNextParagraph(false) then
    Find = oDoc.findNext(Find.End,FandR)
   Else Exit Do
  EndIf
 Loop
EndIf
End Sub

Sub CleanUp(FandR)
FandR.setSearchString("\n")  'Just in case line breaks got into the doc change
FandR.setReplaceString("\n") 'them to paragraph breaks. This will also help with
oDoc.replaceAll(FandR)       'some text copied & pasted from the web.
FandR.setSearchString(" *$") 'Delete all spaces before paragraph breaks. A must!
FandR.setReplaceString("")
oDoc.replaceAll(FandR)
End Sub

Sub StripParagraphBreaks(FandR,StripIndents)
oTC = oDoc.Text.createTextCursor
oText = oDoc.getText() : s$ = Chr(32) & Chr(9)
Const KM = "�"
If KeepShortParagraphs then
  If NOT PrevProc then
    FandR.setSearchString(".{" & ShortPara+1 & "}$") 'Find long paragraphs 
   Else 
    Do
     oTC.gotoEndOfParagraph(false) : oVC.gotoRange(oTC,false)
     oVC.gotoStartOfLine(true) 
     If Len(oVC.String) > ShortPara then oVC.String = oVC.String & KM
    Loop While oTC.gotoNextParagraph(false) 
  EndIf 
 Else FandR.setSearchString(".*$") 'or find all non-blank ones
EndIf                              'and mark them (at rear).
FandR.setReplaceString("&" & KM)
oDoc.replaceAll(FandR)
FandR.setSearchString("^ *") 'How to handle spaced indents.   
If StripIndents = "0" or StripIndents = "2" then
  FandR.setReplaceString(KM & "&") 'Mark them (at front).
 Else FandR.setReplaceString("") 'Strip them.
EndIf
oDoc.replaceAll(FandR)        
FandR.setSearchString("^\t*") 'How to handle tabbed indents.
If StripIndents = "0" or StripIndents = "1" then
  FandR.setReplaceString(KM & "&") 'Mark them.
 Else FandR.setReplaceString("") 'Strip them.
EndIf
oDoc.replaceAll(FandR)           
If LastSection then 'Make sure correct "ending" paragraph is marked,
 oTC.gotoEnd(false) 'otherwise para breaks may be added or stripped.
 If NOT oTC.isStartOfParagraph then 'If last para has text then
   oTC.goLeft(1,true)               'make sure it's not marked.
   If oTC.String = KM then oTC.String = ""
 EndIf
EndIF
oTC.gotoStart(false)
Do 'Find marked paragraph breaks and replace with a space.
 Do While NOT oTC.isEndOfParagraph
  oTC.gotoEndOfParagraph(true)
  If Right(oTC.String,1) = KM then 
   oTC.collapseToEnd : oTC.goRight(1,true)
   oTC.String = " " : oTC.CollapseToEnd
  EndIf
 Loop
Loop While oTC.gotoNextParagraph(false)
FandR.setSearchString(" " & KM & " *") 'Find space followed by marker and any 
FandR.setReplaceString("\n&") 'trailing spaces which may now be in mid paragraph.
oDoc.replaceAll(FandR)        'Insert break and keep found stuff.  
FandR.setSearchString("^ *" & KM) 'Find 0 or more spaces followed by marker at
FandR.setReplaceString("")        'beginning of paragraph and delete found stuff.
oDoc.replaceAll(FandR)
FandR.setSearchString(" $") 'Replace paragraph ending spaces with a break.   
FandR.setReplaceString("\n")
oDoc.replaceAll(FandR)
oDoc.replaceAll(FandR) 
FandR.setSearchString(KM) 'Delete remaining markers.
FandR.setReplaceString("")
oDoc.replaceAll(FandR)
FandR.setSearchString("- ") 'Hyphens found at the right margin are now
If StripHyphens then 'hyphens followed by a space. What should be done?
  FandR.setReplaceString("") 'Delete all and join the word "parts". Default
 Else FandR.setReplaceString("-") 'Delete the space but keep the hyphen.
EndIf
oDoc.replaceAll(FandR) 
End Sub

Sub AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
 If NOT AskViewOptions AND NOT ProcSel then Goto Continue 
 a$ = Chr(13) & "Would you like to see the additional options?" & Chr(13)
 Runtime = Runtime + (Timer - lTime)
 iAns = MsgBox (a$,4,"MAIN FILE PROCESSING FINISHED") : lTime = Timer
 If iAns = 6 then
   GoToOptions = true
  Else GotoOptions = false  
 EndIf
 CONTINUE:
 If GoToOptions then
  EndIt = FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
  If EndIt then Exit Sub 
  SecondOptions(Show2ndOptionSet,FandR)
 EndIf 
End Sub

Sub RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR) 
 EndIt = FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
 If EndIt then Exit Sub
 SecondOptions(Show2ndOptionSet,FandR)
End Sub 
 
Function FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
If NOT Show1stOptionSet AND NOT O1_1 AND NOT O1_2 AND NOT ProcSel then Exit Function 
If NOT Show1stOptionSet AND NOT ProcSel then goto Silent1
a$ = "1. Indent all paragraphs." : b$ =  Chr(13) & "2. Reduce paragraph spacing."
c$ = Chr(13) & "3. Continue to other options.     'Cancel' will end the program."
d$ = "FIRST OPTION SET - Chose by number (1, 2 or 3) or 'Cancel' to end program."
ASK: RunTime = RunTime + (Timer - lTime)
sAns = InputBox (a$ & b$ & c$,d$,"3") : lTime = Timer
Select Case sAns
Case "1" : Indent(FandR) : a$ = a$ & " COMPLETED!"
Case "2" : ReduceParaSpacing() : b$ = b$ & " COMPLETED!"
Case "3" : Exit Function
Case ""  : FirstOptions() = true : Exit Function
Case Else : Goto ASK
End Select
Goto ASK
SILENT1:
If O1_1 then Indent(FandR)
If O1_2 then ReduceParaSpacing()
End Function
   
Sub Indent(FandR)
FandR.setSearchString("^[!-" & Chr(255) & "]"'Find paragraphs not starting with space
FandR.setReplaceString("\t&")             ' or tab. Insert tab & keep what was found.
oDoc.replaceAll(FandR)
End Sub

Sub ReduceParaSpacing()
If NOT AskMaxParaSpacing AND NOT ProcSel then ParaSpacing = O1_2_1 : goto Continue
a$ = "Enter the maximum number of blank lines you want between paragraphs."
b$ = Chr(13) & "Zero is a valid entry.              'Cancel' will quit this routine."
ASK: RunTime = RunTime + (Timer - lTime)
ParaSpacing = InputBox (a$ & b$,"ALLOW HOW MANY?",1) : lTime = Timer
If ParaSpacing = "" then Exit Sub
If NOT IsNumeric(ParaSpacing) then goto Ask 
ParaSpacing = Cint(ParaSpacing) 
CONTINUE: oTC = oDoc.getText().createTextCursor()
oTC.goToStart(false) : bOK = true 'bOK will test for the end of the file.
While oTC.isEndOfParagraph = true and bOK = true 'Skip spacing at top.
bOK = oTC.goRight(1,false)'This will move to the next paragraph from a blank paragraph.             
Wend
bOK = true
Do 'Starting at the beginning of a paragraph.
 cnt = 0
 While oTC.isEndOfParagraph() and bOK       'Is it also the end of a paragraph, i.e,
  cnt = cnt + 1 'Count blank paragraphs.    'a blank paragraph?
   If (cnt > ParaSpacing) then
     oTC.goLeft(1,true) : oTC.String = ""  'Select it and delete it.
     bOK = oTC.goRight(1,false) 'Go right to next paragraph.
    Else
     bOK = oTC.goRight(1,false) 'Move to next paragraph.
   EndIf
 Wend
Loop While oTC.goToNextParagraph(false) 'Another end of file check.
End Sub

Sub SecondOptions(Show2ndOptionSet,FandR)
If NOT Show2ndOptionSet AND NOT O2_1 AND NOT O2_2 AND NOT O2_2 AND NOT ProcSel then Exit Sub
If NOT Show2ndOptionSet AND NOT ProcSel then goto Silent2
a$ = "1. Replace spaced indents with tabs."
b$ = Chr(13) & "2. Remove excess interior spaces."
c$ = Chr(13) & "3. Justify text." : d$ = "       4. Run all."
e$ = "SECOND OPTION SET - Chose by number (1, 2, 3 or 4) or 'Cancel' to end program."
ASK: RunTime = RunTime + (Timer - lTime)
sAns = InputBox(a$ & b$ & c$ & d$,e$," ") : lTime = Timer
Select Case sAns
Case  "" : Exit Sub
Case "1" : ReplaceSpacesB4LinesWithTab(FandR) : a$ = a$ & " COMPLETED!"
Case "2" : DeleteExcessInteriorSpaces(FandR) : b$ = b$ & " COMPLETED!"
Case "3" : Justify() : c$ = c$ & " COMPLETED!" : jstify = true
Case "4" : ReplaceSpacesB4LinesWithTab(FandR)  : jstify = true
           DeleteExcessInteriorSpaces(FandR)
           Justify() : RunTime = Runtime + (Timer - lTime)
           MsgBox "All routines finished." : lTime = Timer : Exit Sub
Case Else : Goto ASK
End Select
GoTo ASK
SILENT2:
If O2_1 then ReplaceSpacesB4LinesWithTab(FandR)
If O2_2 then DeleteExcessInteriorSpaces(FandR)
If O2_3 then Justify()
End Sub

Sub ReplaceSpacesB4LinesWithTab(FandR as Object)
MaxIndent = 10 'Any indent in excess of this will be ignored.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of line
Find = oDoc.findFirst(FandR) 'replace with tab, to replace with nothing use ""
While NOT isNull(Find)
 If Len(Find.String) <= MaxIndent then Find.String = Chr(9)
 Find = oDoc.findNext(Find.End,FandR)
Wend   
End Sub

Sub DeleteExcessInteriorSpaces(FandR as Object)
SM = Chr(165) 'Space marker.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of paragraph
Find = oDoc.findFirst(FandR)
While NOT isNull(Find)
Find.String = String(Len(Find.String),SM) 'replace with placeholders
Find = oDoc.findNext(Find.End,FandR)
Wend
FandR.setSearchString(" *") 'find any number of spaces
FandR.setReplaceString(" ") 'replace with one space
oDoc.ReplaceAll(FandR) 'do it
FandR.setSearchString("^" & SM & "*") 'find any number of placeholders at beginning of line
Find = oDoc.findFirst(FandR) 'turn them back into spaces
While NOT isNull(Find)
Find.String = String(Len(Find.String)," ")
Find = oDoc.findNext(Find.End,FandR)
Wend
End Sub

Sub Justify()
oTC = oDoc.Text.CreateTextCursor()
oTC.gotoEnd(true)
oTC.setPropertyValue("ParaAdjust",2)
End Sub

Sub EndMessage(ShowFinished,MarkIt)
If NOT ShowFinished then Exit Sub
a$ = "Your original document was saved to the Clipboard and can be retrieved from "
b$ = "there if you do not like the macro results. " 
RunTime = RunTime + (Timer - lTime)
c$ = "Total processing time was "& RunTime &" Second(s) or "& RunTime/60 &" Minute(s)."
d$ = "If you want to use the normal end of program options run this macro "
d$ = d$ & "again on the file or a selection. " 
If MarkIt then
 e$ = "A previously processed marker has been inserted as the last file character. "
EndIf
If NOT Over60K then
  MsgBox (a$ & b$ & e$ & Chr(13) & c$,0,"FINISHED!")
 Else MsgBox (d$ & e$ & Chr(13) & Chr(13) & c$,0,"FINISHED!")
EndIf 
End Sub
'Version 2.2  5-6-06
Last edited by JohnV on Wed Jul 04, 2012 6:19 pm, edited 1 time in total.
davex44
Posts: 4
Joined: Wed Mar 23, 2011 4:02 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Post by davex44 »

Thanks for posting the code.

Am new here and new to the macros. How do I load your code and run it.

Thanks,

David
Open Office 3.10
Windows 7 Version 6.1
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Convert ASCII text files by deleting extra paragraph bre

Post by JohnV »

davex44
Posts: 4
Joined: Wed Mar 23, 2011 4:02 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Post by davex44 »

Hey John, Thanks for the quick reply. Best, David
Open Office 3.10
Windows 7 Version 6.1
User avatar
Digger
Posts: 45
Joined: Sat Jan 14, 2012 11:29 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Post by Digger »

John,

I've got a *-load of ASCII files to convert to OO Writer format....your snippet will be a lifesaver!

Thanks! :bravo:
LibreOffice 5.2.7.2
Debian 9 (Stretch) (Linux)
crgreen
Posts: 1
Joined: Fri Jan 02, 2015 7:17 am

Re: Convert ASCII text files by deleting extra paragraph bre

Post by crgreen »

I briefly posted a reply asking for help with a problem that turned out to have "fixed itself" -- meaning that I do not know what the initial problem was; it was probably some weird formatting problem in pasting code copied from my old machine to my new machine. I deleted the post because the problem doesn't exist, but I would like to say thank you for making this macro available . I've used it now and again for the past several years (to help with text formatting after creating a script application to work with the tesseract OCR binary in 2007 [which I still use on occasion]).
I do have another question, though. Have you ever 'ported' (or thought about 'porting') the functionality of this macro to some other scripting language?

-cg
NeoOffice 3.4.1 on MacOS 10.10.1
JohnV
Volunteer
Posts: 1585
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Convert ASCII text files by deleting extra paragraph bre

Post by JohnV »

Have you ever 'ported' (or thought about 'porting') the functionality of this macro to some other scripting language?
No and don't intend to.
User avatar
Digger
Posts: 45
Joined: Sat Jan 14, 2012 11:29 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Post by Digger »

I can confirm that this workhorse macro, which worked beautifully in OO 3.2.1, also works well with LibreOffice 3.5.4.2 - I just converted a 72,000 word ASCII file...

One hint: The way I've always used this snippet is to copy and paste the subject ASCII text into an OO/LO document. Then, before running the subject macro, I save and close the host OO/LO document, then reopen same. If I don't go through this save/close/reopen process, the macro does not work properly.
Last edited by Digger on Thu Jun 23, 2022 9:18 pm, edited 2 times in total.
LibreOffice 5.2.7.2
Debian 9 (Stretch) (Linux)
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Convert ASCII text files by deleting extra paragraph bre

Post by Lupp »

As LibO V3.5.4 was mentioned, and the most recent released version of LibO is V 5.4.2, I did a very superficial short test which seemed to show that the huge "snippet" also works there roughly as expected.

One hint:
If applied to a 'Writer' document containing frames in addition to the imported text, the Sub obviously tries to work also with the text content of the frames, but does not concatenate the paragraphs therein as expected.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
TeeGeeRoo
Posts: 1
Joined: Wed May 06, 2020 9:08 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Post by TeeGeeRoo »

Seems like great code. Thanks for posting. :bravo:

However, when I attempt to use, it does literally nothing to my document. :? :cry:

Procedure:
1. I copy all ASCII text, paste to .odt file, save, close, reopen.
2. Select All, then go to Tools>Macros>Run Macro..., choose the macro.
3. From Macro Selector, I choose Run... I then get a dialog stating "Process selected text only?", I choose "Yes".
4. No backup over 60K warning dialog next, I choose "Yes".
5. Keep short paragraphs, I choose "No" (I want nothing but a character string).
6. Fix offset paragraphs dialog is next, I have chosen both "0- Remove nothing" and "3-Both".
7. An "Untitled 1" file opens in a new window - and, nothing...
:crazy: :knock:
LibreOffice Version: 6.3.6.2 (x64)
Build ID: 2196df99b074d8a661f4036fca8fa0cbfa33a497
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win;
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Convert ASCII text files by deleting extra paragraph bre

Post by John_Ha »

See more ideas in [Tutorial] How do I remove end_of_paragraph marks? especially the add-on.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
User avatar
Hagar Delest
Moderator
Posts: 32627
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Convert ASCII text files by deleting extra paragraph bre

Post by Hagar Delest »

Works fine for me. Are you sure that they are paragraph breaks and not line feeds?
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
User avatar
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Convert ASCII text files by deleting extra paragraph bre

Post by RoryOF »

I always remove paragraph breaks and line feeds with a few passes of Find and Replace, as I have detailed before. No need for any macro.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Convert ASCII text files by deleting extra paragraph bre

Post by John_Ha »

I have just done a test using the add-on OOoFBTools.

My test large file is a .odt with six copies of Vanity Fair downloaded from Project Gutenberg. The file had 4,938 A4 pages and 1.8 million words.

The problem is that each line is a separate paragraph.
Before
Before
I went OOoFBTools > Join broken lines of a paragraph..., with the settings as below and clicked Join.
Settings
Settings
AOO took about 10 minutes but the result is 99.99% perfect. The file is now 2,507 A4 pages.
After
After
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
User avatar
Digger
Posts: 45
Joined: Sat Jan 14, 2012 11:29 pm

Re: Convert ASCII text files by deleting extra paragraph breaks

Post by Digger »

I can confirm that this workhorse macro (the one addressed in the OP), which worked beautifully in LibreOffice 3.5.4.2, also works just as well in LibreOffice 5.2.7.2 - I just converted a 61,000 word ASCII file...
LibreOffice 5.2.7.2
Debian 9 (Stretch) (Linux)
Post Reply