Convert ASCII text files by deleting extra paragraph breaks

Creating Extension - Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This forum is not for asking questions about writing your own macros.

Convert ASCII text files by deleting extra paragraph breaks

Postby JohnV » Wed Jan 05, 2011 6:48 pm

EDITED 7-4-12 simply to note that the macro appears, from a small test, to continue to work with ApacheOpenOffice's new implementation of regular expressions.

This is the same code stored at OOoForum.org. I worked too hard on it to risk it's potential loss. I think the maintainer of that forum will always act responsibly but he may die.

This macro removes the excess paragraph breaks from an ASCII editor type file and also works on text coped & pasted from the Web that has line breaks inserted like a message in these forums.

It also provides provides other options to indent each paragraph, reduce the spacing between paragraphs, change spaced indent to tabs, remove excess interior spacing, strip all indents and justification of he results. One routine in this macro is designed to reformat text that has been scanned and then OCRed directly into OOo. If your OCR program creates a left margin by inserting spaces before each line then you might find this handy.

Trying to reformat an ASCII file is an exercise in guesswork at best and you shouldn't expect perfect results. This macro has to make assumptions about what is truly the end of a paragraph, a title or part of a list.

Code: Select all   Expand viewCollapse view
'Version 2.2   5-6-06  John Vigor
'Converts ASCII text files, or selected text within them, by stripping out excess
'paragraph breaks. Works with items copied & pasted from the Web that contain line
'breaks such as a message in these forums.
'WARNING - Anything stored on the Clipboard will be overwritten. A copy of your
'original file will not be saved to the Clipboard if a file greater then 60K
'characters or selected text of any size and you are responsible for otherwise
'backing it up. On the other hand, your original file will not be changed unless you
'save the macro results and overwrite the original but this assumes a saved file.
'Sample processing times on a 770 MHz machine in Pages/Seconds format:
'  10/4, 20/7, 40/16, 80/34, 160/81 (1.35 Min.).
'Hint for long documents - TURN OFF AUTO-SPELLCHECKING. This will save time.
'You can control what items the program asks you about and what happens if you choose
'not to be asked about an item by editing the variables below. These variables are
'ignored if you run the macro on selected text so if you customize them and need to
'be asked about items for a particular file then simply select the entire file with
'Ctrl+A before running the macro.
'You should become familiar with how the program works before you attempt to customize
'it. It works differently if the file is less or more than 60K and when processing a
'selection.
'
Sub ASCII_Formatter_StartHere
'VARIABLES YOU CAN CHANGE.
AskShortParagraphs = True 'Show the query about keeping short paragraphs.
'Default answer if AskShortParagraphs is False.
KeepShortParagraphs = True 'HIGHLY recommended and the faster of the two methods. I
'only maintain the other method because it's needed for one option.
'--
AskShortParaLength = True 'Show a chance to adjust the program's estimate of what should
'be considered a short paragraph that will kept.
ShortDef = 20 'The minimum number of characters short of the right margin that a line
'must end to be a short paragraph. The paragraph break at the end of any short paragraph
'will be maintained.
'--
AskViewOptions = True 'Show the query about end of program Options.
'Default, if AskViewOptions is False.
GoToOptions = True 'If True the Option variables below will take control. They will
'also control if AskViewOptions is True and you choose to go to the Options.
'--
'The following values control what Options are displayed following initial file processing:
Show1stOptionSet = True 'Show the first set of Options, which are Indent All Paragraphs
'and Reduce Paragraph Spacing.
'Defaults, if Show1stOptionSet is False.
O1_1 = False 'Indent all paragraphs.
O1_2 = False 'Reduce paragraph spacing.
  AskMaxParaSpacing = True 'Ask for the maximum number of blank "lines" between paragraphs.
   O1_2_1 = 1 'Default, if AskMaxParaSpacing is False.
Show2ndOptionSet = True 'Options are Remove Excess Interior Spaces, Change Spaced
'Indents to Tabs and Justify All Paragraphs.
'Defaults, if Show2ndOptionset is False.
O2_1 = False 'Remove excess interior spaces.
O2_2 = False 'Change spaced indents to tabs.
O2_3 = False 'Justify text.
'--
'The following control aspects of full file, selection and/or Option processing and
'are not ignored if you process a selection:
ShowBackUpWarning = True 'Show warning that file not copied to Clipboard if over 60K.
ShowFinished = True 'Show the finished message.
PageBite = 10 'Number of pages processed at a time for selected text or a file over 60K
'characters. Seems pretty good but I haven't seriously tried to optimize it.
ViewBegin = True 'The cursor and your view are taken to the beginning of the document
'when macro ends. "False" will leave you at the end.
StripHyphens = True 'Assumes hyphens located at the right margin are editorial and do
'not seperate true hyphenated words like "half-dollar".   
CheckForcedLeftMargin = True 'A VERY quick routine if spaces imitating a left margin
'don't exist. Or it will delete such a margin if they do (known to work for my version of
'ScanSoft's OCR software - the 1st line of the file will contain only a series of spaces).
StripIndents = "0" 'This controls what happens during initial file processing. The default
'value is recommended. The default ("0") is to leave all tabbed or spaced paragraph indents,
'"1" will strip spaced indents, "2" will strip tabbed ones and "3" will strip both kinds.
'If you run the macro on selected text and don't choose to go to the Options then you
'will also be asked about this. One use for this is to deal with paragraphs offset, as
'opposed to just 1st line indented, from the left margin. Normal processing will leave
'all "lines" of such paragraphs indented and followed by paragraph breaks. Stripping
'out the indents will convert these to regular paragraphs which can be edited normally
'in Writer. Running the macro on selected text after normal processing is a personal
'favorite of mine because I often scan documents with offsets which I need to edit.
MarkIt = True 'Insert the "MarkWith" character as the last character of the file to
'indicate it has been previouly processed. Allows you to run the macro again and go
'directly to the other Options without wading through the full initial file processing
'again. I do not recommend changing as it controls how the program works on a 2nd run.
MarkWith = Chr(160) 'A nonbreaking space in most fonts, which isn't seen on printing.
Override60K = False 'Setting this to True may significantly slow runtime on large files
'or selections but no seperate processing document will be used if you don't like this.
'.....................................................................................
'DO NOT EDIT BELOW HERE UNLESS YOU KNOW WHAT YOU ARE DOING.
lTime = Timer : RunTime = 0 : Skip = false : Over60K = false : ProcSel = false
PrevProc = false : IsSelect = false : ASPL = AskShortParaLength : LastSection = False
thisDoc = thisComponent : oDoc = thisDoc 'oDoc may change.
thisVC = thisDoc.CurrentController.getViewCursor : oVC = thisVC 'oVC may change.
thisText = thisDoc.Text 
MarkSel = thisDoc.Text.createTextCursorByRange(thisVC)'Mark any selection.
thisFrame = thisDoc.CurrentController.getFrame()
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
FandR = thisDoc.createReplaceDescriptor() 'Find & Replace initial set up.
FandR.searchRegularExpression = true      'Use regular expressions.
If NOT thisVC.isCollapsed then IsSelect = true
If IsSelect then
  a$ = "Process selected text only?"& Chr(13) & Chr(13) &"Cancel to Quit."
  RunTime = RunTime + (Timer - lTime)
  iAns = MsgBox(a$,3,"Text in this file has been selected.")
  lTime = Timer : If iAns = 2 then End
  If iAns = 6 then ProcSel = true
EndIf
If ProcSel then 'Tests for over 60K.
If Len(MarkSel.String) > 60000 OR Len(MarkSel.String) = 0 then Over60K = true
ELSEIf thisDoc.characterCount > 60000 then Over60K = true
EndIF
If Override60K then Over60K = false
If NOT Over60K then LastSection = true
If Over60K AND NOT ProcSel then MarkSel.gotoStart(false) : MarkSel.gotoEnd(true)
thisTC = thisDoc.Text.createTextCursor : thisTC.gotoEnd(false)
thisTC.goLeft(1,true) 'Get the last character. If a binding space
'then file was previously processed. Process or go to Options?
IF thisTC.String = MarkWith then 'Was file previouly processed?
PrevProc = True 'Previously processed.
Skip = SkipToOptions(Show1stOptionSet,Show2ndOptionSet)
EndIf
'++++++++++++++++++++++++++ All basic information gathered, now control program flow.
Select Case Skip
Case True  'Go directly to the options.
  Select Case ProcSel 'Do we need to deal with selected text?
   Case False : BackUp(thisFrame,dispatcher)
         RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
   Case True : NoBackUpWarning(ShowBackUpWarning)
         SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
         RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
         FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
  End Select
Case False 'Going to do the regular file processing.
  Select Case ProcSel 'Do we need to deal selected text?
   Case True : NoBackUpWarning(ShowBackUpWarning)
         AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
         AskIndents() : If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
         SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
         If Over60K then
           ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
          Else RunMainRoutines(FandR)
           FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
         EndIf
         oDoc = thisDoc
         AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
   Case False 'Normally this would be the 1st time entire file is processed. 
              'Won't ask about indents because most users won't care.
         IF Over60K then NoBackUpWarning(ShowBackUpWarning)     
         If NOT Over60K then BackUp(thisFrame,dispatcher)
         AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
         If CheckForcedLeftMargin then ForcedLeftMargin(thisTC,FandR)
         If Over60K then
           SetUpSelection(thisFrame,thatFrame,"Copy",dispatcher,MarkSel,IsSelect,Skip)
           ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
           oDoc = thisDoc
          Else
           RunMainRoutines(FandR)
         EndIf
         AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
  End Select               
End Select
'++++++++++++++++++++++++++++++
thisVC.gotoEnd(False)
If NOT PrevProc then
thisText.insertString(thisVC,MarkWith,False)
EndIf
EndMessage(ShowFinished,MarkIt)
If ViewBegin then thisVC.gotoStart(false)
End Sub

Private lTime, KeepShortParagraphs, oVC, StripHyphens, LastSection
Private oDoc, RunTime, Over60K, thisFrame, thatFrame, thisVC, PrevProc
Private StripIndents, jstify, dispatcher, ShortPara, ProcSel
Private O1_1, O1_2, O1_2_1 ,O2_1, O2_2, O2_3, AskMaxParaSpacing, GoToOptions

Function SkipToOptions(Show1stOptionSet,Show2ndOptionSet)
a$ = "Do you want to skip directly to the end of program Options?"
b$ = Chr(13) & "Cancel to quit." : RunTime = RunTime + (Timer - lTime)
iAns = MsgBox(a$ & b$,3,"This file has been previously processed.")
lTime = Timer : If iAns = 2 then End
If iAns = 7 then
  SkipToOptions() = false
Else Show1stOptionSet = true : Show2ndOptionSet = true : SkipToOptions() = true
EndIf
End Function

Sub CopyOrCutIt(whichFrame,doWhich,dispatcher)
dispatcher.executeDispatch(whichFrame,".uno:" & doWhich,"",0,Array())
End Sub

Sub PasteIt(whichFrame,dispatcher)
dispatcher.executeDispatch(whichFrame,".uno:Paste","",0,Array())
End Sub

Sub BackUp(thisFrame,dispatcher) 'Back up file to clipboard.
dispatcher.executeDispatch(thisFrame,".uno:SelectAll","",0,Array())
CopyOrCutIt(thisFrame,"Copy",dispatcher)
End Sub

Sub NoBackupWarning(ShowBackUpWarning)
If NOT ShowBackUpWarning then Exit Sub
a$ = "Because you are processing selected text or the file contains more than 60K "
a$ = a$ & "characters no backup copy of your file will be stored to the clipboard!"
a$ = a$ & " Optional > Auto-spellcheck should be off." & Chr(13)
b$= "DO YOU HAVE THIS FILE BACKED UP?  ('Yes' will continue - 'No' will abort.)"
c$ = String(15," ") & "NO BACKUP!" : RunTime = RunTime + (Timer - lTime)
iAns = MsgBox (a$ & b$,308,c$ & c$ & c$) : lTime = Timer
If iAns = 7 then End
End Sub

Sub SetUpSelection(thisFrame,thatFrame,doWhat,dispatcher,MarkSel,IsSelect,Skip)
oDoc = StarDesktop.loadComponentFromURL("private:factory/swriter","_blank",0,Array())
thatFrame = oDoc.CurrentController.Frame 'Get new doc frame.
oVC = oDoc.CurrentController.getViewCursor 'Set oVC to new doc.
If NOT Over60K OR Skip then
  thisVC.gotoRange(MarkSel.Start,false) : thisVC.gotoRange(MarkSel.End,true)
  CopyOrCutIt(thisFrame,"Copy", dispatcher) 'Then fill it
  PasteIt(thatFrame,dispatcher)             'with the selection.
  BackUp(thisFrame,dispatcher) 'If not big selection or are doing Options
EndIf                         'then backup file to clipboard.
End Sub

Sub FinishSelection(thatFrame,MarkSel,FandR,dispatcher,MarkSel,thisVC)
dispatcher.executeDispatch(thatFrame,".uno:SelectAll","",0,Array())
CopyOrCutIt(thatFrame,"Cut",dispatcher)
thisVC.gotoRange(MarkSel.Start,false)
thisVC.gotoRange(MarkSel.End,true)
PasteIt(thisFrame,dispatcher)
oDoc.dispose : If jstify then thisVC.setPropertyValue("ParaAdjust",2)
End Sub

Sub AskShortParas(AskShortParagraphs,ShortDef,thisTC,ASPL)
thisTC.gotoStart(false)
While thisTC.isEndOfParagraph() 'Delete empty paragraphs at top of doc.
  thisTC.goRight(1,true) : thisTC.String = ""
Wend
If NOT AskShortParagraphs AND NOT KeepShortParagraphs AND NOT ProcSel then Exit Sub
If NOT AskShortParagraphs AND NOT ProcSel then iAns = 6 : goto Continue
a$ = "Generally highly recommended. This saves the formatting of lists,"
b$ = " tables of contents," & Chr(13) & "indexes, etc. that are not indented."
c$ = " This is also the fastest method."
d$ = "KEEP SHORT PARAGRAPHS?"
RunTime = RunTime + (Timer - lTime)
iAns = Msgbox (a$ & b$ & c$,3,d$) : lTime = Timer
If iAns = 2 then End
If iAns = 7 then KeepShortParagraphs = false
CONTINUE:
If iAns = 6 then
  KeepShortParaGraphs = true : thisVC.gotoStart(false) : thisTC.gotoStart(false)
  If NOT PrevProc then
    Do
     If NOT thisTC.isEndOfParagraph then
      thisTC.gotoEndOfParagraph(true) : cnt = cnt +1
      If Len(thisTC.String) > ParaLen then ParaLen = Len(thisTC.String)
     EndIf
    Loop While thisTC.gotoNextParagraph(false) and cnt < 50 
   Else Do 'Get max length of 1st 50 paragraphs which is probably the
         If NOT thisVC.isAtEndOfLine then 'original right margin.
          thisVC.gotoEndOfLine(true) : cnt = cnt + 1
          If (Len(thisVC.String) > ParaLen) then ParaLen = Len(thisVC.String)
         EndIF
         Loop While thisVC.goRight(1,false) and cnt < 50
  EndIF
  ShortPara = ParaLen - ShortDef 'An arbitary guess of paragraph lengths for
  If ASPL OR ProcSel then            'titles, lists, etc.
  a$ = "You can adjust the short paragraph length, currently set at " & ShortPara
  b$ = ". This is " & ShortDef & " characters short of the estimated right margin of "
  c$ = "the original document based on a sampling of up to 50 lines. 'Cancel' will "
  c$ = c$ & "end the program."
  RunTime = RunTime + (Timer - lTime)
  ShortPara = InputBox(a$ & b$ & c$,"ADJUST SHORT PARAGRAPH LENGTH?",ShortPara)
  lTime = Timer : If ShortPara = "" then End
  ShortPara = Cint(ShortPara): If ShortPara < 1 then ShortPara = 1
  EndIf
EndIf
End Sub

Sub AskIndents()
a$ = "0 - Remove nothing." & Chr(13) &"1 - Remove spaced indents." & Chr(13)
a$ = a$ & "2 - Remove tabbed indents.    3 - Both, or click Cancel to quit."
Query: RunTime = RunTime + (Timer - lTime)
sAns = InputBox(a$,"Fix offset paragraphs? (0, 1, 2, 3 or Cancel to quit.)","0")
lTime = Timer : If sAns = "" then End
If instr("0123",sAns) = 0 then goto Query
StripIndents = sAns
If Cint(StripIndents) > 0 AND KeepShortParagraphs then AskOffset()
End Sub

Sub AskOffset()
a$ = "If you want to convert an offset paragraph(s) to a normal one then the program "
a$ = a$ & "will not keep short paragraphs." & Chr(13) & "Is this what you want to do?"
Runtime = Runtime + (Timer - lTime)
iAns = MsgBox(a$,4,"Convert offset paragraphs?") : lTime = Timer
If iAns = 6 then KeepShortParagraphs = false
End Sub

Sub ProcessOver60K(FandR,thisVC,thisDoc,dispatcher,MarkSel,PageBite)
'View cursor to entire selection range.
thisVC.gotoRange(MarkSel.Start,false) : thisVC.gotoRange(MarkSel.End,True)
LastPage = thisVC.getPage 'Get last page of selection.
oSelTC = thisDoc.Text.CreateTextCursorByRange(thisVC.Start)
thisVC.collapseToStart : GetSel = thisVC.getPage : PageBite = PageBite - 1
Do
  GetSel = GetSel + PageBite 'Get PageBite pages at a time
  If GetSel < LastPage then
    thisVC.jumpToPage(GetSel)'Only the view cursor can jump.
    oSelTC.gotoRange(thisVC,true)
    If Not oSelTC.isEndOfParagraph then
     Do 'Make sure the selection ends at a blank paragraph
      oSelTC.gotoNextParagraph(true)
     Loop Until oSelTC.isEndOfParagraph
    EndIF
   Else LastSection = true : thisVC.gotoRange(MarkSel.End,false)
    oSelTC.gotoRange(thisVC,true)
  EndIf
  thisVC.gotoRange(oSelTC,true)'Only the view cursor can capture text to copy it.
  CopyOrCutIt(thisFrame,"Copy",dispatcher)
  PasteIt(thatFrame,dispatcher)
  RunMainRoutines(FandR)
  oVC.gotoStart(false): oVC.gotoEnd(true)'Get entire selection.
  CopyOrCutIt(thatFrame,"Cut",dispatcher) : PasteIt(thisFrame,dispatcher)
  oSelTC.collapseToEnd   
Loop While NOT LastSection
oDoc.dispose(true) : LastSection = false
End Sub

Sub RunMainRoutines(FandR)
CleanUp(FandR)
StripParagraphBreaks(FandR,StripIndents)
End Sub

Sub ForcedLeftMargin(thisTC,FandR)'Delete a forced left margin appearing
thisTC.gotoStart(false) : thisTC.gotoEndOfParagraph(true)'in some OCRed text files.
Margin$ = String(Len(thisTC.String)," ")
If Len(Margin$) > 0 then
FandR.setSearchString("^" & Margin$)
Find = oDoc.findFirst(FandR)
Do While Not IsNull(Find)'Can't use "replaceAll" because
  Find.String = ""'multiple a$s in one line will be deleted.
  If Find.gotoNextParagraph(false) then
    Find = oDoc.findNext(Find.End,FandR)
   Else Exit Do
  EndIf
Loop
EndIf
End Sub

Sub CleanUp(FandR)
FandR.setSearchString("\n")  'Just in case line breaks got into the doc change
FandR.setReplaceString("\n") 'them to paragraph breaks. This will also help with
oDoc.replaceAll(FandR)       'some text copied & pasted from the web.
FandR.setSearchString(" *$") 'Delete all spaces before paragraph breaks. A must!
FandR.setReplaceString("")
oDoc.replaceAll(FandR)
End Sub

Sub StripParagraphBreaks(FandR,StripIndents)
oTC = oDoc.Text.createTextCursor
oText = oDoc.getText() : s$ = Chr(32) & Chr(9)
Const KM = "�"
If KeepShortParagraphs then
  If NOT PrevProc then
    FandR.setSearchString(".{" & ShortPara+1 & "}$") 'Find long paragraphs
   Else
    Do
     oTC.gotoEndOfParagraph(false) : oVC.gotoRange(oTC,false)
     oVC.gotoStartOfLine(true)
     If Len(oVC.String) > ShortPara then oVC.String = oVC.String & KM
    Loop While oTC.gotoNextParagraph(false)
  EndIf
Else FandR.setSearchString(".*$") 'or find all non-blank ones
EndIf                              'and mark them (at rear).
FandR.setReplaceString("&" & KM)
oDoc.replaceAll(FandR)
FandR.setSearchString("^ *") 'How to handle spaced indents.   
If StripIndents = "0" or StripIndents = "2" then
  FandR.setReplaceString(KM & "&") 'Mark them (at front).
Else FandR.setReplaceString("") 'Strip them.
EndIf
oDoc.replaceAll(FandR)       
FandR.setSearchString("^\t*") 'How to handle tabbed indents.
If StripIndents = "0" or StripIndents = "1" then
  FandR.setReplaceString(KM & "&") 'Mark them.
Else FandR.setReplaceString("") 'Strip them.
EndIf
oDoc.replaceAll(FandR)           
If LastSection then 'Make sure correct "ending" paragraph is marked,
oTC.gotoEnd(false) 'otherwise para breaks may be added or stripped.
If NOT oTC.isStartOfParagraph then 'If last para has text then
   oTC.goLeft(1,true)               'make sure it's not marked.
   If oTC.String = KM then oTC.String = ""
EndIf
EndIF
oTC.gotoStart(false)
Do 'Find marked paragraph breaks and replace with a space.
Do While NOT oTC.isEndOfParagraph
  oTC.gotoEndOfParagraph(true)
  If Right(oTC.String,1) = KM then
   oTC.collapseToEnd : oTC.goRight(1,true)
   oTC.String = " " : oTC.CollapseToEnd
  EndIf
Loop
Loop While oTC.gotoNextParagraph(false)
FandR.setSearchString(" " & KM & " *") 'Find space followed by marker and any
FandR.setReplaceString("\n&") 'trailing spaces which may now be in mid paragraph.
oDoc.replaceAll(FandR)        'Insert break and keep found stuff. 
FandR.setSearchString("^ *" & KM) 'Find 0 or more spaces followed by marker at
FandR.setReplaceString("")        'beginning of paragraph and delete found stuff.
oDoc.replaceAll(FandR)
FandR.setSearchString(" $") 'Replace paragraph ending spaces with a break.   
FandR.setReplaceString("\n")
oDoc.replaceAll(FandR)
oDoc.replaceAll(FandR)
FandR.setSearchString(KM) 'Delete remaining markers.
FandR.setReplaceString("")
oDoc.replaceAll(FandR)
FandR.setSearchString("- ") 'Hyphens found at the right margin are now
If StripHyphens then 'hyphens followed by a space. What should be done?
  FandR.setReplaceString("") 'Delete all and join the word "parts". Default
Else FandR.setReplaceString("-") 'Delete the space but keep the hyphen.
EndIf
oDoc.replaceAll(FandR)
End Sub

Sub AskRunOptions(AskViewOptions,Show1stOptionSet,Show2ndOptionSet,FandR)
If NOT AskViewOptions AND NOT ProcSel then Goto Continue
a$ = Chr(13) & "Would you like to see the additional options?" & Chr(13)
Runtime = Runtime + (Timer - lTime)
iAns = MsgBox (a$,4,"MAIN FILE PROCESSING FINISHED") : lTime = Timer
If iAns = 6 then
   GoToOptions = true
  Else GotoOptions = false 
EndIf
CONTINUE:
If GoToOptions then
  EndIt = FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
  If EndIt then Exit Sub
  SecondOptions(Show2ndOptionSet,FandR)
EndIf
End Sub

Sub RunOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
EndIt = FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
If EndIt then Exit Sub
SecondOptions(Show2ndOptionSet,FandR)
End Sub

Function FirstOptions(Show1stOptionSet,Show2ndOptionSet,FandR)
If NOT Show1stOptionSet AND NOT O1_1 AND NOT O1_2 AND NOT ProcSel then Exit Function
If NOT Show1stOptionSet AND NOT ProcSel then goto Silent1
a$ = "1. Indent all paragraphs." : b$ =  Chr(13) & "2. Reduce paragraph spacing."
c$ = Chr(13) & "3. Continue to other options.     'Cancel' will end the program."
d$ = "FIRST OPTION SET - Chose by number (1, 2 or 3) or 'Cancel' to end program."
ASK: RunTime = RunTime + (Timer - lTime)
sAns = InputBox (a$ & b$ & c$,d$,"3") : lTime = Timer
Select Case sAns
Case "1" : Indent(FandR) : a$ = a$ & " COMPLETED!"
Case "2" : ReduceParaSpacing() : b$ = b$ & " COMPLETED!"
Case "3" : Exit Function
Case ""  : FirstOptions() = true : Exit Function
Case Else : Goto ASK
End Select
Goto ASK
SILENT1:
If O1_1 then Indent(FandR)
If O1_2 then ReduceParaSpacing()
End Function
   
Sub Indent(FandR)
FandR.setSearchString("^[!-" & Chr(255) & "]"'Find paragraphs not starting with space
FandR.setReplaceString("\t&")             ' or tab. Insert tab & keep what was found.
oDoc.replaceAll(FandR)
End Sub

Sub ReduceParaSpacing()
If NOT AskMaxParaSpacing AND NOT ProcSel then ParaSpacing = O1_2_1 : goto Continue
a$ = "Enter the maximum number of blank lines you want between paragraphs."
b$ = Chr(13) & "Zero is a valid entry.              'Cancel' will quit this routine."
ASK: RunTime = RunTime + (Timer - lTime)
ParaSpacing = InputBox (a$ & b$,"ALLOW HOW MANY?",1) : lTime = Timer
If ParaSpacing = "" then Exit Sub
If NOT IsNumeric(ParaSpacing) then goto Ask
ParaSpacing = Cint(ParaSpacing)
CONTINUE: oTC = oDoc.getText().createTextCursor()
oTC.goToStart(false) : bOK = true 'bOK will test for the end of the file.
While oTC.isEndOfParagraph = true and bOK = true 'Skip spacing at top.
bOK = oTC.goRight(1,false)'This will move to the next paragraph from a blank paragraph.             
Wend
bOK = true
Do 'Starting at the beginning of a paragraph.
cnt = 0
While oTC.isEndOfParagraph() and bOK       'Is it also the end of a paragraph, i.e,
  cnt = cnt + 1 'Count blank paragraphs.    'a blank paragraph?
   If (cnt > ParaSpacing) then
     oTC.goLeft(1,true) : oTC.String = ""  'Select it and delete it.
     bOK = oTC.goRight(1,false) 'Go right to next paragraph.
    Else
     bOK = oTC.goRight(1,false) 'Move to next paragraph.
   EndIf
Wend
Loop While oTC.goToNextParagraph(false) 'Another end of file check.
End Sub

Sub SecondOptions(Show2ndOptionSet,FandR)
If NOT Show2ndOptionSet AND NOT O2_1 AND NOT O2_2 AND NOT O2_2 AND NOT ProcSel then Exit Sub
If NOT Show2ndOptionSet AND NOT ProcSel then goto Silent2
a$ = "1. Replace spaced indents with tabs."
b$ = Chr(13) & "2. Remove excess interior spaces."
c$ = Chr(13) & "3. Justify text." : d$ = "       4. Run all."
e$ = "SECOND OPTION SET - Chose by number (1, 2, 3 or 4) or 'Cancel' to end program."
ASK: RunTime = RunTime + (Timer - lTime)
sAns = InputBox(a$ & b$ & c$ & d$,e$," ") : lTime = Timer
Select Case sAns
Case  "" : Exit Sub
Case "1" : ReplaceSpacesB4LinesWithTab(FandR) : a$ = a$ & " COMPLETED!"
Case "2" : DeleteExcessInteriorSpaces(FandR) : b$ = b$ & " COMPLETED!"
Case "3" : Justify() : c$ = c$ & " COMPLETED!" : jstify = true
Case "4" : ReplaceSpacesB4LinesWithTab(FandR)  : jstify = true
           DeleteExcessInteriorSpaces(FandR)
           Justify() : RunTime = Runtime + (Timer - lTime)
           MsgBox "All routines finished." : lTime = Timer : Exit Sub
Case Else : Goto ASK
End Select
GoTo ASK
SILENT2:
If O2_1 then ReplaceSpacesB4LinesWithTab(FandR)
If O2_2 then DeleteExcessInteriorSpaces(FandR)
If O2_3 then Justify()
End Sub

Sub ReplaceSpacesB4LinesWithTab(FandR as Object)
MaxIndent = 10 'Any indent in excess of this will be ignored.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of line
Find = oDoc.findFirst(FandR) 'replace with tab, to replace with nothing use ""
While NOT isNull(Find)
If Len(Find.String) <= MaxIndent then Find.String = Chr(9)
Find = oDoc.findNext(Find.End,FandR)
Wend   
End Sub

Sub DeleteExcessInteriorSpaces(FandR as Object)
SM = Chr(165) 'Space marker.
FandR.setSearchString("^ *") 'find any number of spaces at beginning of paragraph
Find = oDoc.findFirst(FandR)
While NOT isNull(Find)
Find.String = String(Len(Find.String),SM) 'replace with placeholders
Find = oDoc.findNext(Find.End,FandR)
Wend
FandR.setSearchString(" *") 'find any number of spaces
FandR.setReplaceString(" ") 'replace with one space
oDoc.ReplaceAll(FandR) 'do it
FandR.setSearchString("^" & SM & "*") 'find any number of placeholders at beginning of line
Find = oDoc.findFirst(FandR) 'turn them back into spaces
While NOT isNull(Find)
Find.String = String(Len(Find.String)," ")
Find = oDoc.findNext(Find.End,FandR)
Wend
End Sub

Sub Justify()
oTC = oDoc.Text.CreateTextCursor()
oTC.gotoEnd(true)
oTC.setPropertyValue("ParaAdjust",2)
End Sub

Sub EndMessage(ShowFinished,MarkIt)
If NOT ShowFinished then Exit Sub
a$ = "Your original document was saved to the Clipboard and can be retrieved from "
b$ = "there if you do not like the macro results. "
RunTime = RunTime + (Timer - lTime)
c$ = "Total processing time was "& RunTime &" Second(s) or "& RunTime/60 &" Minute(s)."
d$ = "If you want to use the normal end of program options run this macro "
d$ = d$ & "again on the file or a selection. "
If MarkIt then
e$ = "A previously processed marker has been inserted as the last file character. "
EndIf
If NOT Over60K then
  MsgBox (a$ & b$ & e$ & Chr(13) & c$,0,"FINISHED!")
Else MsgBox (d$ & e$ & Chr(13) & Chr(13) & c$,0,"FINISHED!")
EndIf
End Sub
'Version 2.2  5-6-06
Last edited by JohnV on Wed Jul 04, 2012 6:19 pm, edited 1 time in total.
JohnV
Volunteer
 
Posts: 1584
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Convert ASCII text files by deleting extra paragraph bre

Postby davex44 » Wed Mar 23, 2011 4:06 pm

Thanks for posting the code.

Am new here and new to the macros. How do I load your code and run it.

Thanks,

David
Open Office 3.10
Windows 7 Version 6.1
davex44
 
Posts: 4
Joined: Wed Mar 23, 2011 4:02 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Postby JohnV » Wed Mar 23, 2011 5:30 pm

JohnV
Volunteer
 
Posts: 1584
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Convert ASCII text files by deleting extra paragraph bre

Postby davex44 » Thu Mar 24, 2011 6:37 pm

Hey John, Thanks for the quick reply. Best, David
Open Office 3.10
Windows 7 Version 6.1
davex44
 
Posts: 4
Joined: Wed Mar 23, 2011 4:02 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Postby Digger » Mon May 07, 2012 9:52 pm

John,

I've got a *-load of ASCII files to convert to OO Writer format....your snippet will be a lifesaver!

Thanks! :bravo:
Open Office 3.2.1
Debian Squeeze (Linux)
User avatar
Digger
 
Posts: 43
Joined: Sat Jan 14, 2012 11:29 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Postby crgreen » Sat Jan 03, 2015 3:06 am

I briefly posted a reply asking for help with a problem that turned out to have "fixed itself" -- meaning that I do not know what the initial problem was; it was probably some weird formatting problem in pasting code copied from my old machine to my new machine. I deleted the post because the problem doesn't exist, but I would like to say thank you for making this macro available . I've used it now and again for the past several years (to help with text formatting after creating a script application to work with the tesseract OCR binary in 2007 [which I still use on occasion]).
I do have another question, though. Have you ever 'ported' (or thought about 'porting') the functionality of this macro to some other scripting language?

-cg
NeoOffice 3.4.1 on MacOS 10.10.1
crgreen
 
Posts: 1
Joined: Fri Jan 02, 2015 7:17 am

Re: Convert ASCII text files by deleting extra paragraph bre

Postby JohnV » Sun Jan 04, 2015 12:44 am

Have you ever 'ported' (or thought about 'porting') the functionality of this macro to some other scripting language?

No and don't intend to.
JohnV
Volunteer
 
Posts: 1584
Joined: Mon Oct 08, 2007 1:32 am
Location: Kentucky, USA

Re: Convert ASCII text files by deleting extra paragraph bre

Postby Digger » Sun Oct 15, 2017 7:22 am

I can confirm that this workhorse macro, which worked beautifully in OO 3.2.1, also works well with LibreOffice 3.5.4.2.

One hint: The way I've always used this snippet is to copy and paste the subject ASCII text into an OO/LO document. Then, before running the subject macro, I save and close the host OO/LO document, then reopen same. If I don't go through this save/close/reopen process, the macro does not work properly.
Last edited by Digger on Sun Oct 15, 2017 9:10 pm, edited 1 time in total.
Open Office 3.2.1
Debian Squeeze (Linux)
User avatar
Digger
 
Posts: 43
Joined: Sat Jan 14, 2012 11:29 pm

Re: Convert ASCII text files by deleting extra paragraph bre

Postby Lupp » Sun Oct 15, 2017 2:34 pm

As LibO V3.5.4 was mentioned, and the most recent released version of LibO is V 5.4.2, I did a very superficial short test which seemed to show that the huge "snippet" also works there roughly as expected.

One hint:
If applied to a 'Writer' document containing frames in addition to the imported text, the Sub obviously tries to work also with the text content of the frames, but does not concatenate the paragraphs therein as expected.
On Windows 10: LibreOffice 5.4.2 and older versions, PortableOpenOffice 4.1.3 and older, StarOffice 5.2
---
Maybe we might! (Create a powerful UFO: United Free Office)
Lupp from München
User avatar
Lupp
Volunteer
 
Posts: 1521
Joined: Sat May 31, 2014 7:05 pm


Return to Code Snippets

Who is online

Users browsing this forum: No registered users and 2 guests