XFDF to CSV-Converter

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Post Reply
musikai
Volunteer
Posts: 294
Joined: Wed Nov 11, 2015 12:19 am

XFDF to CSV-Converter

Post by musikai »

Hi,
here we discussed the different possibilities how one can send out a PDF-Form and get the results back.
viewtopic.php?f=7&t=91468
The xml-option gives back a nice XFDF-xml-document.


To extract form data of Folder of PDFs (using PDFtk) in to .txt-files see
viewtopic.php?f=21&t=91586#p434033
To convert all .txt files to .xdfd files see:
viewtopic.php?f=21&t=91587
Create CSV from Formdata of PDFs in a folder (using PDFtk)
viewtopic.php?f=21&t=91588


This XFDF to CSV-Converter will convert a chosen .xfdf-file into a .cvs-file:
(for a batch converter of folder of .xfdf-files scroll down)

Code: Select all

Sub Convert_XFDF_to_CSV
If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then GlobalScope.BasicLibraries.LoadLibrary("Tools")
   
rem---filepicker
Fileurl=""
rem to not use the filepicker set the file here and uncomment:
rem Fileurl=converttourl("C:\Users\kai\Desktop\PDFform_xml.xfdf")
if Fileurl="" then
oFileDlg = CreateUnoService("com.sun.star.ui.dialogs.FilePicker")
oFileDlg.AppendFilter("all files", "*.*")
oFileDlg.AppendFilter("XFDF files","*.xfdf")
oFileDlg.SetCurrentFilter("XFDF files")
rem oFileDlg.setDisplayDirectory(workdirfull)
If oFileDlg.execute() Then
oFiles = oFileDlg.getFiles()
   If UBound(oFiles) >= 0 Then
   Fileurl = oFiles(0)
   End If
else
exit sub
End If

end if
if not Fileexists(Fileurl) then
msgbox "File not found!"
exit sub
end if

   sFolderurl=DirectoryNameoutofPath(Fileurl,"/")
   sFilename_noext=GetFileNameWithoutExtension(Fileurl,"/")
   sFileext=GetFileNameExtension(Fileurl,"/")
   sFilename=FileNameoutofPath(Fileurl,"/")

   oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
   oInStream = oSFA.openFileRead(Fileurl)
   oDOM = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
   oXML = oDOM.parse(oInStream)
   oInStream.closeInput
   
   nodeListFields = oXML.getElementsByTagName("fields").item(0).getElementsByTagName("field")
   rem msgbox nodeListFields.length
   for i=0 to nodeListFields.length-1
      curFieldNode = nodeListFields.item(i)
   If curFieldNode.hasAttributes Then
   nodeAttributes = curFieldNode.Attributes
      For k=0 To nodeAttributes.Length - 1
      'FieldNameResult = FieldNameResult & " " & nodeAttributes.item(k).NodeName & "=" & nodeAttributes.item(k).Value
      FieldNameResult = FieldNameResult & ";" & convertXMLstring(nodeAttributes.item(k).Value)
      Next k
   end if
   
       childcounter=1
       sNodeValue=""
       sNodeValuetemp=""
       If curFieldNode.hasChildNodes Then
       curChildNode = curFieldNode.firstChild
       
       Do While NOT isNull(curChildNode)
       
          If curChildNode.LocalName = "value" then
             if curChildNode.hasChildNodes then
             curChildChildNode=curChildNode.firstChild                      
   	         sNodeValuetemp= curChildChildNode.NodeValue
    	     sNodeValuetemp= replace(sNodeValuetemp,chr(10), "\n")
    	     sNodeValuetemp= replace(sNodeValuetemp,";", ",")         	         
             else
             sNodeValuetemp=""
             end if
           
             if childcounter=1 then
    	     sNodeValue=sNodeValuetemp
    	     else
    	     sNodeValue = sNodeValue & "\n" & sNodeValuetemp
    	     end if
    	     
             childcounter=childcounter+1
         
          end if
       curChildNode = curChildNode.nextSibling
       
       loop
       FieldValueResult = FieldValueResult & ";" & sNodeValue
       end if
       
       next i

   FieldNameResult=mid(FieldNameResult,2)
   FieldValueResult=mid(replace(FieldValueResult,chr(10), " "),2)
   'msgbox FieldNameResult & chr(13) & chr(10) & FieldValueResult

if fileexists(sFolderurl & "/" & sFilename_noext & ".csv") then kill(sFolderurl & "/" & sFilename_noext & ".csv")   
outputfile = sFolderurl & "/" & sFilename_noext & ".csv"
oTextoutputStream = CreateUnoService("com.sun.star.io.TextOutputStream")
ooutputStream = oSFA.openFileWrite(outputfile)
oTextoutputStream.setOutputStream(ooutputStream)
oTextoutputStream.writestring(FieldNameResult & chr(10) & FieldValueResult)
oTextoutputStream.closeOutput()
msgbox "Saved " & convertfromurl(sFolderurl & "/" & sFilename_noext & ".csv")
End Sub

    function convertXMLstring(instring)
	instring=replace(instring,"#","%")
	convertXMLstring = convertfromurl(instring)
	end function

A single .xfdf-document only contains the minimal data of the forms. So if the user didn't enter anything into e.g. a Textbox, then there will be no "Textbox"-Node in the XML at all. In another .xfdf-document there may be one, but another Node missing. Manually combining multiple of those csv files that have different HeaderNames into 1 csv is no fun job. So:
This XFDF to CSV-Converter will merge all .xfdf files in a chosen Folder, rebuild the HeaderNames and create a merged CSV:
(at the end CSV will be opened automatically in Calc, "\n"-strings converted to real linebreaks and optimal RowHeight and ColumnWidth set.)

Code: Select all

    Sub Convert_multipleXFDFs_to_1CSV
    If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then GlobalScope.BasicLibraries.LoadLibrary("Tools")
    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess") 
    sFolderpath=""
    rem---folderpicker
    rem to not use the folderpicker set the folder here and uncomment:
    rem sFolderpath=converttourl("C:\Users\kai\Desktop")
    if sFolderpath="" then
    oDialog = CreateUnoService("com.sun.star.ui.dialogs.FolderPicker")

    If oDialog.Execute() = 1 Then
    sFolderpath = oDialog.getDirectory
    else
    exit sub
    end if

    end if
    if not oSFA.Exists(sFolderpath) then
    msgbox "Folder not found!"
    exit sub
    end if

    dim allXFDFHeaders(0)
    dim allXFDFValues()
    sFileName=""
    sFileName = Dir(sFolderpath & "/", 0)
    Do While (sFileName <> "")
    if GetFileNameExtension(sFileName)="xfdf" then
    Fileurl=sFolderpath & "/" & sFileName

       oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
       oInStream = oSFA.openFileRead(Fileurl)
       oDOM = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
       oXML = oDOM.parse(oInStream)
       oInStream.closeInput
       
       nodeListFields = oXML.getElementsByTagName("fields").item(0).getElementsByTagName("field")

       for i=0 to nodeListFields.length-1
          curFieldNode = nodeListFields.item(i)
       If curFieldNode.hasAttributes Then
       nodeAttributes = curFieldNode.Attributes
          For k=0 To nodeAttributes.Length - 1
                additem=1
                 for j=0 to ubound(allXFDFHeaders)
              if allXFDFHeaders(j)=convertXMLstring(nodeAttributes.item(k).Value) then additem=0
               next j
              
            if additem=1 then
            redim preserve allXFDFHeaders(ubound(allXFDFHeaders)+1)
          allXFDFHeaders(ubound(allXFDFHeaders))=convertXMLstring(nodeAttributes.item(k).Value)
          end if

          Next k
       end if
         
       next i
    end if
    sFileName = Dir()
    loop
    k=0
    i=0
    j=0
       

    sFileName = Dir(sFolderpath & "/", 0)
    Do While (sFileName <> "")
    if GetFileNameExtension(sFileName)="xfdf" then
    Fileurl=sFolderpath & "/" & sFileName

       oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
       oInStream = oSFA.openFileRead(Fileurl)
       oDOM = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
       oXML = oDOM.parse(oInStream)
       oInStream.closeInput
       
       nodeListFields = oXML.getElementsByTagName("fields").item(0).getElementsByTagName("field")

       FieldNameResult=""
       FieldValueResult=""
       
       for i=0 to nodeListFields.length-1
          curFieldNode = nodeListFields.item(i)
       If curFieldNode.hasAttributes Then
       nodeAttributes = curFieldNode.Attributes
          For k=0 To nodeAttributes.Length - 1
          FieldNameResult = FieldNameResult & chr(9) & convertXMLstring(nodeAttributes.item(k).Value)
          Next k
       end if
       
       childcounter=1
       sNodeValue=""
       sNodeValuetemp=""
       If curFieldNode.hasChildNodes Then
       curChildNode = curFieldNode.firstChild
       
       Do While NOT isNull(curChildNode)
       
          If curChildNode.LocalName = "value" then
             if curChildNode.hasChildNodes then
             curChildChildNode=curChildNode.firstChild                      
   	         sNodeValuetemp= curChildChildNode.NodeValue
    	     sNodeValuetemp= replace(sNodeValuetemp,chr(10), "\n")
    	     sNodeValuetemp= replace(sNodeValuetemp,";", ",")         	         
             else
             sNodeValuetemp=""
             end if
           
             if childcounter=1 then
    	     sNodeValue=sNodeValuetemp
    	     else
    	     sNodeValue = sNodeValue & "\n" & sNodeValuetemp
    	     end if
    	     
             childcounter=childcounter+1
         
          end if
       curChildNode = curChildNode.nextSibling
       
       loop
       FieldValueResult = FieldValueResult & chr(9) & sNodeValue
       end if
       
       next i
       m=0
       n=0
        XFDFHeaders = split(FieldNameResult,chr(9))
        XFDFValues = split(FieldValueResult,chr(9))

        redim allXFDFValues(ubound(allXFDFHeaders))
        for m=0 to ubound(XFDFHeaders)
         for n=0 to ubound(allXFDFHeaders)
         if XFDFHeaders(m)=allXFDFHeaders(n) then allXFDFValues(n)=XFDFValues(m)
         next n
        next m
       FieldValueResult2 = FieldValueResult2 & sFileName & join(allXFDFValues(),";") & chr(13) & chr(10)
    end if
    sFileName = Dir()
    loop
    if ubound(allXFDFHeaders())=0 then
    msgbox "No .xfdf files found!"
    exit sub
    end if
    Outstring =  "File" & join(allXFDFHeaders(),";") & chr(13) & chr(10) & FieldValueResult2

    outputfile = sFolderpath & "/" & "merged.csv"
    if fileexists(outputfile) then kill(outputfile)   
    
    oTextoutputStream = CreateUnoService("com.sun.star.io.TextOutputStream")
    ooutputStream = oSFA.openFileWrite(outputfile)
    oTextoutputStream.setOutputStream(ooutputStream)
    oTextoutputStream.writestring(Outstring)
    oTextoutputStream.closeOutput()
    msgbox "Saved: " & convertfromurl(outputfile)
    OpenCSVinCalc(outputfile)
    calc_replaceLinebreak_UsedArea
    End Sub

    rem-----for opening in Calc
    Sub OpenCSVinCalc(Url)
    Dim oDocument as Object
    fnURL=ConvertToURL(Url)

    'Create new document and import data
    oDocument = StarDesktop.LoadComponentFromURL( fnURL, "_blank", 0, _
    Array(MakePropertyValue( "FilterName", "Text - txt - csv (StarCalc)" ), _
             MakePropertyValue( "FilterOptions", "59,34,76,1,1" )

    End Sub

    Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
       oPropertyValue = createUnoStruct( "com.sun.star.beans.PropertyValue" )
       If Not IsMissing( cName ) Then
          oPropertyValue.Name = cName
       EndIf
       If Not IsMissing( uValue ) Then
          oPropertyValue.Value = uValue
       EndIf
       MakePropertyValue() = oPropertyValue
    End Function

    function convertXMLstring(instring)
	instring=replace(instring,"#","%")
	convertXMLstring = convertfromurl(instring)
	end function

sub calc_replaceLinebreak_UsedArea
Doc = ThisComponent
If Doc.SupportsService("com.sun.star.sheet.SpreadsheetDocument") Then
else
exit sub
end if
Sheet = Doc.Sheets.getByIndex(0)
oCursor = Sheet.createCursor()
oCursor.gotoEndofUsedArea(false)

for i=0 to oCursor.RangeAddress.EndColumn
j=0
for j=0 to oCursor.RangeAddress.EndRow
Sheet.getCellByPosition(i,j).string = replace(Sheet.getCellByPosition(i,j).string,"\n",chr(10))
'Sheet.getCellByPosition(i,j).string = replace(Sheet.getCellByPosition(i,j).string,chr(10),"\n")
next j
next i

rem---Set Optimal RowHeight
for k=0 to oCursor.RangeAddress.EndRow
oRow = Sheet.getRows().getByIndex(oCursor.RangeAddress.EndRow-k)
oRow.OptimalHeight()=True
next
rem---Set oOptimal ColumnWidth
for m=0 to oCursor.RangeAddress.EndColumn
oColumn = Sheet.getColumns().getByIndex(oCursor.RangeAddress.EndColumn-m)
oColumn.OptimalWidth()=True
next
end sub

PDF-Form:
PDF-Form
PDF-Form
Form.PNG (11.14 KiB) Viewed 3852 times
CSV created from sent back .xfdf-files:
CSV created from sent back .xfdf-files
CSV created from sent back .xfdf-files
CSV.PNG (10.97 KiB) Viewed 3852 times
Thanks to the code from here:
http://de.openoffice.info/viewtopic.php?t=62289

Please test!
Win7 Pro, Lubuntu 15.10, LO 4.4.7, OO 4.1.3
Free Project: LibreOffice Songbook Architect (LOSA)
http://struckkai.blogspot.de/2015/04/li ... itect.html
Post Reply