Extract form data from a folder of PDFs (using PDFtk)

Posted: Thu Dec 14, 2017 12:20 am
by musikai
PDFtk must be installed.
This macro runs the PDFtk operation "dump_data_fields_utf8" on every PDF in a chosen folder and writes each file's form data to a .txt file in the subfolder "PDFtk-DataFields".
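
For a single file, the call the macro makes can be sketched as follows. This is only a minimal sketch, assuming pdftk can be found on the PATH; the sub name and the two file names (form.pdf, form.txt) are hypothetical placeholders, not part of the macro below.

```basic
Sub PDFTK_dump_single_pdf_example
    rem hypothetical example paths, adjust them to your system
    sInput  = "C:\Users\kai\Desktop\form.pdf"
    sOutput = "C:\Users\kai\Desktop\form.txt"
    rem both paths are wrapped in double quotes so that spaces in file names survive
    sArgs = """" & sInput & """" & " dump_data_fields_utf8 output " & """" & sOutput & """"
    rem equivalent command line:
    rem pdftk "C:\Users\kai\Desktop\form.pdf" dump_data_fields_utf8 output "C:\Users\kai\Desktop\form.txt"
    rem 0 = hidden window, true = wait until pdftk has finished
    Shell("pdftk", 0, sArgs, true)
End Sub
```

The resulting text file should contain one block per form field, with keys such as FieldType, FieldName and FieldValue.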

To convert all .txt files to .xfdf files see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91587
To convert all .xfdf files to a single CSV file see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91524
To create a CSV from the form data of PDFs in a folder (using PDFtk) see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91588

Code:
    Sub PDFTK_dumpdatafields_of_Folder_with_PDFs_to_txt_files
        rem the Tools library provides GetFileNameExtension and GetFileNameWithoutExtension
        If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then GlobalScope.BasicLibraries.LoadLibrary("Tools")
        oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")

        rem name or full path of the PDFtk executable
        pdftkapp = "pdftk"

        sFolderpath = ""
        rem---folderpicker
        rem to not use the folderpicker set the folder here and uncomment:
        rem sFolderpath=converttourl("C:\Users\kai\Desktop")
        if sFolderpath = "" then
            oDialog = CreateUnoService("com.sun.star.ui.dialogs.FolderPicker")
            If oDialog.Execute() = 1 Then
                sFolderpath = oDialog.getDirectory
            else
                exit sub
            end if
        end if
        if not oSFA.Exists(sFolderpath) then
            msgbox "Folder not found!"
            exit sub
        end if

        rem loop over all files in the chosen folder
        sFileName = ""
        sFileName = Dir(sFolderpath & "/", 0)
        Do While (sFileName <> "")
            rem LCase so that files ending in .PDF are processed too
            if LCase(GetFileNameExtension(sFileName)) = "pdf" then
                Fileurl = sFolderpath & "/" & sFileName
                inputfile = convertfromurl(Fileurl)

                mkdir(sFolderpath & "/" & "PDFtk-DataFields")
                rem indirect way because of PDFtk utf8 filename bug:
                rem PDFtk writes to the fixed name datafields.txt, which is renamed afterwards
                outputfiledummy = sFolderpath & "/" & "PDFtk-DataFields" & "/" & "datafields.txt"
                outputfile = sFolderpath & "/" & "PDFtk-DataFields" & "/" & getfilenamewithoutextension(sFileName) & ".txt"
                rem remove leftovers from an earlier run
                if fileexists(outputfiledummy) then kill(outputfiledummy)
                outputfiledummy = convertfromurl(outputfiledummy)
                if fileexists(outputfile) then kill(outputfile)
                rem 0 = hidden window, true = wait until PDFtk has finished
                Shell(pdftkapp, 0, """" & inputfile & """" & " dump_data_fields_utf8 output " & """" & outputfiledummy & """", true)

                rem rename the dummy file to the name of the PDF
                if fileexists(outputfiledummy) then
                    Name outputfiledummy as outputfile
                end if
            end if
            sFileName = Dir()
        loop

        msgbox "Files saved to: " & convertfromurl(sFolderpath & "/" & "PDFtk-DataFields")

    End Sub