Extract form data of Folder of PDFs (using PDFtk)

Post by musikai » Thu Dec 14, 2017 12:20 am

PDFtk must be installed.
This macro runs the PDFtk operation "dump_data_fields_utf8" on every PDF in a chosen folder and writes the form data of each PDF to a .txt file in the subfolder "PDFtk-DataFields".

To convert all .txt files to .xfdf files, see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91587
To convert all .xfdf files to a single CSV file, see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91524
To create a CSV from the form data of PDFs in a folder (using PDFtk), see:
https://forum.openoffice.org/en/forum/viewtopic.php?f=21&t=91588

Code:
    Sub PDFTK_dumpdatafields_of_Folder_with_PDFs_to_txt_files
        rem the Tools library provides GetFileNameExtension and GetFileNameWithoutExtension
        If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then GlobalScope.BasicLibraries.LoadLibrary("Tools")
        oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")

        rem name (or full path) of the PDFtk executable
        pdftkapp = "pdftk"

        sFolderpath = ""
        rem---folderpicker
        rem to not use the folderpicker set the folder here and uncomment:
        rem sFolderpath=converttourl("C:\Users\kai\Desktop")
        If sFolderpath = "" Then
            oDialog = CreateUnoService("com.sun.star.ui.dialogs.FolderPicker")
            If oDialog.Execute() = 1 Then
                sFolderpath = oDialog.getDirectory
            Else
                Exit Sub
            End If
        End If
        If Not oSFA.Exists(sFolderpath) Then
            MsgBox "Folder not found!"
            Exit Sub
        End If

        rem output subfolder and temporary output file (both as URLs)
        sOutFolder = sFolderpath & "/" & "PDFtk-DataFields"
        outputfiledummy = sOutFolder & "/" & "datafields.txt"

        sFileName = Dir(sFolderpath & "/", 0)
        Do While (sFileName <> "")
            rem LCase so that files ending in ".PDF" are processed as well
            If LCase(GetFileNameExtension(sFileName)) = "pdf" Then
                inputfile = ConvertFromURL(sFolderpath & "/" & sFileName)
                outputfile = sOutFolder & "/" & GetFileNameWithoutExtension(sFileName) & ".txt"

                If Not oSFA.Exists(sOutFolder) Then MkDir(sOutFolder)
                If FileExists(outputfiledummy) Then Kill(outputfiledummy)
                If FileExists(outputfile) Then Kill(outputfile)

                rem indirect way because of PDFtk utf8 filename bug:
                rem PDFtk writes to the fixed name "datafields.txt", which is renamed afterwards
                Shell(pdftkapp, 0, """" & inputfile & """" & " dump_data_fields_utf8 output " & """" & ConvertFromURL(outputfiledummy) & """", True)
                If FileExists(outputfiledummy) Then
                    Name outputfiledummy As outputfile
                End If
            End If
            sFileName = Dir()
        Loop

        MsgBox "Files saved to: " & ConvertFromURL(sOutFolder)
    End Sub
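
For anyone who wants to check the resulting .txt files, here is a rough sketch of how such a file could be read. It is written in Python only for illustration and is not part of the macro. It assumes the usual layout of a PDFtk field dump: each field starts with a line "---" and is followed by "Key: Value" lines such as FieldType, FieldName and FieldValue. The function name parse_data_fields and the sample text are made up for this example, and values that span several lines are not handled.

```python
def parse_data_fields(text):
    """Split a PDFtk field dump into one dict per form field.

    Assumption: every record starts with a line "---" and consists of
    "Key: Value" lines. Keys that repeat inside one record
    (e.g. FieldStateOption) are collected in a list.
    """
    fields = []
    current = None
    for line in text.splitlines():
        if line.strip() == "---":
            # a new field record begins
            current = {}
            fields.append(current)
            continue
        if current is None or ":" not in line:
            continue  # skip lines outside a record or without a key
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key in current:
            # repeated key: keep all values in a list
            if not isinstance(current[key], list):
                current[key] = [current[key]]
            current[key].append(value)
        else:
            current[key] = value
    return fields


# Made-up sample in the assumed layout, not taken from a real PDF.
SAMPLE = """---
FieldType: Text
FieldName: lastname
FieldValue: Miller
---
FieldType: Button
FieldName: newsletter
FieldValue: Off
FieldStateOption: Off
FieldStateOption: Yes
"""

if __name__ == "__main__":
    for field in parse_data_fields(SAMPLE):
        print(field["FieldName"], "=", field.get("FieldValue", ""))
```

To use it on a real file, read the file as UTF-8 first, for example with open(path, encoding="utf-8").read(), and pass the text to parse_data_fields.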
Win7 Pro, Lubuntu 15.10, LO 4.4.7, OO 4.1.3
Free Project: LibreOffice Songbook Architect (LOSA)
http://struckkai.blogspot.de/2015/04/libreofficesongbookarchitect.html