Extract form data from a folder of PDFs (using PDFtk)

Post by musikai

PDFtk must be installed.
This macro runs the PDFtk operation "dump_data_fields_utf8" on every PDF in a chosen folder and writes each file's form data to a .txt file in a subfolder named "PDFtk-DataFields".
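
As a rough illustration of what happens for each file, here is a minimal Python sketch (not the Basic macro itself) that lists the PDFs in a folder and builds the PDFtk command line for each one. The function name `pdftk_commands` is made up for this example; only `pdftk`, `dump_data_fields_utf8`, `output` and the `PDFtk-DataFields` subfolder are taken from the macro. The sketch only builds the commands, does not run PDFtk, and leaves out the rename workaround the macro uses.

```python
import os
import tempfile


def pdftk_commands(folder, pdftk="pdftk"):
    """Return one (argv, output_path) pair per PDF found in `folder`.

    Hypothetical helper for illustration: it builds the command lines
    but does not execute them.
    """
    out_dir = os.path.join(folder, "PDFtk-DataFields")
    commands = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".pdf":
            continue  # skip everything that is not a PDF
        output_path = os.path.join(out_dir, base + ".txt")
        argv = [pdftk, os.path.join(folder, name),
                "dump_data_fields_utf8", "output", output_path]
        commands.append((argv, output_path))
    return commands


if __name__ == "__main__":
    # Demo on a throwaway folder holding two empty placeholder files.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("form1.pdf", "notes.txt"):
            open(os.path.join(folder, name), "w").close()
        for argv, output_path in pdftk_commands(folder):
            print(" ".join(argv))
```

Returning the argument list instead of running it makes it easy to check what would be executed before PDFtk touches any file.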

To convert all .txt files to .xfdf files see:
viewtopic.php?f=21&t=91587
To convert all .xfdf files to a single CSV file see:
viewtopic.php?f=21&t=91524
To create a CSV from the form data of PDFs in a folder (using PDFtk) see:
viewtopic.php?f=21&t=91588


    Sub PDFTK_dumpdatafields_of_Folder_with_PDFs_to_txt_files
        rem GetFileNameExtension and GetFileNameWithoutExtension come from the "Tools" library
        If (Not GlobalScope.BasicLibraries.isLibraryLoaded("Tools")) Then GlobalScope.BasicLibraries.LoadLibrary("Tools")
        oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")

        rem name or full path of the PDFtk executable
        pdftkapp = "pdftk"

        sFolderpath = ""
        rem---folderpicker
        rem to skip the folderpicker, set the folder here and uncomment:
        rem sFolderpath=converttourl("C:\Users\kai\Desktop")
        If sFolderpath = "" Then
            oDialog = CreateUnoService("com.sun.star.ui.dialogs.FolderPicker")
            If oDialog.Execute() = 1 Then
                sFolderpath = oDialog.getDirectory
            Else
                Exit Sub
            End If
        End If
        If Not oSFA.Exists(sFolderpath) Then
            MsgBox "Folder not found!"
            Exit Sub
        End If

        rem create the output subfolder once, before looping over the files
        sOutFolder = sFolderpath & "/" & "PDFtk-DataFields"
        If Not oSFA.Exists(sOutFolder) Then MkDir(sOutFolder)

        sFileName = Dir(sFolderpath & "/", 0)
        Do While (sFileName <> "")
            rem LCase so that ".PDF" is matched as well
            If LCase(GetFileNameExtension(sFileName)) = "pdf" Then
                Fileurl = sFolderpath & "/" & sFileName
                inputfile = ConvertFromURL(Fileurl)

                rem indirect way because of PDFtk utf8 filename bug:
                rem PDFtk writes to a fixed file name, which is renamed afterwards
                outputfiledummy = sOutFolder & "/" & "datafields.txt"
                outputfile = sOutFolder & "/" & GetFileNameWithoutExtension(sFileName) & ".txt"
                If FileExists(outputfiledummy) Then Kill(outputfiledummy)
                outputfiledummy = ConvertFromURL(outputfiledummy)
                If FileExists(outputfile) Then Kill(outputfile)
                Shell(pdftkapp, 0, """" & inputfile & """" & " dump_data_fields_utf8 output " & """" & outputfiledummy & """", True)

                rem rename the dummy file to the name of the source PDF
                If FileExists(outputfiledummy) Then
                    Name outputfiledummy As outputfile
                End If
            End If
            sFileName = Dir()
        Loop

        MsgBox "Files saved to: " & ConvertFromURL(sOutFolder)
    End Sub
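
The workaround in the macro (let PDFtk write to a fixed file name, then rename the result) can also be sketched outside OpenOffice. The following Python sketch is an illustration under assumptions, not a port of the macro: `dump_fields` and its `run` parameter are hypothetical names, and `run` defaults to `subprocess.run` so that the PDFtk call can be swapped out on machines where PDFtk is not installed.

```python
import os
import subprocess


def dump_fields(pdf_path, out_dir, pdftk="pdftk", run=subprocess.run):
    """Dump the form fields of one PDF to <out_dir>/<pdf name>.txt.

    PDFtk writes to the fixed name datafields.txt first; the file is
    renamed afterwards, mirroring the macro's workaround.
    Returns the final path, or None if no output file was produced.
    """
    os.makedirs(out_dir, exist_ok=True)
    dummy = os.path.join(out_dir, "datafields.txt")
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    final = os.path.join(out_dir, base + ".txt")

    if os.path.exists(dummy):
        os.remove(dummy)  # clear leftovers from an earlier run
    run([pdftk, pdf_path, "dump_data_fields_utf8", "output", dummy])

    if not os.path.exists(dummy):
        return None
    os.replace(dummy, final)  # overwrites an older result if present
    return final
```

A call such as `dump_fields("form.pdf", "PDFtk-DataFields")` would need PDFtk on the PATH; passing a stand-in for `run` allows a dry run without it.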