[Calc] Open PDF files, at specific pages, with web browser, from Calc cells

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Post Reply
User avatar
Sébastien C
Posts: 112
Joined: Mon Jan 04, 2010 5:06 pm
Location: Meymac, France

[Calc] Open PDF files, at specific pages, with web browser, from Calc cells

Post by Sébastien C »

The context:

I need to set up a small database, which will have less than 3 000 entries, and for the setup, Calc does perfectly the job (even though it is a spreadsheet and not a real database).
I have a significant number of PDF files downloaded from our French National digital library (Gallica), which is done because it is so overloaded that the latency times, when you need to quickly flip through pages to index them, are sometimes exasperating.

Each PDF file contains a small collection of miscellaneous articles, generally grouped over a year; therefore, a single file contains several articles. For each article, my small database will list the starting and ending page numbers. Obviously, eventually, the spreadsheet will no longer be sufficient (but that's not the point). I can't thank Villeroy enough for providing us his excellent "getArgumentFromURL()" function; I've already demonstrated its power here. The code presented here is infinitely less complex since the calling cell is not itself modified, passing its own coordinates.

To understand the macro, you need to make sure you're entering the correct command for your web browser. Since I'm working with GNU-Linux, for me, it's simply “firefox”. For others, it might be “C:\bla\bla\bla\firefox.exe”. For this example, you can download this PDF file (21Mb) and place it in a folder just below the file containing the macro (here, “Gallica_PDF”). The offset of 2, compared to web references, is due to the fact that PDF files always contain two initial pages relating to the license. But beyond these two pages, the page order is strictly the same as that of the images in the online viewer. For me, using Firefox is due to the fact that it's the only PDF reader I have that supports URLs like:

file:///home/USER/bla/bla/bla/Gallica_PDF/1878_bpt6k4413902.pdf#page=527

knowing that obviously everything relies on this functionality “#page=XXX”.

Since it also opens a web browser tab on every call, this is also something that really suits me in this case, especially since I have two screens, one of which is vertical (which is pure convenience in this case).

Enjoy!
:D

Code: Select all

Option explicit

' ╔══════════════════════════════════════════════════════════════════════════════╗
' ║ Reads a PDF file to a specified page in a cell.                              ║█
' ╚══════════════════════════════════════════════════════════════════════════════╝█
'  ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Sub readPDFPage(myUrl As String)
 Dim          myPath As String, myPDFreader As String, myPDFfolder As String
 Dim myReferences(1) As String,   myPDFfile As String,   myBNFpage As String
 Dim        myOffset As Integer

 globalScope.basicLibraries.loadLibrary("ScriptForge")

  myPDFreader = "firefox"
  myPDFfolder = "Gallica_PDF"
     myOffset = 2
       myPath = getParentFolderName(thisComponent.url)
    myBNFpage = getArgumentFromURL(myUrl, "myFile")
 myReferences = split(myBNFpage, "/f")
    myPDFfile = myPath & myPDFfolder & "/" & myReferences(0) & ".pdf#page=" & cStr(val(myReferences(1)) + myOffset)

 ' Call PDF reader.
 shell(myPDFreader, 3, myPDFfile)
End Sub

' ╔══════════════════════════════════════════════════════════════════════════════════╗
' ║ Extracts the values of variables (text) passed in argument with the URL call.    ║█
' ║ Source : https://forum.openoffice.org/en/forum/viewtopic.php?t=44153             ║█
' ╚══════════════════════════════════════════════════════════════════════════════════╝█
'  ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Function getArgumentFromURL(sURL$, sName$) As String
	On Error Goto exitErr:

	Dim iStart%, i%, l%, sArgs$, a()
 
	iStart = instr(sURL, "?")
	     l = len(sName)

	If (iStart = 0) or (l = 0) Then Exit Function

	sArgs = mid(sURL, iStart + 1)               ' sArgs behind "?".

	a() = split(sArgs, "&")

	For i = 0 To uBound(a())
	 If instr(1, a(i), sName & "=", 1) = 1 Then ' Not case sensitive.
	  getArgumentFromURL = Mid(a(i), l + 2)
	  Exit for
	 Endif
	Next i

	exitErr:                                    ' Return "".
End Function
Attachments
PDFlinkPageDemo.ods
(37.05 KiB) Downloaded 8 times
LibreOffice v. 24.2.7.2, under GNU-Linux Mint and, in virtualization and just for tests, LibreOffice v. 24.2.7.2 under M$-W. :ouch: .
Post Reply