[Solved] Find all hyperlinks [C#]

Forbz
Posts: 2
Joined: Mon Nov 14, 2022 9:53 am

[Solved] Find all hyperlinks [C#]

Post by Forbz »

Hello, could you tell me how to find all hyperlinks in the text?

I get the XComponent and cast it to XTextDocument. What do I need to do next?

I read something about XAccessibleHypertext, but I don't understand how to work with it.

Code: Select all

XComponent xComponent = aLoader.loadComponentFromURL(
    filePathUri, "_default", 0, arr);

XTextDocument textDoc = (XTextDocument)xComponent;
Last edited by Forbz on Mon Nov 14, 2022 4:38 pm, edited 1 time in total.
LibreOffice 7.4/ Windows 11
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Find all hyperlinks][C#]

Post by JeJe »

Enumerate the paragraphs, then the text portions within each paragraph. With a hyperlink at the start of the document, MRI called from Basic gives this:

Code: Select all

mri thiscomponent.text.createenumeration.nextelement.createenumeration.nextelement


'HyperLinkEvents                  .container.XNameReplace    -INTERFACE-                   Maybevoid             48  
'HyperLinkName                    string                     ""                            Maybevoid             48  
'HyperLinkTarget                  string                     ""                            Maybevoid             48  
'HyperLinkURL 
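
For readers who want the two enumerations spelled out, here is a minimal sketch in Python rather than C#. It assumes the object passed in behaves like a Writer text object as seen from PyUNO (createEnumeration, supportsService, getPropertyValue). The function name iter_main_text_links is made up for this illustration, and only the main text is visited.

```python
def iter_main_text_links(text):
    """Yield (url, anchor_text) for each linked text portion in `text`.

    `text` is assumed to be the document's main text object
    (doc.getText() in PyUNO). Tables, frames and fields are not visited.
    """
    paragraphs = text.createEnumeration()
    while paragraphs.hasMoreElements():
        paragraph = paragraphs.nextElement()
        # The outer enumeration can also return tables; skip them here.
        if not paragraph.supportsService("com.sun.star.text.Paragraph"):
            continue
        portions = paragraph.createEnumeration()
        while portions.hasMoreElements():
            portion = portions.nextElement()
            url = portion.getPropertyValue("HyperLinkURL")
            if url:
                yield url, portion.getString()
```

In a LibreOffice Python macro this could be called as iter_main_text_links(XSCRIPTCONTEXT.getDocument().getText()). The same interface calls exist in the C# bindings, but the casts needed there are not shown.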

Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Find all hyperlinks][C#]

Post by JeJe »

Here's code I wrote a while ago (in Basic) to get all the hyperlink names. [Edit: it only covers the main text]

The navigator provides a list of hyperlinks - but the list either isn't exposed in the API for macro programmers or it is and I don't know how to get it.

[Edit: hyperlinks aren't one of the attributes for a find operation either, which would make getting them easy]

Code: Select all


Function GetHyperlinks(hyperlinkNames) As Boolean
	' Stores a text cursor for each hyperlink found in the main text only.
	Dim oEnum, oPar, oPortionEnum, oPortion
	Dim c As Long, ub As Long
	Dim oldUrl As String

	ub = 1000
	ReDim hyperlinkNames(ub)
	c = -1
	oEnum = ThisComponent.Text.createEnumeration()
	Do While oEnum.hasMoreElements()
		oPar = oEnum.nextElement()
		' The enumeration also returns tables, which have no text portions.
		If oPar.supportsService("com.sun.star.text.Paragraph") Then
			oPortionEnum = oPar.createEnumeration()
			Do While oPortionEnum.hasMoreElements()
				oPortion = oPortionEnum.nextElement()
				If oPortion.HyperLinkURL = "" Then
					oldUrl = ""
				ElseIf oPortion.HyperLinkURL <> oldUrl Then
					' Adjacent portions with the same URL count as one link.
					c = c + 1
					' Grow the array before writing past its upper bound.
					If c > ub Then
						ub = ub + 1000
						ReDim Preserve hyperlinkNames(ub)
					End If
					hyperlinkNames(c) = oPortion.Text.createTextCursorByRange(oPortion)
					oldUrl = oPortion.HyperLinkURL
				End If
			Loop
		End If
	Loop
	If c > -1 Then ReDim Preserve hyperlinkNames(c)
	GetHyperlinks = True
End Function
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Find all hyperlinks][C#]

Post by JeJe »

My code above doesn't handle tables either.

This thread looks to do a better job of getting hyperlinks in more than just the main text:

https://forum.openoffice.org/en/forum/v ... 21&t=40598

The alt search extension searches for hyperlinks but I presume it uses the same enumeration method.
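
As a rough illustration of how the enumeration approach could be extended to tables, here is a small Python sketch. It assumes the objects behave as they do under PyUNO: a text table offers getCellNames and getCellByName, and each cell is itself a text object that can be enumerated. The function name iter_links_in_text is hypothetical, and frames, graphics and fields are still not covered.

```python
def iter_links_in_text(text):
    """Yield (url, anchor_text) from a text object, descending into tables."""
    elements = text.createEnumeration()
    while elements.hasMoreElements():
        element = elements.nextElement()
        if element.supportsService("com.sun.star.text.TextTable"):
            # Each cell is a text object of its own, so recurse into it.
            # This also reaches tables nested inside cells.
            for name in element.getCellNames():
                yield from iter_links_in_text(element.getCellByName(name))
        elif element.supportsService("com.sun.star.text.Paragraph"):
            portions = element.createEnumeration()
            while portions.hasMoreElements():
                portion = portions.nextElement()
                url = portion.getPropertyValue("HyperLinkURL")
                if url:
                    yield url, portion.getString()
```

Links are reported in document order, with the cells of a table visited in the order getCellNames returns them.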
Forbz
Posts: 2
Joined: Mon Nov 14, 2022 9:53 am

Re: [Find all hyperlinks][C#]

Post by Forbz »

JeJe wrote: Mon Nov 14, 2022 11:54 am Enumerate the paragraphs and text portions within the paragraphs. With a hyperlink at the start of the document MRI in basic gives this:

Code: Select all

mri thiscomponent.text.createenumeration.nextelement.createenumeration.nextelement


'HyperLinkEvents                  .container.XNameReplace    -INTERFACE-                   Maybevoid             48  
'HyperLinkName                    string                     ""                            Maybevoid             48  
'HyperLinkTarget                  string                     ""                            Maybevoid             48  
'HyperLinkURL 

Thanks for the idea of using two enumerations, it helped!
Lupp
Volunteer
Posts: 3553
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: [Solved] Find all hyperlinks [C#]

Post by Lupp »

(I didn't test everything with AOO. There may be differences as compared to LibO.)
How can this topic be "solved"?
There isn't any C (C#) code as far as I can see. (Mentioned by the subject of the topic.)
There isn't any mention of TextFrame(s) which can be linked themselves, and, in addition, can contain linked TextPortion(s) and everything else.
There isn't any mention of graphical objects which also again can be linked and can contain linked text portions.
There isn't ...
And, after all, annotations are implemented as a special type of TextField. Text in Writer doesn't support the URL text fields used in Calc cells, but an annotation field can in turn contain such a URL field inside its text.
Since links can be dangerous in a sense (create vulnerabilities), we shouldn't report "no links" prematurely.
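
To make the point about linked frames and graphical objects concrete, here is a short Python sketch. It assumes a Writer document as seen from PyUNO, where getTextFrames and getGraphicObjects return named containers and each object carries a HyperLinkURL property. The function name iter_object_links is made up for the example. Text inside frames, drawing shapes and annotations are not covered, so an empty result here does not mean "no links".

```python
def iter_object_links(doc):
    """Yield (kind, name, url) for linked text frames and graphic objects."""
    collections = (
        ("frame", doc.getTextFrames()),
        ("graphic", doc.getGraphicObjects()),
    )
    for kind, objects in collections:
        for name in objects.getElementNames():
            obj = objects.getByName(name)
            # The link sits on the object itself, not on any text inside it.
            url = obj.getPropertyValue("HyperLinkURL")
            if url:
                yield kind, name, url
```

The text inside each frame would still need its own paragraph and portion walk to catch linked text portions.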
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] Find all hyperlinks [C#]

Post by JeJe »

There are obvious workarounds which involve marking each link in advance, for example with a character attribute not otherwise used, so that a search can find them easily. If the links share a unique form, such as starting with www, a search would find them easily too.

Edit: I'll mention for completeness that the navigator should be reachable via the AccessibleContext, and with it that list, or possibly just part of it. In my experience the AccessibleContext can be a can of worms, though.
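
The search-based workaround for links of a recognisable form could look roughly like the Python sketch below. It assumes a Writer document as seen from PyUNO, with createSearchDescriptor and findAll. The function name and the default pattern are only illustrative. Note that it finds visible text that looks like a URL, not hyperlink attributes, so a link whose anchor text differs from its target is missed.

```python
def find_url_like_text(doc, pattern=r"(https?://|www\.)[^ ]+"):
    """Return the strings in the document matching a URL-like regular expression."""
    descriptor = doc.createSearchDescriptor()
    descriptor.setSearchString(pattern)
    descriptor.setPropertyValue("SearchRegularExpression", True)
    found = doc.findAll(descriptor)
    if found is None:
        # Defensive: treat a missing result container as "nothing found".
        return []
    return [found.getByIndex(i).getString() for i in range(found.getCount())]
```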
Lupp
Volunteer
Posts: 3553
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: [Solved] Find all hyperlinks [C#]

Post by Lupp »

Assuming somebody wants to check whether a Writer document is "link-safe", in the sense that it doesn't contain any tricky, hidden or unexpected hyperlinks, access to the navigator tree wouldn't help. It again shows only the links assigned to a text portion, and the link targets aren't shown at all unless they are the same as the anchor text. Links assigned to frames (not to contained text) or to graphical objects are ignored, and as for the strange links inserted into annotations, I suspect the two of us are the only people who know about them. (I got the hunch while looking for a workaround for tdf bug #138347 about hyperlinks made for calling custom code. Meanwhile I checked with AOO. It doesn't have the bug, which is a regression, but the workaround is applicable.)