How to find images in a Writer document

_savage
Posts: 198
Joined: Sun Apr 21, 2013 12:55 am

How to find images in a Writer document

Post by _savage »

This question is related to this one: http://forum.openoffice.org/en/forum/vi ... 45&t=64757.

While walking the paragraphs of a Writer document, how do I find the images and access their properties? (I'm doing this in Python, but any example would help :-))
Mac 13.7 using LO 24.8.2.1, Ubuntu Linux using LO 24.8 headless.
Mr.Dandy
Posts: 461
Joined: Tue Dec 11, 2012 4:22 pm

Re: How to find images in a Writer document

Post by Mr.Dandy »

Hello,

Images are located on the DrawPage object. Use XRay or MRI to explore your document and find this property.
OpenOffice 4.1.12 - Windows 10
_savage
Posts: 198
Joined: Sun Apr 21, 2013 12:55 am

Re: How to find images in a Writer document

Post by _savage »

Mr.Dandy wrote: Images are located on the DrawPage object. Use XRay or MRI to explore your document and find this property.
Hm... I looked into XRay and MRI (with a nice thread on using introspection here) but I can't quite figure out how this works from Python when I parse a document.

I thought that it should work something like this: iterate over paragraphs, and for each paragraph check if the page it's on contains an image. If so, I assume I can figure out where the image is placed in relation to the paragraph by checking the positions of image and paragraph on the page? But that's just speculation.

If you have more hints, please let me know. Meanwhile, I'll keep digging around...
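
One way to tie images to paragraphs without comparing positions on the page is to ask each paragraph for the content anchored to it. The sketch below is only an illustration, not a solution confirmed in this thread: `iter_enumeration` and `images_by_paragraph` are hypothetical helper names, and `document` is assumed to be a loaded Writer document. It assumes the images are anchored to a paragraph or to a character; images anchored "as character" may instead show up on the paragraph's text portions, and page-anchored images are not tied to any paragraph.

```python
GRAPHIC_SERVICE = "com.sun.star.text.TextGraphicObject"
PARAGRAPH_SERVICE = "com.sun.star.text.Paragraph"
CONTENT_SERVICE = "com.sun.star.text.TextContent"


def iter_enumeration(enum):
    """Yield every element of a UNO XEnumeration (hypothetical helper)."""
    while enum.hasMoreElements():
        yield enum.nextElement()


def images_by_paragraph(document):
    """Return [(paragraph_text, [image_names]), ...] in document order."""
    result = []
    for para in iter_enumeration(document.getText().createEnumeration()):
        # The body text also enumerates tables; skip anything that is not a paragraph.
        if not para.supportsService(PARAGRAPH_SERVICE):
            continue
        names = []
        # Ask the paragraph for the text content anchored to it.
        anchored = para.createContentEnumeration(CONTENT_SERVICE)
        for content in iter_enumeration(anchored):
            if content.supportsService(GRAPHIC_SERVICE):
                names.append(content.getName())
        result.append((para.getString(), names))
    return result
```

Calling `images_by_paragraph(document)` on the document loaded via `loadComponentFromURL` would then give one entry per paragraph, with an empty list for paragraphs that have no image anchored to them.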
_savage
Posts: 198
Joined: Sun Apr 21, 2013 12:55 am

Re: How to find images in a Writer document

Post by _savage »

I think I might not have made the question clear enough: this is supposed to be a scripted, automated solution that traverses a Writer document and extracts certain kinds of content. It seems to me (please correct me if I'm wrong) that an MRI-based solution assumes manual inspection of the object tree (graph?), and I haven't managed to get that working yet either.

Anyway, after some more digging I think I'm getting a bit closer. Here is some Python code which allows me to find the images in a given document:

Code:

...
context = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)
# A file URL needs three slashes before an absolute path.
document = desktop.loadComponentFromURL("file:///path/to/file.doc", "_blank", 0, ())

def get_uno_attr(obj, attr):
    """Read an attribute from a UNO object; return None if it does not exist."""
    try:
        return getattr(obj, attr)
    except AttributeError:
        return None

# A Writer document has a single draw page that holds all of its shapes.
draw_page = document.getDrawPage()
shapes = draw_page.createEnumeration()
while shapes.hasMoreElements():
    elem = shapes.nextElement()
    # Only graphic shapes expose a Graphic property.
    graphic = get_uno_attr(elem, "Graphic")
    if graphic is not None:
        print("Found an image: " + str(elem.getSize()) + " " + graphic.MimeType + " " + elem.GraphicURL)
        # The following is based on F3K Total's answer below:
        # print every property that the graphic exposes.
        psetinfo = graphic.getPropertySetInfo()
        pset = psetinfo.getProperties()
        for p in pset:
            print(p.Name, p.Type, p.Handle, p.Attributes)

This gives me some information on the images in the document. However, two things are missing:

* It looks like I need to get hold of a GraphicProvider, which would allow me to write the graphic object out to a file. I'm not sure how to do that yet.

* This is just the image (i.e. a TextGraphicObject?), but I also need to know at least one of the surrounding paragraphs. I'm not sure how to do that yet.
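
For the second point, a possible route from an image back to its paragraph is the image's anchor. The following is a minimal sketch under assumptions: `anchor_paragraph_text` is a hypothetical helper name, `graphic_object` is one of the graphic shapes found in the loop above, and the cursor created from the anchor is assumed to support the paragraph cursor methods, as Writer text cursors normally do. It returns None when the object has no anchor range, which is assumed to be the case for page-anchored images.

```python
def anchor_paragraph_text(graphic_object):
    """Return the text of the paragraph holding the object's anchor, or None."""
    anchor = graphic_object.getAnchor()  # a text range, if the object has one
    if anchor is None:
        return None
    # Create a cursor at the anchor inside the text that contains it.
    cursor = anchor.getText().createTextCursorByRange(anchor)
    cursor.gotoStartOfParagraph(False)  # move without selecting
    cursor.gotoEndOfParagraph(True)     # extend the selection to the paragraph end
    return cursor.getString()
```

The same cursor also offers `gotoPreviousParagraph` and `gotoNextParagraph`, which could be used in the same way to read the neighboring paragraphs.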
Last edited by _savage on Fri Nov 08, 2013 4:47 pm, edited 2 times in total.
F3K Total
Volunteer
Posts: 1044
Joined: Fri Dec 16, 2011 8:20 pm

Re: How to find images in a Writer document

Post by F3K Total »

Hi,
maybe this Basic code helps a bit:

Code:

Sub S_find_Images
    oDrawPage = ThisComponent.DrawPage
    For i = 0 To oDrawPage.Count - 1
        oShape = oDrawPage.getByIndex(i)
        If oShape.supportsService("com.sun.star.text.TextGraphicObject") Then
            MsgBox "I'm a graphic shape, my name is: " + oShape.Name
            aProps = oShape.PropertySetInfo.Properties
            sString = ""  ' start a fresh property list for each image
            For k = 0 To UBound(aProps)
                sString = sString + "Name: " + aProps(k).Name + " ----Type: " + aProps(k).Type.Name + Chr(10)
            Next k
            MsgBox sString
        End If
    Next i
End Sub
regina
Posts: 67
Joined: Sat Apr 05, 2008 4:55 pm

Re: How to find images in a Writer document

Post by regina »

Code:

sub TraverseGraphicObjects
    dim oDocument as variant: oDocument = ThisComponent
    dim oGraphicCollection as variant: oGraphicCollection = oDocument.getGraphicObjects
    dim i as integer
    dim aGraphic as variant
    for i = 0 to oGraphicCollection.count-1
        aGraphic = oGraphicCollection(i)
        Rem Do something with the graphic
        msgbox(aGraphic.Name)
    next i
end sub
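
Since the question was asked for Python, the same traversal could look roughly like the sketch below. It relies on the collection returned by `getGraphicObjects` offering name access; `iter_graphic_objects` is a hypothetical helper name and `document` is assumed to be a loaded Writer document.

```python
def iter_graphic_objects(document):
    """Yield (name, graphic_object) for every graphic object in the document."""
    graphics = document.getGraphicObjects()  # collection with name access
    for name in graphics.getElementNames():
        yield name, graphics.getByName(name)
```

For example, `[name for name, _ in iter_graphic_objects(document)]` would list the image names that the Basic macro shows one message box at a time.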
_savage
Posts: 198
Joined: Sun Apr 21, 2013 12:55 am

Re: How to find images in a Writer document

Post by _savage »

Thanks F3K Total and regina! The code helps to dig up some more information, but I still can't tell where the image is (which page, which paragraphs are next to it) or how to save the image to a file.

F3K Total: I took your code and extended the example above, so that the property set is printed for each graphic found. That worked well, thank you!
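
For the page question, one option that might work is to move the view cursor to the image's anchor and ask it for the page. This is a sketch under assumptions, not something verified in this thread: `page_of_anchor` is a hypothetical helper name, the document is assumed to have a controller with a view cursor (which may not hold for every way of loading a document), and the anchor is assumed to lie in the main body text. Note that it moves the visible cursor.

```python
def page_of_anchor(document, text_content):
    """Return the page number on which the content's anchor sits, or None."""
    anchor = text_content.getAnchor()
    if anchor is None:
        return None
    view_cursor = document.getCurrentController().getViewCursor()
    view_cursor.gotoRange(anchor, False)  # jump to the anchor without selecting
    return view_cursor.getPage()
```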
_savage
Posts: 198
Joined: Sun Apr 21, 2013 12:55 am

Re: How to find images in a Writer document

Post by _savage »

It's been a while...

It looks like, when iterating over the draw page of a document, the loaded images are handled as XShape elements whose Graphic property has a Type as well as a MimeType. For pixel (bitmap) graphics, I can then get the DIB (Device Independent Bitmap, see here) as a ByteSequence.
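
Whether such a byte sequence is a complete BMP file can be checked from its first bytes: a BMP file starts with a 14-byte file header (the signature "BM", then the file size and the offset of the pixel data, all little-endian), followed by the DIB header. The sketch below parses just those fields with the standard `struct` module. `describe_bmp` is a hypothetical helper name, and the width and height fields are read assuming a DIB header of at least 40 bytes (the common BITMAPINFOHEADER layout).

```python
import struct


def describe_bmp(data):
    """Parse the BMP file header and the start of the DIB header.

    Returns a dict of header fields, or None if data is too short or does
    not start with the 'BM' signature.
    """
    if len(data) < 30 or data[:2] != b"BM":
        return None
    # File header after the signature: size, two reserved words, pixel offset.
    file_size, _res1, _res2, pixel_offset = struct.unpack_from("<IHHI", data, 2)
    # Start of the DIB header (layout assumed to be BITMAPINFOHEADER or later).
    header_size, width, height, _planes, bpp = struct.unpack_from("<IiiHH", data, 14)
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "dib_header_size": header_size,
        "width": width,
        "height": height,
        "bits_per_pixel": bpp,
    }
```

Because the size field directly follows the signature, the third byte of the file is simply the low byte of the file size, so it can happen to look like a printable character.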

Is this how bitmap images are handled internally by Office? Am I correct in assuming that this is in fact a BMP file? (It does open with "BM6" as the first bytes...) The following code seems to work as an extension of the loop in my earlier post above:

Code:

if graphic.Type == 1: # GraphicType.PIXEL
    with open("image.bmp", "wb") as imgf:
        imgf.write(graphic.DIB.value)
Is there a way to save this image in its original format, PNG for example?
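
One approach that avoids re-encoding, sketched here under assumptions: an ODF file such as .odt is a zip package, and LibreOffice keeps embedded images in a `Pictures/` folder inside it. If the document is first saved or converted to .odt (that step is not shown), the images can be copied out with the standard `zipfile` module. `extract_pictures` is a hypothetical helper name, and the paths in the usage line are placeholders. The GraphicProvider service, with its `storeGraphic` method and a MimeType media property, is another possible route that is not shown here.

```python
import zipfile
from pathlib import Path


def extract_pictures(odt_path, out_dir):
    """Copy every file under Pictures/ in an ODF package into out_dir.

    Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(odt_path) as package:
        for info in package.infolist():
            if info.is_dir() or not info.filename.startswith("Pictures/"):
                continue
            # Use only the base name so entries cannot escape out_dir.
            target = out_dir / Path(info.filename).name
            target.write_bytes(package.read(info))
            written.append(target)
    return written
```

A call such as `extract_pictures("file.odt", "images")` would then write the embedded files with the names they have inside the package.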