Can't load HTML content in doc properly

Java, C++, C#, Delphi... - Using the UNO bridges
pro
Posts: 2
Joined: Mon Jan 03, 2011 12:07 pm

Can't load HTML content in doc properly

Post by pro »

I am working with the OOo SDK and trying to load the whole content of an HTML page into a document. When I tried to load dynamic HTML content (e.g. http://www.google.com) into a document, it was not loaded; an ioerrorcode exception occurred. According to posts on the OOo forum, com.sun.star.text.WebDocument does not provide any web/writer document API name, so it throws this exception.

So how can I get this type of content into a document? Please send me code or your guidance to solve it.

So I saved the dynamic page on my machine and then tried to load that into a document. It loaded, but it does not look the same as the HTML page does in a browser. Mainly, the styles are not applied, the layout is out of order, the images are missing, etc.

How can I get the content into a document so that it looks the same as the HTML page in a browser?
Edit: Q tidied up by mod
pro,openoffice.org3,linux
pro
Posts: 2
Joined: Mon Jan 03, 2011 12:07 pm

Ok sir but my work is that then how can i solve it

Post by pro »

Hello Hol.Sten

Sorry.

Thanks. But it didn't load the images, not even from static HTML content, and when the page is loaded
into a document its size is not right. Is that due to the lack of CSS support? My job is to load
HTML content into a document, so what should I do? I need your detailed advice.

Are there any other tools or open source projects with which I can do this using Java?

pro

To reply to this thread use the Post Reply or Quick Reply buttons below - do not start a new thread. (TheGurkha, Moderator).
pro,openoffice.org3,linux
Hagar Delest
Moderator
Posts: 32685
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Ok sir but my work is that then how can i solve it

Post by Hagar Delest »

Please read the Survival Guide for the forum. Give your topic a real title and explain what your problem is. You seem to refer to a previous discussion.
LibreOffice 24.2 on Xubuntu 24.04 and 7.6.4.1 portable on Windows 10
hol.sten
Volunteer
Posts: 495
Joined: Mon Oct 08, 2007 1:31 am
Location: Hamburg, Germany

Re: Can't load HTML content in doc properly

Post by hol.sten »

pro wrote:I am working with the OOo SDK and trying to load the whole content of an HTML page into a document. When I tried to load dynamic HTML content (e.g. http://www.google.com) into a document, it was not loaded; an ioerrorcode exception occurred. According to posts on the OOo forum, com.sun.star.text.WebDocument does not provide any web/writer document API name, so it throws this exception. So how can I get this type of content into a document? Please send me code or your guidance to solve it.
I read through some of the older posts in the OOo forums concerning loading HTML and finally gave the following Java code a try (using Ubuntu 10.04, OOo 3.2.0, NetBeans 6.8, Java 1.6.0.16, bootstrapconnector.jar):


import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import ooo.connector.BootstrapSocketConnector;

public class LoadHtmlFromUrl {

    public static void main(String[] args) throws BootstrapException, Exception {

        String oooExeFolder = "/usr/lib/openoffice/program/";
        XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
        XMultiComponentFactory xMultiComponentFactory = xContext.getServiceManager();
        XComponentLoader xcomponentloader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class,xMultiComponentFactory.createInstanceWithContext("com.sun.star.frame.Desktop", xContext));

        String loadURL;
        loadURL = "http://www.google.com";
        //loadURL = "file:///tmp/html/google/google.html"; // Created with "Save As ..." in Google Chrome
        //loadURL = "http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/overview.html";
        //loadURL = "http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/beans.html";
        //loadURL = "http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/aop.html";

        Object objectDocumentToStore = xcomponentloader.loadComponentFromURL(loadURL, "_blank", 0, null);
    }
}
The code produced no ioerrorcode exception. I made several runs, one for each of the loadURL examples above. Each time, after some delay, OOo came up with a window showing at least some content.

Because the Google page makes heavy use of JavaScript and CSS, almost nothing gets loaded in OOo. The result is far from what you see in a web browser.

As another example, I tried several pages from the Spring documentation. The text of the pages is displayed fine in OOo. But loading them directly from the web causes another problem if you try to automate it: OOo does not load the pictures together with the HTML code. The pictures only get loaded after you scroll down to them in OOo.
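
For reference, the "save the page first, then load it through a file:/// URL" variant (the commented-out loadURL above) could be sketched roughly as follows. This is only a minimal sketch using the plain Java standard library; the class HtmlSnapshot and its methods are hypothetical names, not part of the OOo SDK. It fetches just the raw HTML, without pictures or style sheets, so it does not fix the rendering problems described here. It only separates the download from the loading step.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not part of the OOo SDK.
class HtmlSnapshot {

    // Copies all bytes from the stream into a new temporary .html file and
    // returns that file's URL as a string (a file: URL).
    static String saveToTempFile(InputStream in) throws IOException {
        Path target = Files.createTempFile("snapshot", ".html");
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target.toUri().toString();
    }

    // Downloads the raw HTML behind pageUrl. No JavaScript is run and no
    // pictures or style sheets are fetched.
    static String download(String pageUrl) throws IOException {
        try (InputStream in = URI.create(pageUrl).toURL().openStream()) {
            return saveToTempFile(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java HtmlSnapshot <page-url>");
            return;
        }
        // The printed file: URL can be used as loadURL in the code above.
        System.out.println(download(args[0]));
    }
}
```

The string returned by download(...) would then be passed to loadComponentFromURL in place of the http URL.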

In my opinion OOo is totally useless for getting web pages dynamically from the web.
pro wrote:So I saved the dynamic page on my machine and then tried to load that into a document. It loaded, but it does not look the same as the HTML page does in a browser. Mainly, the styles are not applied, the layout is out of order, the images are missing, etc.
All the differences you mention are caused by OOo's inability to process CSS and JavaScript in an HTML document.
pro wrote:How can I get the content into a document so that it looks the same as the HTML page in a browser?
Not with OOo, at least. Because of its CSS limitations, OOo will never be useful for this.
pro wrote:Are there any other tools or open source projects with which I can do this using Java?
No idea.
OOo 3.2.0 on Ubuntu 10.04 • OOo 3.2.1 on Windows 7 64-bit and MS Windows XP
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: Can't load HTML content in doc properly

Post by TerryE »

(pro originally sent this as a Q to me as well as posting it.)

OOo Writer/Web does not evaluate any DHTML generated by embedded JavaScript, and certainly does not support the more complex AJAX mechanisms. It also has only limited support for interpreting any referenced style sheets, so Writer/Web is the wrong tool for what you seem to be attempting.
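
A rough way to tell in advance whether a page leans on the features Writer/Web handles poorly is to count its script and style sheet references. The following is only a sketch with the plain Java standard library; the class HtmlDependencyCheck is a hypothetical name, and matching HTML with regular expressions is a crude approximation rather than a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic for illustration only; regex matching is a rough
// approximation and can miscount on unusual markup.
class HtmlDependencyCheck {

    private static final Pattern SCRIPT =
            Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("<style\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLESHEET_LINK =
            Pattern.compile("<link\\b[^>]*\\brel\\s*=\\s*[\"']?stylesheet",
                    Pattern.CASE_INSENSITIVE);

    // Counts how often the pattern occurs in the HTML source.
    private static int count(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Number of opening <script> tags (inline or external).
    static int scripts(String html) {
        return count(SCRIPT, html);
    }

    // Number of embedded <style> blocks.
    static int styleBlocks(String html) {
        return count(STYLE_BLOCK, html);
    }

    // Number of <link rel="stylesheet"> references.
    static int stylesheetLinks(String html) {
        return count(STYLESHEET_LINK, html);
    }

    public static void main(String[] args) {
        String html = "<html><head><link rel=\"stylesheet\" href=\"a.css\">"
                + "<script src=\"a.js\"></script></head><body><p>hi</p></body></html>";
        System.out.println("scripts=" + scripts(html)
                + " styleBlocks=" + styleBlocks(html)
                + " stylesheetLinks=" + stylesheetLinks(html));
    }
}
```

High counts suggest the page will look quite different in Writer/Web than in a browser; a page with none is more likely to come through with its text and basic structure intact.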

If you can reformulate your problem / approach then it might help. For example I use Writer/Web to draft / edit my blog, but work within a fixed style and massage the Writer HTML in PHP before storing it in my blog D/B.
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.