
[Solved] Unable to export a document to html

Posted: Tue Apr 28, 2009 10:00 pm
by Amunike
Hello Guys,

I'm trying to programmatically export a Word document to HTML format using OpenOffice 3, but I haven't had any luck so far. My computer's operating system is Windows XP.
I have tried the following filters, with disastrous results: 'HTML', 'HTML (StarWriter)' and 'impress_html_Export'.

To give you an idea of the type of result I have been getting, here is a sample of the output file:

(binary data starting with the ZIP signature "PK"; the only readable parts are the archive entry names below)
mimetype (application/vnd.oasis.opendocument.text)
Configurations2/statusbar/
Configurations2/floater/
Configurations2/popupmenu/
Configurations2/progressbar/
Configurations2/menubar/
Configurations2/toolbar/
Configurations2/images/Bitmaps/
Pictures/200000B7000089FA0000AB416FBC0CF2.wmf (followed by binary image data)


My actual code is as follows:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using unoidl.com.sun.star.lang;
using unoidl.com.sun.star.uno;
using unoidl.com.sun.star.bridge;
using unoidl.com.sun.star.frame;
using Microsoft.Win32;
using System.Runtime.InteropServices;
using System.IO;
using System.Diagnostics;

namespace WebApplicationOpenOff
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            InitOpenOfficeEnvironment();
        }

        protected void Button1_Click(object sender, EventArgs e)
        {

            if (StartOpenOffice())
            {
                //Get a ComponentContext
                unoidl.com.sun.star.uno.XComponentContext xLocalContext =
                   uno.util.Bootstrap.bootstrap();
                //Get MultiServiceFactory
                unoidl.com.sun.star.lang.XMultiServiceFactory xRemoteFactory =
                   (unoidl.com.sun.star.lang.XMultiServiceFactory)
                   xLocalContext.getServiceManager();
                //Get a ComponentLoader
                XComponentLoader aLoader =
                   (XComponentLoader)xRemoteFactory.createInstance("com.sun.star.frame.Desktop");
                //Load the sourcefile
                XComponent xComponent = initDocument(aLoader,
                   PathConverter("C:\\Documents and Settings\\Soumah\\Desktop\\Test-Word-Doc.doc"), "_blank");
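                //loadComponentFromURL returns once loading has finished; null means it failed
                //(waiting in a loop on a local variable that never changes would hang forever)
                if (xComponent == null)
                {
                    throw new System.Exception("Could not load the source document.");
                }
                saveDocument(xComponent, PathConverter("C:\\Documents and Settings\\Soumah\\Desktop\\Test-Word-Doc.html"));
                //Report completion
                Console.WriteLine("Conversion completed!");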
            }

        }


        private void InitOpenOfficeEnvironment()
        {
            string baseKey;
            // OpenOffice being a 32 bit app, its registry location is different in a 64 bit OS  
            if (Marshal.SizeOf(typeof(IntPtr)) == 8)
                baseKey = @"SOFTWARE\Wow6432Node\OpenOffice.org\";
            else
                baseKey = @"SOFTWARE\OpenOffice.org\";

            // Get the URE directory  
            string key = baseKey + @"Layers\URE\1";
            RegistryKey reg = Registry.CurrentUser.OpenSubKey(key);
            if (reg == null) reg = Registry.LocalMachine.OpenSubKey(key);
            string urePath = reg.GetValue("UREINSTALLLOCATION") as string;
            reg.Close();
            urePath = Path.Combine(urePath, "bin");

            // Get the UNO Path  
            key = baseKey + @"UNO\InstallPath";
            reg = Registry.CurrentUser.OpenSubKey(key);
            if (reg == null) reg = Registry.LocalMachine.OpenSubKey(key);
            string unoPath = reg.GetValue(null) as string;
            reg.Close();

            string path;
            path = string.Format("{0};{1}", System.Environment.GetEnvironmentVariable("PATH"), urePath);
            System.Environment.SetEnvironmentVariable("PATH", path);
            System.Environment.SetEnvironmentVariable("UNO_PATH", unoPath);
        }

        /// <summary>
        /// Load a given file or create a new blank file
        /// </summary>
        /// <param name="aLoader">A ComponentLoader</param>
        /// <param name="file">The file</param>
        /// <param name="target">The target</param>
        /// <returns>The component</returns>
        static XComponent initDocument(
           XComponentLoader aLoader, string file, string target
           )
        {
            XComponent xComponent = aLoader.loadComponentFromURL(
               file, target, 0,
               new unoidl.com.sun.star.beans.PropertyValue[0]);

            return xComponent;
        }

        /// <summary>
        /// Saves the document.
        /// </summary>
        /// <param name="xComponent">The x component.</param>
        /// <param name="fileName">Name of the file.</param>
        static void saveDocument(XComponent xComponent, string fileName)
        {
            unoidl.com.sun.star.beans.PropertyValue[] propertyValue =
               new unoidl.com.sun.star.beans.PropertyValue[1];

            propertyValue[0] = new unoidl.com.sun.star.beans.PropertyValue();
            propertyValue[0].Name = "Filter";
            propertyValue[0].Value = new uno.Any("writer_html_Export");

            ((XStorable)xComponent).storeToURL(fileName, propertyValue);
        }

        /// <summary>
        /// Convert into OO file format
        /// </summary>
        /// <param name="file">The file.</param>
        /// <returns>The converted file</returns>
        private static string PathConverter(string file)
        {
            //OOo expects a URL such as file:///C:/path/to/file.doc
            //(the former try/catch only rethrew with "throw ex", which loses the stack trace)
            file = file.Replace(@"\", "/");

            return "file:///" + file;
        }

        

        /// <summary>
        /// Starts the open office.
        /// </summary>
        /// <returns></returns>
        private static bool StartOpenOffice()
        {
            //GetProcessesByName expects the process name without the ".exe" extension
            //and returns an empty array (never null) when no process matches
            Process[] ps = Process.GetProcessesByName("soffice");
            if (ps.Length == 0)
            {
                Process.Start("soffice.exe");
                //give it some time to start
                System.Threading.Thread.Sleep(3000);
            }
            return true;
        }




    }
}
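A side note on PathConverter: it builds the URL by string concatenation, so the spaces in "Documents and Settings" stay unescaped, whereas in a URL they are supposed to be percent-encoded as %20. That is not what broke the export in this thread (it was eventually traced to the filter property name), but here is a minimal sketch of an alternative, assuming .NET's System.Uri is acceptable in the project. OooPaths and ToFileUrl are hypothetical names, not part of the original code.

```csharp
using System;

// Hypothetical helper (not part of the original code): turns an absolute
// local Windows path into a file URL and lets System.Uri do the escaping.
public static class OooPaths
{
    public static string ToFileUrl(string localPath)
    {
        // A path such as C:\dir\my file.doc is recognised as a file URI;
        // AbsoluteUri then yields file:///C:/dir/my%20file.doc
        // (backslashes become slashes, spaces become %20).
        return new Uri(localPath).AbsoluteUri;
    }
}
```

With this, `OooPaths.ToFileUrl(@"C:\Documents and Settings\Soumah\Desktop\Test-Word-Doc.doc")` should give `file:///C:/Documents%20and%20Settings/Soumah/Desktop/Test-Word-Doc.doc`, which could be passed to loadComponentFromURL and storeToURL in place of the concatenated string.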

Please help me spot the problem.

Thank you in advance.


-Amunike

Re: unable to export a document to html

Posted: Tue Apr 28, 2009 10:05 pm
by Villeroy

Re: unable to export a document to html

Posted: Tue Apr 28, 2009 10:36 pm
by Amunike
I forgot to add a detail:

The file "TypeDetection.xml" is nowhere to be found in my installation. I've read on the web that I'm supposed to have it but it wasn't included in my brand new installation of OpenOffice 3.

Re: Unable to export a document to html

Posted: Tue Apr 28, 2009 10:45 pm
by Villeroy
Another converter in Java: http://www.artofsolving.com/opensource/jodconverter

Moderation: moved from "General" to "External Programs"

Re: Unable to export a document to html

Posted: Tue Apr 28, 2009 10:51 pm
by Amunike
Thanks for the tips, Villeroy. However, I am not trying to buy a solution; I am trying to implement one.

Re: Unable to export a document to html

Posted: Tue Apr 28, 2009 11:00 pm
by Villeroy
Somebody has done this job already. No need to buy anything. Just download and use it.
http://sourceforge.net/project/showfile ... p_id=91849

http://sourceforge.net/project/shownote ... _id=675054
Licenses
========

JODConverter is distributed under the terms of the LGPL.

This basically means that you are free to use it in both open source
and commercial projects.

If you modify the library itself you are required to contribute
your changes back, so JODConverter can be improved.

(You are free to modify the sample webapp as a starting point for your
own webapp without restrictions.)

JODConverter includes various third-party libraries so you must
agree to their respective licenses - included in docs/third-party-licenses.

That may include software developed by

* the Apache Software Foundation (http://www.apache.org)
* the Spring Framework project (http://www.springframework.org)

Re: Unable to export a document to html

Posted: Wed Apr 29, 2009 3:09 am
by Amunike
Thanks for all your replies, Villeroy. I'm really looking to fix my code; a third-party solution is not in line with my project requirements. I would really appreciate any idea of how I can fix my code.

-Amunike

Re: Unable to export a document to html

Posted: Wed Apr 29, 2009 9:16 am
by Villeroy
Debug? Read other people's code? The current output looks like a typical Writer document, which is a zip archive containing XML mainly.
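That diagnosis can be checked in code: an OpenDocument file is a ZIP package, and ZIP data begins with the local-file-header signature "PK" followed by the bytes 0x03 0x04, while a real HTML export begins with text such as <!DOCTYPE or <html>. Below is a minimal C# sketch, assuming a quick sanity check on the exported file is all that is wanted. ExportCheck, StartsWithZipSignature and LooksLikeZip are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper: tells whether a file is a ZIP package (for example an
// ODF document written by mistake) rather than the expected HTML text.
public static class ExportCheck
{
    // Signature of a ZIP local file header: 'P', 'K', 0x03, 0x04.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static bool StartsWithZipSignature(byte[] head)
    {
        if (head == null || head.Length < ZipSignature.Length)
            return false;
        for (int i = 0; i < ZipSignature.Length; i++)
        {
            if (head[i] != ZipSignature[i])
                return false;
        }
        return true;
    }

    public static bool LooksLikeZip(string path)
    {
        byte[] head = new byte[ZipSignature.Length];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < head.Length)
            {
                int n = fs.Read(head, read, head.Length - read);
                if (n == 0) break; // end of file
                read += n;
            }
        }
        return read == head.Length && StartsWithZipSignature(head);
    }
}
```

Calling ExportCheck.LooksLikeZip on the ".html" output right after saveDocument would most likely have flagged the problem reported here, since that file started with "PK".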

Whatever language that is, are you sure that ...


new unoidl.com.sun.star.beans.PropertyValue[1];
... initializes an array of exactly one property value?
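The language is C#, and there `new T[1]` does allocate exactly one element: the number is the array's length, not its highest index, so the only valid index is 0. If the element type is a class, that element starts out as null, which is why the original code follows the allocation with an explicit `propertyValue[0] = new ...PropertyValue()`. A small sketch to illustrate; PropertyValueStandIn and PropertyArrays are hypothetical names, with the stand-in class replacing unoidl.com.sun.star.beans.PropertyValue so the example compiles without the UNO CLI assemblies.

```csharp
// Hypothetical stand-in for unoidl.com.sun.star.beans.PropertyValue,
// used only so this sketch compiles without the UNO CLI assemblies.
public class PropertyValueStandIn
{
    public string Name;
    public object Value;
}

public static class PropertyArrays
{
    // Builds an argument array holding exactly one name/value pair.
    public static PropertyValueStandIn[] Single(string name, object value)
    {
        // new T[1] creates an array of length 1; its only index is 0.
        var args = new PropertyValueStandIn[1];
        // For a class type the slot holds null until an instance is assigned.
        args[0] = new PropertyValueStandIn { Name = name, Value = value };
        return args;
    }
}
```

So the array size in the posted code is fine; whatever goes wrong has to be in what is put into that single element.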

Do your project requirements really rely on a 300 MB office suite to convert doc2html? A Google search for "doc2html" returns thousands of results unrelated to OOo.

Re: Unable to export a document to html

Posted: Wed Apr 29, 2009 1:17 pm
by mnasato
Amunike wrote:


            propertyValue[0].Name = "Filter";
That should be "FilterName".

Re: Unable to export a document to html

Posted: Wed Apr 29, 2009 4:47 pm
by Amunike
Hi guys,

Thanks for your help. I managed to solve my problem with this:

http://user.services.openoffice.org/en/ ... 20&t=12371

I was basically using the wrong filter name.

Regards,
Amunike.