[Solved] Math Formulas as code?

kkaminsk
Posts: 2
Joined: Mon Aug 14, 2017 2:30 pm

[Solved] Math Formulas as code?

Post by kkaminsk »

Hey y'all,

I have a lot of documents with solved Math problems in OpenOffice Writer with Math formulas in them.

We want to find the quickest way to export that content into WordPress, which uses the LaTeX math language.

OpenOffice Math formula code is very similar to LaTeX.

If I could display all the formulas inside the document NOT as formulas but as Math formula code, that would make our life much easier and would save tens of hours.

Alternatively, is there any other way to convert OO docs with formulas into WordPress text with LaTeX code?

Thanks in advance!
Last edited by Hagar Delest on Tue Aug 15, 2017 5:55 pm, edited 1 time in total.
Reason: tagged [Solved].
APACHE OPENOFFICE 4.1.2 Mac 10.10.5
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Math Formulas as code?

Post by Villeroy »

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Lupp
Volunteer
Posts: 3548
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Math Formulas as code?

Post by Lupp »

(I did not check any of the solutions offered where Villeroy linked to. As I had just written a response, I post it nonetheless. Feel free to ignore it. Of course you should prefer a well-proven solution if there is one.)

In a different forum (on LibO - same breed) there just was this thread:
https://ask.libreoffice.org/en/question ... ce-writer/
where you can find how to access the text representations of the formulae.

I would suggest you then create a dedicated text document into which you insert the formulae, one per paragraph.
For a somewhat similar request I once sketched the code below, from which you can take everything not contained in "librebel"'s post.

Code: Select all

REM This procedure was sketched because questions about moving the textual
REM content from pdf files opened in 'Draw' into an actual text file come up
REM every few days, and there was not offered a solution yet, as far as I know.
REM 
REM OF COURSE, this provisional code cannot replace a thorough solution
REM to the problem (if actually needed at all).
REM In specific there is not made an attempt to resolve groups or to process
REM the 'Draw' objects regarding their position. The sequencing of texts goes 
REM along the logical order of the objects.
REM For a PDF automatically imported by 'Draw' this should work.
sub c_ExportTextFromDrawToWriterDoc(optional pNum as Long)
	dim doc0 as Object, page as Object, shape as Object, shapeText as String
	dim doc1 as Object, tText as Object,vCur  as Object, tCur      as Object
	dim i as Long, j as Long, k as Long, m as Long, n as Long, low as Long, high as Long
	dim location as String, newLocation as String, alert as String
	dim unresolvedSignal as String
unresolvedSignal = "%&@~+!!\µ~*?§" 'Arbitrary string not occurring anywhere else in the universe!
doc0 = ThisComponent
if IsMissing(pNum) then pNum = 0
m = doc0.DrawPages().Count()
if (m<pNum) OR (pNum<0) then
	MsgBox "No page "+pNum+" available!"
	exit Sub
endif
location = doc0.GetLocation
newLocation = location+".odt"
if FileExists(newLocation) then 
	alert = "Warning! The destination file "+Chr(13)+ newLocation+Chr(13)+ _
	"already exists. Please delete or rename it before calling this procedure again!"
	MsgBox alert
	exit sub
endif

REM Specifically relevant for https://forum.openoffice.org/en/forum/posting.php?mode=reply&f=7&t=89962
doc1 = StarDesktop.LoadComponentFromUrl("private:factory/swriter", "_blank", 0, Array())
doc1.GetCurrentController().GetFrame().GetContainerWindow().SetVisible(False)
doc1.StoreAsUrl(newLocation,Array())
tText = doc1.getText()
vCur = doc1.CurrentController.getViewCursor()
tCur = tText.createTextCursorByRange(vCur.GetEnd())
low = 0 : if pNum > 0 then low = pNum-1
if pNum=0 then 
	high = m - 1
else 
	high = pNum - 1
endif

for i = low to high
	tText.insertString(tCur, "------PAGE "+(i+1)+ "------", False)
	tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
	tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
	k = 0
	page = doc0.DrawPages(i)
	n = page.Count()
	for j = 0 to n - 1
		shape = page.GetByIndex(j)
		shapeText = unresolvedSignal
		on error resume next
		shapeText = shape.Text.String
		on error goto 0
		if shapeText = unresolvedSignal then
			k = k + 1
		else

REM Specifically relevant for https://forum.openoffice.org/en/forum/posting.php?mode=reply&f=7&t=89962
			tText.insertString(tCur, shapeText, False)
			tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
		endif
	next j
	tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
	tText.insertString(tCur, "There were "+k+ " unresolved objects on page "+(i+1)+".", False)
	tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
	tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
next i
doc1.Store
doc1.Close(True)
end sub
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Math Formulas as code?

Post by John_Ha »

According to the LO manual, LO allows you to export formulae as a MathML file by choosing File > Save As ... and selecting MathML.

If you unzip the .odt file and extract content.xml from the Object_n folders, you get the following for the formula for solving quadratic equations. Does that go easily into LaTeX?

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
	<semantics>
		<mrow>
			<mrow>
				<mi>x</mi>
				<mo stretchy="false">=</mo>
				<mfrac>
					<mrow>
						<mrow>
							<mrow>
								<mo stretchy="false">−</mo>
								<mi>b</mi>
							</mrow>
							<mo stretchy="false">±</mo>
							<msqrt>
								<mrow>
									<mrow>
										<msup>
											<mi>b</mi>
											<mn>2</mn>
										</msup>
										<mo stretchy="false">−</mo>
										<mn>4ac</mn>
									</mrow>
								</mrow>
							</msqrt>
						</mrow>
					</mrow>
					<mrow>
						<mn>2a</mn>
					</mrow>
				</mfrac>
			</mrow>
		</mrow>
		<annotation encoding="StarMath 5.0">x = { -b +- sqrt { b^2 - 4ac } } over { 2a }</annotation>
	</semantics>
</math>
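The unzip-and-extract step can also be scripted outside the office suite. Here is a minimal Python sketch (standard library only) that opens the .odt as a zip archive, looks at every content.xml inside a sub-folder, and collects the text of any MathML annotation whose encoding starts with "StarMath". The function name `extract_starmath` is made up for illustration, and the folder layout is assumed to be as described above. Embedded objects that are not formulas are simply skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the MathML elements, as seen in the content.xml above.
MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"


def extract_starmath(odt_file):
    """Map each embedded object folder to its StarMath source text.

    Hypothetical helper: odt_file may be a path or a binary file-like object.
    """
    formulas = {}
    with zipfile.ZipFile(odt_file) as archive:
        for name in archive.namelist():
            folder, _, leaf = name.rpartition("/")
            # Skip the document's own top-level content.xml and all other files.
            if not folder or leaf != "content.xml":
                continue
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # not well-formed XML, ignore it
            for annotation in root.iter(MATHML_NS + "annotation"):
                if annotation.get("encoding", "").startswith("StarMath"):
                    formulas[folder] = annotation.text or ""
    return formulas
```

Calling `extract_starmath` with the path of a Writer document should return a dictionary such as "Object 1" mapped to the formula code, which is still StarMath and not LaTeX.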
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Lupp
Volunteer
Posts: 3548
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Math Formulas as code?

Post by Lupp »

If you still want to extract Math formulae in text representation for editing: I merged and simplified the mentioned code:

Code: Select all

Sub writeFormulaStringsToFile()
doc0 = ThisComponent
loc0 = doc0.Location
loc1 = loc0 & "_MathFo.odt"
If FileExists(loc1) Then
   Alert = "Warning! The destination file "+Chr(13)+ loc1+Chr(13)+ _
   "already exists. Please delete or rename it before calling this procedure again!"
   MsgBox(Alert)
   Exit Sub
End If
    Dim args1(0) as new com.sun.star.beans.PropertyValue		
args1(0).Name  = "Hidden"
args1(0).Value = False     'For a batch processing set to True.

doc1   = StarDesktop.LoadComponentFromUrl("private:factory/swriter", "_blank", 0, args1)
doc1.StoreAsUrl(loc1, Array())
text1  = doc1.GetText
vCur1  = doc1.CurrentController.getViewCursor()
tCur1  = text1.CreateTextCursorByRange(vCur1.GetEnd())
embObs = doc0.EmbeddedObjects()
For j = 0 To embObs.Count - 1
    embOb    = embObs(j)
    theRelOb = embOb.GetEmbeddedObject
    If NOT theRelOb.SupportsService("com.sun.star.formula.FormulaProperties") Then Goto nextObject  REM Edit 2017-08-16: Better test for service than for ImplementationName.
    theFormula = theRelOb.Formula
    text1.insertString(tCur1, embOb.Name, False)
    text1.insertControlCharacter(tCur1, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
    text1.insertString(tCur1, theFormula, False)
    text1.insertControlCharacter(tCur1, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
nextObject:
Next j
doc1.Store()
doc1.Close(True)
End Sub
It assumes that it is run from the document containing the formulae.
(I had to edit the code a bit. See REM lines!)

(Little Rectification of the code 2017-08-15)
(Another small improvement of the code 2017-08-16)
Last edited by Lupp on Wed Aug 16, 2017 3:26 pm, edited 5 times in total.
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: Math Formulas as code?

Post by jrkrideau »

Perhaps try Writer2LaTeX: http://writer2latex.sourceforge.net/ It looks interesting.

It did export a simple bit of text cleanly with LO (\documentclass[a4paper]{article} seems to be the default).

I did not test it with a Math object as it's been years since I wrote any Math in AOO or LO and I forget how to do it.
LibreOffice 7.3.7.2; Ubuntu 22.04
kkaminsk
Posts: 2
Joined: Mon Aug 14, 2017 2:30 pm

Re: Math Formulas as code?

Post by kkaminsk »

Lupp wrote: If you still want to extract Math formulae in text representation for editing: I merged and simplified the mentioned code:
It worked! Thanks a lot @Lupp
I got a file with "Object X" and Math formula text in separate lines, where X is a count number.

Do you see an easy way to change text (normal text, simple formatting and Math formula objects) to text (same normal text, same simple formatting and Math formulas in between $ $)?
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Math Formulas as code?

Post by John_Ha »

kkaminsk wrote: Do you see an easy way to change text (normal text, simple formatting and Math formula objects) to text (same normal text, same simple formatting and Math formulas in between $ $)?
Please give a specific example of what you intend to start with, and what you want to change it to.

You can almost certainly make the necessary changes with Regular Expressions - see [Tutorial] How to record a macro (and Regular Expressions).
Lupp
Volunteer
Posts: 3548
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Math Formulas as code?

Post by Lupp »

kkaminsk wrote: Do you see an easy way to change text (normal text, simple formatting and Math formula objects) to text (same normal text, same simple formatting and Math formulas in between $ $)?
As "John_Ha" already suggested, 'F&R' with RegEx enabled is the appropriate tool. In fact, the raw code I posted was mainly prompted by https://ask.libreoffice.org/en/question ... ce-writer/ in the stepsister forum you already know. I didn't post it there because a write-back part was needed there and I didn't want to spend the time writing that code.
For your purpose: I don't know whether enclosing the text representation of a 'Math' formula in a pair of $ signs will help much.
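To illustrate why the $ signs alone are not enough: the formula code is similar to LaTeX but not identical, so some translation is still needed. The following Python sketch handles only three constructs that occur in the quadratic formula quoted earlier in this thread (`+-`, `sqrt { ... }` and `{ ... } over { ... }`). The function name `starmath_to_latex` and its helpers are hypothetical, and real documents will use many more constructs than this covers.

```python
import re


def _match_forward(s, i):
    """Return the index of the '}' that closes the '{' at s[i]."""
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced braces")


def _match_backward(s, i):
    """Return the index of the '{' that opens the '}' at s[i]."""
    depth = 0
    for j in range(i, -1, -1):
        if s[j] == "}":
            depth += 1
        elif s[j] == "{":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced braces")


def starmath_to_latex(code):
    """Translate a small subset of formula code to LaTeX (illustrative only)."""
    out = code.replace("+-", r"\pm")
    out = re.sub(r"\bsqrt\s*\{", r"\\sqrt{", out)
    # Only the braced form '{A} over {B}' is handled, not '1 over 2'.
    while True:
        m = re.search(r"\}\s*over\s*\{", out)
        if m is None:
            break
        left_close, right_open = m.start(), m.end() - 1
        left_open = _match_backward(out, left_close)
        right_close = _match_forward(out, right_open)
        numerator = out[left_open + 1:left_close].strip()
        denominator = out[right_open + 1:right_close].strip()
        out = (out[:left_open] + r"\frac{" + numerator + "}{"
               + denominator + "}" + out[right_close + 1:])
    return out
```

For the quadratic formula `x = { -b +- sqrt { b^2 - 4ac } } over { 2a }` this gives `x = \frac{-b \pm \sqrt{ b^2 - 4ac }}{2a}`, which could then be placed between $ signs.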

Also, assuming you are working on a 'Writer' file created with the Sub I provided, it is not exactly simple to get what you need by 'F&R'; more than one step is needed.
I would therefore advise making a simple change to that code: Replace the two lines

Code: Select all

    text1.insertString(tCur1, embOb.Name, False)
    text1.insertControlCharacter(tCur1, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
with the single line

Code: Select all

    text1.insertString(tCur1, embOb.Name & "::: ", False)
The triple colon followed by a space should be a sufficiently unambiguous syntactical marker usable in Regex for 'F&R' to easily achieve everything you want now.
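As an alternative to 'F&R', the marker can also be processed with a short script once the collected text has been saved as plain text. The Python sketch below splits each line at the "::: " marker and puts the formula code between $ signs; lines without a marker are treated as continuation lines of the previous formula. The name `wrap_formulas` and the object names in the example are assumptions for illustration, and the result is still formula code, not LaTeX.

```python
MARKER = "::: "  # the separator written by the modified macro line


def wrap_formulas(text):
    """Return (object name, '$formula$') pairs from the exported text."""
    entries = []
    for line in text.splitlines():
        name, sep, code = line.partition(MARKER)
        if sep:
            # A new "name::: formula" line starts a new entry.
            entries.append([name, code.strip()])
        elif entries and line.strip():
            # No marker: assume the line continues the previous formula.
            entries[-1][1] += " " + line.strip()
    return [(name, "$" + code + "$") for name, code in entries]
```

For example, the text "Object1::: x = 1 over 2" yields the pair ("Object1", "$x = 1 over 2$").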

There is another little problem that I would suggest you solve with 'F&R':
Math formulae can contain line breaks and additional spaces that do not influence the appearance of the formulae but are useful while editing them.
Should this arbitrary whitespace be kept, omitted, or ...?
If there are line breaks at all in your formulae, you need to understand that they are NOT matched by the RegEx placeholder ".".
You need to explicitly accept them. Instead of

Code: Select all

.*
e.g. you need to use the slightly more complicated

Code: Select all

(.|\n)*
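The same behaviour of "." can be seen in other regex engines. The short Python sketch below is only an illustration: Python's `re` module is not the engine behind the 'F&R' dialog, and how paragraph breaks are treated there may differ. The sample formula with a line break in it is made up for the demonstration.

```python
import re

# Formula code containing one line break before "over".
formula = "x = { -b +- sqrt { b^2 - 4ac } }\nover { 2a }"

# "." stops at the line break, so only the first line is matched.
dot_only = re.match(r".*", formula).group(0)

# Explicitly accepting "\n" lets the pattern run across the break.
dot_or_newline = re.match(r"(.|\n)*", formula).group(0)

# Python also offers a flag that makes "." match line breaks directly.
dotall = re.match(r".*", formula, flags=re.DOTALL).group(0)

print(repr(dot_only))
print(repr(dot_or_newline))
print(repr(dotall))
```

Only the second and third variants return the complete formula including the part after the line break.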
Lupp
Volunteer
Posts: 3548
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: [Solved] Math Formulas as code?

Post by Lupp »

Anyone who also wants the formula code collected in a file, but prefers a Calc document over a Writer document as the target (as I would), may use the code shown below.

Code: Select all

Sub callHelperC()
writeFormulaStringsToFileC(False, True) REM Put in the wanted values before running the helper.
End Sub

Sub writeFormulaStringsToFileC(Optional ByVal pLoadHidden As Boolean, Optional ByVal pRemoveLineBreaks As Boolean)
doc0 = ThisComponent
loc0 = doc0.Location
loc1 = loc0 & "_MathFo.ods"
If FileExists(loc1) Then
   Alert = "Warning! The destination file "+Chr(13)+ loc1+Chr(13)+ _
   "already exists. Please delete or rename it before calling this procedure again!"
   MsgBox(Alert)
   Exit Sub
End If
rLB  = False
If Not IsMissing(pRemoveLineBreaks) Then rLB = pRemoveLineBreaks
    Dim args1(0) as new com.sun.star.beans.PropertyValue		
args1(0).Name  = "Hidden"
args1(0).Value = False
If Not Ismissing(pLoadHidden) Then args1(0).Value = pLoadHidden
doc1  = StarDesktop.LoadComponentFromUrl("private:factory/scalc", "_blank", 0, args1)
doc1.StoreAsUrl(loc1, Array())
sheet = doc1.Sheets(0)
y = 0 : xA = 0 : xB = 1
cellA        = sheet.GetCellByPosition(xA, y) : cellB        = sheet.GetCellByPosition(xB, y)
cellA.String = "OBJECT"                       : cellB.String = "FORMULA_CODE"
embObs = doc0.EmbeddedObjects()
For j = 0 To embObs.Count - 1
    embOb        = embObs(j)
    theFormulaOb = embOb.GetEmbeddedObject
    If NOT theFormulaOb.SupportsService("com.sun.star.formula.FormulaProperties") Then Goto nextObject
    theFormula   = theFormulaOb.Formula
    If rLB AND (InStr(1, theFormula, Chr(10))>0) Then
        fSplit     = Split(theFormula, Chr(10))
        theFormula = Join(fSplit, "")
    End If
    y = y + 1
    cellA = sheet.getCellByPosition(xA, y)    : cellB = sheet.getCellByPosition(xB, y)
    cellA.String = embOb.Name REM Special separator ("::: ") not needed here, as name and formula are in separate cells.
    cellB.String = theFormula
nextObject:
Next j
cellA.Columns.OptimalWidth = True              : cellB.Columns.OptimalWidth = True
If NOT rLB Then                                  sheet.Rows.OptimalHeight   = True
doc1.Store()
doc1.Close(True)
End Sub