[Calc BASIC]Function to convert attributes and links to html

Shared Libraries
Forum rules
For sharing working examples of macros / scripts. These can be in any script language supported by OpenOffice.org [Basic, Python, Netbean] or as source code files in Java or C# even - but requires the actual source code listing. This section is not for asking questions about writing your own macros.
Post Reply
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

[Calc BASIC]Function to convert attributes and links to html

Post by Lupp »

Considering a thread (viewtopic.php?f=9&t=85034) from the Calc branch I wrote a user function cAttribs2Html in BASIC.
The function is designed to analyse a cell for its content or the result of a formula and to transform the findings to html if there exist applicable inline tags and the control parameter of the function is containing a respective letter.
The function can handle the attributes "bold", "italic", "strike", and "underline". In addition links that are assigned to parts of the text content (or all of it) can be transformed into the respective html representation. As compared with the formatting possible in Calc, there is, of course, coarsening.
The code:

Code: Select all

Option Explicit

Function cAttribs2Html(ByVal pX As Long, ByVal pY As Long, ByVal pZ As Long, ByVal pAttribs As String, Optional ByVal pDummy) As String

'COMMENTS WELCOME! Wolfgang Jäger (Lupp; jag@psilosoph.de)

'The dummy parameter shall give the user a way to force recalculation on "any" event by placing NOW() or any volatile function there.
'Any correct expressin in this place will cause recalculation if one of the precedents is changed.
'The two remarks above only fully apply, of course, if 'AutoCalculate' is active.

'To pass the cell to work on by position should be an easy and reliable way. Counting starts with 1 for the parameters.

'The inline tags for striking through and for underscoring were deprecated with V5 of html, but are still offered here.
'I suppose the rendering of most browsers still supports them.
'pAttribs may contain any combination of the letters b, i, s, u, l in arbitrary order. Repetitions allowed (no difference). Empty allowed.
'The attributes Bold, Italic, Strike, Underline and Link are transcribed to html if the respective letter is present.
'In case of l the correct href is inserted when the <a ...> tag is prefixed.
'The plain text is returned as part of the result anyway.

'The function does not attempt to anlalyse formulae containing a call to the HYPERLINK function. Only the resulting string is processed.

	Dim theDoc As Object
theDoc = ThisComponent
If	NOT theDoc.SupportsService("com.sun.star.sheet.SpreadsheetDocument") Then
	cAttribs2Html = "Need Calc Document!"
	Exit Function
End If
Dim theSheets As Object, sheetTally As Long
theSheets = theDoc.Sheets
sheetTally = theSheets.GetCount()
If	(pZ < 1) OR (pZ > sheetTally) Then
	cAttribs2Html = "Bad SheetNumber!"
	Exit Function
End If
	Dim theSheet As Object
theSheet = theDoc.Sheets(pZ - 1)
If	(pX < 1) OR (pY < 1) or (pX > theSheet.RangeAddress.EndColumn + 1) OR (pY > theSheet.RangeAddress.EndRow + 1) Then
	cAttribs2Html = "Bad position!"
	Exit Function
End If
	Dim theCell As Object, rText As String
theCell = theSheet.GetCellByPosition(pX - 1, pY - 1)
rText = ""
'If we have a direct entry of type text it may consist of more than one TextPortions,
'zero or more of them possessing an attribute value to be handled here and/or an URL property.
'To get compact code also formula results are processed by the full-featured routine
'despite the fact that there only one textportion is supported.
	Dim theParEnum As Object, theParElement As Object
	Dim theSubEnum As Object, theSubElement As Object
	Dim textSlice
theParEnum = theCell.GetText().CreateEnumeration
Do	While	theParEnum.HasMoreElements
	theParElement = theParEnum.NextElement
	theSubEnum = theParElement.CreateEnumeration
	Do	While	theSubEnum.HasMoreElements
		textSlice = ""
		theSubElement = theSubEnum.NextElement
'The alternative below shall ensure that in case of a display formatted according to the 'Numbers' format code
'the displayed string is also used for the function result.
'For directly entered text the 'Numbers' format will be ignored.
		If	theCell.Type = 2 Then
			textSlice = theSubElement.String
		Else
			textSlice = theCell.String
		End If
		If	Instr(pAttribs, "b") > 0 Then
			If	theSubElement.CharWeight >= com.sun.star.awt.FontWeight.BOLD Then
				textSlice = "<b>" & textSlice & "</b>"
			End If
		End If
		If	Instr(pAttribs, "i") > 0 Then
			If	theSubElement.CharPosture >= 1 Then
				textSlice = "<i>" & textSlice & "</i>"
			End If
		End If
		If	Instr(pAttribs, "s") > 0 Then
			If	theSubElement.CharStrikeOut >= 1 Then
				textSlice = "<strike>" & textSlice & "</strike>"
			End If
		End If
		If	Instr(pAttribs, "u") > 0 Then
			If	theSubElement.CharUnderline >= 1 Then
				textSlice = "<u>" & textSlice & "</u>"
			End If
		End If
		If	Instr(pAttribs, "l") > 0 Then
			If	theSubElement.TextPortionType = "TextField" Then
				If	theSubElement.TextField.SupportsService("com.sun.star.text.TextField.URL") Then
					textSlice = "<a href=" & Chr(34) & theSubElement.TextField.URL & Chr(34) & ">" & textSlice & "</a>"
				End If
			End If
		End If
		rText = rText & textSlice
	Loop
Loop
cAttribs2Html = rText
End Function
The attached file is containing the function. There are also a few demonstrations.
Attribs2Html_1.ods
(21.41 KiB) Downloaded 425 times
===== Edit 2017-03-02 =====
Induced by posts into a thread of the Calc branch (viewtopic.php?f=9&t=85034) of the forum I would like to state that I provided this function for users probably knowing little about programming for Calc, but knowing everything they need to make use of the results. This is supposed to be mainly knowledge about html.

Concerning a "macro" preparing everything for export as a table into a correct html document I am surely not the one to be charged with the task.
But why should one want to do so. There is an 'Export' option for the purpose. I suppose any html editor will allow to paste the respective part into a target document.

This said.
As a rule the content of any single cell converted will be expected to be a paragraph in html (as it is treated in Calc itself). To get it as such you have to prefix <p> and to append </p> to the function result for each cell.

If the results are separated into an extra sheet, someone may want to export this sheet to a csv. The <p>...</p> pairs already created, this will require to choose NO column separator and NO text delimiter. I would also recommend to Copy/Paste Special... (results only) in advance.
A csv created this way, and just reextensioned as html, should be accepted and interpreted by one of the modern error-tolerant browsers despite of the fact that it probably lacks a lot of things specified as mandatory for correct html. However, you will not get a table structure this way. To know how to do that is your turn. I didn't claim to know much about html and the common editors. The plain html code I wrote to date consists of a few hundred lines, mostly done just for trying.

===== Edit 2018-06-24 =====
There was also a use-case where the conversion to html only helped to find the first character formatted to 'Italic' posture. The underlying intention was to split a column into two where the first part of any orginal cell content was text in default character style (without a markup) and the second part (a translation to a different language) was in 'Italic'.
https://ask.libreoffice.org/de/question ... en-suchen/
(The posts there are in German.)
Last edited by Lupp on Sun Jun 24, 2018 4:53 pm, edited 1 time in total.
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
User avatar
Lupp
Volunteer
Posts: 3542
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: [Calc BASIC]function to convert attributes and links to

Post by Lupp »

As there were related topics in different forums recently, I provide here a reworked version of the function cAttribs2Html. There were few changes except that a "p" attribute was added to allow for the conversion to <p> ... </p> if paragraph breaks are found in the cell's content (.Type=2) or result string (otherwise). In fact what generally is called a line break, if occurring in a cell, is actually treated as a paragraph break if the .Text property is analysed by enumeration.
Changed: If "p" is not used, but paragraph breaks are found, they will now be converted to a standard space (U+0020) each.

Please excuse the length of the code. I tried to include a kind of help, and also to avoid exceptions.
(Editing: Tiny correction of the code concerning the trigger.)

Code: Select all

Function cAttribs2Html(pX As Long, pY As Long, pZ As Long, pAttribs As String, Optional pTrigger, Optional pSpecial As Object) As String
'V 1.0.2 Wolfgang Jäger 2017-11-20 replacing preliminary versions
'New: If the "p" attribute is missing in pAttribs, paragraph breaks (in Calc = line breaks) are replaced by an ordinary space each.
'COMMENTS WELCOME! Wolfgang Jäger (Lupp; jag@psilosoph.de)

'The optional parameter pTrigger shall give the user a way to force recalculation by a kind of listening or with the help of a volatile function.
'Any correct expression in this place will cause recalculation if one of the precedents is changed or the volatile function recalculates.
'The two remarks above only fully apply, of course, if 'AutoCalculate' is active. pTrigger is not accesed from the body of this function.

'To pass the cell to work on by position should be an easy and reliable way. Counting starts with 1 for the parameters.
'For calls from BASIC the function can also be used passing a cell object via pSpecial.
'If this parameter is present in the calling expression, has the TypeName(pSpecial) = "Object", and is a SheetCell object, the parameters pX, pY, pZ are 
'treated as if declared optional and will not be accessed. The positions must be present, of course, in the actual parameter list.

'The inline tags for striking through and for underscoring were deprecated with V5 of html, but are still offered here.
'I suppose the rendering of most browsers still supports them.
'pAttribs may contain any combination of the letters b, i, s, u, l, p in arbitrary order. Repetitions allowed (no difference). Empty allowed.
'The attributes Bold, Italic, Strike, Underline and Link are transcribed to html if the respective switching letter is present.
'In case of l the correct href="..." part is inserted when the <a ...> tag is prefixed.
'The handling of paragraphs was introduced later. Hard line breaks are treated as paragraph breaks. in Calc cells.
'The switching letter p, if present, causes <p>...</p> pairs to be inserted. Ottherwise paragraph breaks are replaced by an ordinary space.

'The plain text is returned as part of the result anyway.

'The function does not attempt to analyse formulas containing a call to the HYPERLINK function. Only the resulting string is processed.
'I do not know a way to get the URL linked in by HYPERLINK accessing a property of the cell or a service supported by it or by objects it may point to.
'Actually I suppose the link is assigned to the used part of the cell's area of the cell grid on the screen view by means similar to those used for CF.

   Dim theDoc As Object, theCell As Object
theDoc = ThisComponent
If   NOT theDoc.SupportsService("com.sun.star.sheet.SpreadsheetDocument") Then
   cAttribs2Html = "Need Calc Document!"
   Exit Function
End If
If NOT IsMissing(pSpecial) Then
   If TypeName(pSpecial) = "Object" Then
      If pSpecial.SupportsService("com.sun.star.sheet.SheetCell") Then
         theCell = pSpecial
         GoTo skippedGetCell
      Else
         cAttribs2Html = "pSpecial not valid"
         Exit Function
      End If
   End If
End If
   Dim theSheets As Object, sheetTally As Long
theSheets = theDoc.Sheets
sheetTally = theSheets.GetCount()
If   (pZ < 1) OR (pZ > sheetTally) Then
   cAttribs2Html = "Bad SheetNumber!"
   Exit Function
End If
   Dim theSheet As Object
theSheet = theDoc.Sheets(pZ - 1)
If   (pX < 1) OR (pY < 1) or (pX > theSheet.RangeAddress.EndColumn + 1) OR (pY > theSheet.RangeAddress.EndRow + 1) Then
   cAttribs2Html = "Bad position!"
   Exit Function
End If
theCell = theSheet.GetCellByPosition(pX - 1, pY - 1)

skippedGetCell:
   Dim rText As String
   Dim cB As Boolean, cI As Boolean, cS As Boolean, cU As Boolean, cL As Boolean, cP As Boolean, cR As Boolean ', simpleT As Boolean
cB = (Instr(pAttribs, "b")>0)
cI = (Instr(pAttribs, "i")>0)
cS = (Instr(pAttribs, "s")>0)
cU = (Instr(pAttribs, "u")>0)
cL = (Instr(pAttribs, "l")>0)
cP = (Instr(pAttribs, "p")>0)
rText = ""
'To get compact code also formula results are processed by the full-featured routine
'despite the fact that there no attributes as addressed by "bisul" are supported for text portions. Numerics will be treated based on the formatted strings.
   Dim theParEnum As Object, theParElement As Object
   Dim theSubEnum As Object, theSubElement As Object
   Dim textSlice
theParEnum = theCell.GetText().CreateEnumeration
'I think the outer loop will always prossess exactly once.
'??However, there seems to be no direct acces to the sole paragraph contained in the cell.
'I now think there is. Maybe this implementation cann survive a change to cells containing more than one paragraph, however.
Do   While   theParEnum.HasMoreElements
   theParElement = theParEnum.NextElement
   If cP Then
      rText = rText & "<p>"
   End If
   theSubEnum = theParElement.CreateEnumeration
   Do   While   theSubEnum.HasMoreElements
      textSlice = ""
      theSubElement = theSubEnum.NextElement
'The alternative below shall ensure that in case of a display formatted according to the 'Numbers' format code
'the displayed string is also used for the result of this function.
      If   theCell.FormulaResultType=1 Then
         textSlice = theCell.String
      Else
         textSlice = theSubElement.String
      End If
      If   cB Then
         If   theSubElement.CharWeight >= com.sun.star.awt.FontWeight.BOLD Then
            textSlice = "<b>" & textSlice & "</b>"
         End If
      End If
      If   cI Then
         If   theSubElement.CharPosture >= 1 Then
            textSlice = "<i>" & textSlice & "</i>"
         End If
      End If
      If   cS Then
         If   theSubElement.CharStrikeOut >= 1 Then
            textSlice = "<strike>" & textSlice & "</strike>"
         End If
      End If
      If   cU Then
         If   theSubElement.CharUnderline >= 1 Then
            textSlice = "<u>" & textSlice & "</u>"
         End If
      End If
      If   cL Then
         If   theSubElement.TextPortionType = "TextField" Then
            If   theSubElement.TextField.SupportsService("com.sun.star.text.TextField.URL") Then
               textSlice = "<a href=" & Chr(34) & theSubElement.TextField.URL & Chr(34) & ">" & textSlice & "</a>"
            End If
         End If
      End If
      rText = rText & textSlice
   Loop
   If cP Then
      rText = rText & "</p>"
   Else
      If theParEnum.HasMoreElements Then rText = rText & " "
   End If
Loop
cAttribs2Html = rText
End Function
On Windows 10: LibreOffice 24.2 (new numbering) and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
Post Reply