
IsAlpha string function for OO

Posted: Mon Sep 04, 2023 4:14 pm
by JeJe
Some posts in another thread were talking about ScriptForge, which got me looking at its string functions and the OO i18n module.
ScriptForge is part of LibreOffice only, though.

Here's an IsAlpha function for OO users. It's slightly different from ScriptForge's: they discarded the API function's ability to work on only part of a string. There is a Windows API function for this too. That's not cross-platform, but it only needs the Declare statement.

I notice ScriptForge has a Capitalize function, which looks to be covered already by strConv.

Feel free to post a better version than mine, or other string functions...



Edit: note I've only used UnicodeType.UPPERCASE_LETTER and UnicodeType.LOWERCASE_LETTER to decide whether a character is alphabetic.

TITLECASE_LETTER may be needed for some languages. The constants are here:

https://www.openoffice.org/api/docs/com ... eType.html

Code: Select all

Sub Main
	MSGBOX IsAlphaOO("àén66ΣlPµp9(",0,2)
	MSGBOX IsAlphaOO("àén66ΣlPµp9(",3,4)
end sub


function IsAlphaOO(st as string,optional zerobasedA as long,optional zerobasedB as long) as boolean
	'Returns True if every character from zerobasedA to zerobasedB (inclusive) is an
	'upper or lower case letter. With no range given, the whole string is tested.
	'Returns False for an empty string or an invalid range.
	dim n as long, aLocale, i as long, CharClassification, a as long, b as long, lenst as long
	CharClassification = createUNOService("com.sun.star.i18n.CharacterClassification")
	aLocale = ThisComponent.CharLocale
	lenst = len(st)
	if lenst > 0 then
		if ismissing(zerobasedA) then
			a = 0
			b = lenst-1
		else
			a = zerobasedA
			if ismissing(zerobasedB) then
				b = lenst-1	'only a start given: test to the end of the string
			else
				b = zerobasedB
			end if
			if (a<0 or a >= lenst or b<a or b>= lenst) then exit function
		end if
		For i = a to b
			n = CharClassification.getType(st, i, aLocale)
			'1 = com.sun.star.i18n.UnicodeType.UPPERCASE_LETTER, 2 = .LOWERCASE_LETTER
			if (n <>1 and n <>2) then exit function
		Next
		IsAlphaOO = true
	end if
End function
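
As a cross-check outside OpenOffice, here is a minimal Python sketch of the same idea. It is not part of the macro above. It assumes Python's standard unicodedata module, whose general categories "Lu" and "Ll" should correspond to UPPERCASE_LETTER and LOWERCASE_LETTER; the function name is_alpha_cased and its parameters are made up for illustration. The last two lines show why TITLECASE_LETTER (and letters with no case at all) can matter.

```python
import unicodedata

def is_alpha_cased(text, start=0, end=None):
    """True if every character in text[start..end] (zero-based, inclusive)
    has Unicode category Lu (uppercase) or Ll (lowercase)."""
    if end is None:
        end = len(text) - 1
    # Reject empty strings and out-of-range bounds, like the Basic version.
    if not text or start < 0 or end < start or end >= len(text):
        return False
    return all(unicodedata.category(ch) in ("Lu", "Ll")
               for ch in text[start:end + 1])

sample = "àén66ΣlPµp9("
print(is_alpha_cased(sample, 0, 2))   # tests the first three characters
print(is_alpha_cased(sample, 3, 4))   # tests the two digits
# U+01C5 is a titlecase letter (Lt); U+4E2D is a letter with no case (Lo).
print(is_alpha_cased("\u01c5"), "\u01c5".isalpha())
print(is_alpha_cased("\u4e2d"), "\u4e2d".isalpha())
```

Python's own str.isalpha() accepts every letter category, so it is looser than the Lu/Ll test used here.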


Re: IsAlpha string function for OO

Posted: Tue Sep 05, 2023 12:25 am
by JeJe
An IsAlpha function is just one case of checking that every character is of certain Unicode types (upper or lower case in my original post).

The more general function ContainsOnlyUnicodeTypes below tests whether every character is of a chosen Unicode type, or of one of a chosen array of Unicode types.

ContainsUnicodeTypes is the matching general function that tests whether the string contains at least one character of a given Unicode type or array of Unicode types.

Edit: changed the first function's name to the less confusing ContainsOnlyUnicodeTypes.
Edit2: minor correction in test sub descriptions.

Code: Select all

Option Explicit
	REM  *****  BASIC  *****

Sub testSub

'''''''ContainsOnlyUnicodeTypes - every character of chosen unicode types
	dim UnicodeTypes
	
	'for isalpha choose unicode types upper and lower case
'	UnicodeTypes = array(com.sun.star.i18n.UnicodeType.UPPERCASE_LETTER,com.sun.star.i18n.UnicodeType.LOWERCASE_LETTER)
'	msgbox ContainsOnlyUnicodeTypes("777ppppp888",UnicodeTypes)
'	msgbox ContainsOnlyUnicodeTypes("777ppppp888",UnicodeTypes,3,5)
'	msgbox ContainsOnlyUnicodeTypes("777ppppp888",UnicodeTypes,1,5)

'	UnicodeTypes = com.sun.star.i18n.UnicodeType.UPPERCASE_LETTER
'	msgbox ContainsOnlyUnicodeTypes("777ppppp888",UnicodeTypes)
'	msgbox ContainsOnlyUnicodeTypes("PRTUER",UnicodeTypes)

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

'''''''ContainsUnicodeTypes - contain at least one character of chosen unicode types

'	UnicodeTypes = com.sun.star.i18n.UnicodeType.DECIMAL_DIGIT_NUMBER
'	msgbox ContainsUnicodeTypes("777ppppp888",UnicodeTypes)
'	msgbox ContainsUnicodeTypes("pppppp",UnicodeTypes)

'	UnicodeTypes = array(com.sun.star.i18n.UnicodeType.UPPERCASE_LETTER,com.sun.star.i18n.UnicodeType.LOWERCASE_LETTER)
'	msgbox ContainsUnicodeTypes("9K832737",UnicodeTypes,1,3)
	
end sub


function ContainsOnlyUnicodeTypes(st as string,UnicodeTypes,optional zerobasedA as long,optional zerobasedB as long) as boolean
	dim n as long, aLocale,i as long,CharClassification,lenst as long,ub as long, j as long, found as boolean,a as long, b as long

	CharClassification = createUNOService("com.sun.star.i18n.CharacterClassification")
	aLocale = ThisComponent.CharLocale
	lenst= len(st)
	if lenSt >0 then
		if ismissing(zerobasedA) = true then
			a=0
			b= lenst-1
		else
			a =zerobasedA
			b= zerobasedB
			if (a<0 or a >= lenst or b<a or b<0 or b>= lenst) then exit function
		end if

		if IsArray(UnicodeTypes) then 'an array of unicode types was passed
			ub = ubound(Unicodetypes)
			ContainsOnlyUnicodeTypes =true

			For i = a to b
				n = CharClassification.getType(st, i, aLocale)
				found = false
				for j = 0 to ub
					if n= UnicodeTypes(j) then
						found = true
						exit for
					end if
				next
				if found = false then
					ContainsOnlyUnicodeTypes = false
					exit for
				end if
			Next

		else

			ContainsOnlyUnicodeTypes =true
			For i = a to b
				n = CharClassification.getType(st, i, aLocale)
				if n<> UnicodeTypes then
					ContainsOnlyUnicodeTypes = false
					exit for
				end if
			Next

		end if
	end if
End function



function ContainsUnicodeTypes(st as string,UnicodeTypes,optional zerobasedA as long,optional zerobasedB as long) as boolean
	dim n as long, aLocale,i as long,CharClassification,lenst as long,ub as long, j as long,a as long, b as long

	CharClassification = createUNOService("com.sun.star.i18n.CharacterClassification")
	aLocale = ThisComponent.CharLocale
	lenst= len(st)
	if lenSt >0 then
		if ismissing(zerobasedA) = true then
			a=0
			b= lenst-1
		else
			a =zerobasedA
			b= zerobasedB
			if (a<0 or a >= lenst or b<a or b<0 or b>= lenst) then exit function
		end if

		if IsArray(UnicodeTypes) then 'an array of unicode types was passed
			ub = ubound(Unicodetypes)

			For i = a to b
				n = CharClassification.getType(st, i, aLocale)
				for j = 0 to ub
					if n= UnicodeTypes(j) then
						ContainsUnicodeTypes = true
						exit function
					end if
				next
			Next

		else

			For i = a to b
				n = CharClassification.getType(st, i, aLocale)
				if n= UnicodeTypes then
					ContainsUnicodeTypes = true
					exit for
				end if
			Next
		end if
	end if
End function


Re: IsAlpha string function for OO

Posted: Tue Sep 05, 2023 3:39 am
by JeJe
CharacterClassification's parseAnyToken is another way to write an IsAlpha function

Code: Select all

Option Explicit

'https://www.openoffice.org/api/docs/common/ref/com/sun/star/i18n/KParseTokens.html

sub testContainsOnlyParseTokens

	dim parseTokens
	with com.sun.star.i18n.KParseTokens 'For IsAlpha result use below tokens perhaps
		parseTokens = .ASC_UPALPHA or .ASC_LOALPHA or .UNI_UPALPHA or .UNI_LOALPHA
	end with

	msgbox ContainsOnlyParseTokens("TEoooEREu",parseTokens)
	msgbox ContainsOnlyParseTokens("TEooo EREu",parseTokens)
	msgbox ContainsOnlyParseTokens("6",parseTokens)
	msgbox ContainsOnlyParseTokens("Tu",parseTokens)

end sub

Function ContainsOnlyParseTokens (aText,parseTokens) as boolean
	dim alocale,npos,nStartCharFlags,aUserDefinedCharactersStart,nContCharFlags ,aUserDefinedCharactersCont,ret,CharClassification
	if len(atext) >0 then
		CharClassification = createUNOService("com.sun.star.i18n.CharacterClassification")
		aLocale = ThisComponent.CharLocale
		nPos =0
		nStartCharFlags = parseTokens
		aUserDefinedCharactersStart = ""
		nContCharFlags = parseTokens
		aUserDefinedCharactersCont = ""
		ret = CharClassification.parseAnyToken( aText,nPos,aLocale,nStartCharFlags,aUserDefinedCharactersStart,nContCharFlags,aUserDefinedCharactersCont )
		if ret.tokentype = com.sun.star.i18n.KParseType.IDENTNAME then
			ContainsOnlyParseTokens=( ret.CharLen= len(aText))
		end if
	end if
End function
Or we could trim the function in that code down to just:

Code: Select all


Function ContainsOnlyParseTokens (aText,parseTokens) as boolean
	dim ret,CharClassification
	if len(atext) >0 then
		CharClassification = createUNOService("com.sun.star.i18n.CharacterClassification")
		ret = CharClassification.parseAnyToken( aText,0,ThisComponent.CharLocale,parseTokens,"",parseTokens,"")
		if ret.tokentype = com.sun.star.i18n.KParseType.IDENTNAME then ContainsOnlyParseTokens=( ret.CharLen= len(aText))
	end if
End function

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 6:02 pm
by MrProgrammer
In an OpenOffice spreadsheet one can test whether a cell's content is alphabetic, alphanumeric, numeric, or hexadecimal using the SEARCH function, as long as the option "Enable regular expressions in formulas" is set. Options are set with OpenOffice → Preferences on a Mac, Tools → Options on other platforms. To test that a cell contains only alphabetic characters, SEARCH looks for a non-alphabetic character, that is, [^[:alpha:]]. If one is found, the test fails; if none is found, the test succeeds.
Attachment: 202309062050.ods (Tests using SEARCH function)

I think that JeJe's functions are for use in a macro and are not intended to be called by a Calc formula, but I can imagine situations where it's helpful to perform these tests in a spreadsheet.
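
The "search for a character that should not be there" logic can be sketched outside Calc as well. This is a minimal Python illustration, not the Calc formula: it swaps the [^[:alpha:]] regular expression for Python's str.isalpha(), which only approximates it, and the function names are made up for the example.

```python
def first_non_alpha(text):
    """Return the zero-based index of the first non-alphabetic character,
    or -1 if there is none."""
    for i, ch in enumerate(text):
        if not ch.isalpha():
            return i
    return -1

def is_alpha_by_negative_search(text):
    # The test succeeds when the search for a non-alphabetic character fails.
    # Note: an empty string passes, since there is nothing to find.
    return first_non_alpha(text) == -1

print(first_non_alpha("abc1d"))
print(is_alpha_by_negative_search("abc"))
print(is_alpha_by_negative_search("ab c"))
```

One consequence of testing by negation is that empty input counts as a pass, so an explicit length check is needed if empty strings should fail.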

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 6:52 pm
by JeJe
Calc's Search is available for strings via functionAccess, but there's the same requirement to enable regular expressions under Options.

https://wiki.openoffice.org/wiki/Docume ... H_function

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 8:04 pm
by Villeroy
RegularExpressions, MatchWholeCell etc. are properties of service FunctionAccess.

Code: Select all

Function isAlpha(strVar) As Boolean
	Dim ofa, x
	ofa = createUnoService("com.sun.star.sheet.FunctionAccess")
	ofa.RegularExpressions = True
	x = False
	'with On Error Resume Next, a failed SEARCH (no match) leaves x as False
	On Error Resume Next
	x = ofa.callFunction("SEARCH", Array("^[[:alpha:]]+$", strVar))
	isAlpha = x
End Function

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 10:00 pm
by JeJe
Interesting. Looking at that, I also notice that FunctionAccess can, in a limited way, find a string across more than one paragraph, something you can't do with a regular Writer document search.

E.g., with a Writer document containing this text:
Blah blah

happy




bats
Blah blah
The span from "happy" to "bats", including the paragraph mark characters, is found.

Code: Select all


	ofa = createUnoService("com.sun.star.sheet.FunctionAccess")
    ofa.RegularExpressions = true
    str2 =thiscomponent.text.string
	STR1 = "happy" & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10)  & "bats"
	x = ofa.callFunction("SEARCH", array(STR1,STR2))
	msgbox x '=14
A regular search yields a void

Code: Select all


 oSearch = thiscomponent.createSearchDescriptor()
 With oSearch
 .SearchString ="happy" & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10) & chr(13) & chr(10)  & "bats"
 .SearchRegularExpression = True
 End With
 oFound = thiscomponent.findFirst(oSearch)
 mri ofound 'void
Edit: testing on OO

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 10:06 pm
by karolus
Maybe you should take a look at the string methods that have been around for 20 years in python, instead of (like the guys from the "script-forge" site) trying to recreate it somehow.
https://docs.python.org/3/library/stdty ... tr.isalnum

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 10:21 pm
by JeJe
karolus wrote: Fri Sep 08, 2023 10:06 pm Maybe you should take a look at the string methods that have been around for 20 years in python, instead of (like the guys from the "script-forge" site) trying to recreate it somehow.
https://docs.python.org/3/library/stdty ... tr.isalnum
Those functions look a bit like my second post: searching by Unicode character type. Except mine is a more flexible general function, which allows any single type or an array of them. From a short look, there doesn't seem to be one there that does that. I'd never have coded it so quickly if I'd been faffing about with macros in text files rather than the Basic IDE.

Still, using Python is indeed another way...

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 10:54 pm
by karolus
Except mine is a more flexible general function which allows any choice or array of those
Are you sure?!

Code: Select all

somestring = "0123456789௨௦௨௧٢٠٢١२०२१"
somestring.isnumeric()

-> True
Besides the digits 0-9, the text contains the number "2021" written in Tamil, Arabic-Indic and Devanagari digits.
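
To see what isnumeric() is reacting to, here is a small sketch using Python's standard unicodedata module. It only inspects the general category of each character; the "Nd" category should be the counterpart of UnicodeType.DECIMAL_DIGIT_NUMBER, which is an assumption about the mapping rather than something stated in this thread.

```python
import unicodedata

somestring = "0123456789௨௦௨௧٢٠٢١२०२१"

# General category of every character; "Nd" means decimal digit.
categories = {unicodedata.category(ch) for ch in somestring}
print(categories)

# isnumeric() is broader than isdecimal(): a vulgar fraction is numeric
# (category "No") but is not a decimal digit.
print("\u00bd".isnumeric(), "\u00bd".isdecimal(), unicodedata.category("\u00bd"))
```

So isnumeric() is not the same test as "every character is a decimal digit"; str.isdecimal() is the closer match for that.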

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 11:08 pm
by JeJe
Did you look at my second post?

The unicode types are here again:

http://www.openoffice.org/api/docs/comm ... eType.html

How, using the methods at your Python link, can you test whether a string consists only of characters from any chosen array of those types?

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 11:29 pm
by karolus
If you need something like "contains-upper", you just use a simple regular expression. So what?

Re: IsAlpha string function for OO

Posted: Fri Sep 08, 2023 11:44 pm
by JeJe
That's not the same thing. The answer is you can't. And my function is more flexible. And you didn't look at it.

Re: IsAlpha string function for OO

Posted: Sat Sep 09, 2023 7:31 am
by karolus
JeJe wrote: Fri Sep 08, 2023 11:44 pm The answer is you can't. … And you didn't look at it.
If you believe that... never mind!

No, it's too stupid for me to dig through twelve meters of code, and then to write something generic that does nothing but test whether a set of numbers fits between various upper and lower bounds.

Re: IsAlpha string function for OO

Posted: Sat Sep 09, 2023 8:24 am
by JeJe
You're protesting loudly.

If I want to see whether a string consists only of UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER or MATH_SYMBOL characters, there's no looking through twelve meters of code, and nothing else to write except the following call to my function:

Code: Select all

	UnicodeTypes = array(com.sun.star.i18n.UnicodeType.UPPERCASE_LETTER,com.sun.star.i18n.UnicodeType.DECIMAL_DIGIT_NUMBER, com.sun.star.i18n.UnicodeType.MATH_SYMBOL  )
	msgbox ContainsOnlyUnicodeTypes("HHH∂∃∄∅888",UnicodeTypes)
It's more flexible and concise to have one generic sub or function rather than individual subs for each possible combination. That would be a lot of subs, and only a few of them are available at your link.
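
For comparison, the same generic call shape can be sketched in Python. This is only an illustration under assumptions: it uses the standard unicodedata module rather than the str methods on the linked page, the two function names are invented here, and the two-letter codes (Lu, Ll, Nd, Sm) are Unicode general categories that should line up with the UnicodeType constants. A sub-range is handled by slicing the string before the call.

```python
import unicodedata

def contains_only_categories(text, categories):
    """True if text is non-empty and every character's general category
    is one of `categories` (a list or tuple of two-letter codes)."""
    allowed = set(categories)
    return bool(text) and all(unicodedata.category(ch) in allowed for ch in text)

def contains_any_category(text, categories):
    """True if at least one character's general category is in `categories`."""
    allowed = set(categories)
    return any(unicodedata.category(ch) in allowed for ch in text)

# Lu = uppercase letter, Nd = decimal digit, Sm = math symbol
print(contains_only_categories("HHH∂∃∄∅888", ["Lu", "Nd", "Sm"]))
# Zero-based positions 1 to 3 of the string, selected with a slice
print(contains_any_category("9K832737"[1:4], ["Lu", "Ll"]))
```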

Re: IsAlpha string function for OO

Posted: Sat Sep 09, 2023 9:39 am
by karolus
Hello
For example, I hate having to scroll right and left to read this code, so the first thing I would do (if I were to do anything):
[Image: thats_python.png]