COBOL - Intrinsic functions - UVALID

Intrinsic functions - Enterprise COBOL for z/OS 6.3.0

If a character data item contains valid UTF-8 or UTF-16 data, the UVALID function returns the value zero.
If a character data item contains invalid UTF-8 or UTF-16 data, the UVALID function returns the index of the first invalid element.

The function type is integer.

argument-1: Must be of class alphabetic, alphanumeric, national, or UTF-8.

The returned value is an integer whose meaning depends on the class of argument-1. It has 9 digits if LP(32) is in effect and 18 digits if LP(64) is in effect:

If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.

Note:
The UVALID function indicates whether the character string contains well-formed Unicode UTF-8 or UTF-16 data.
It does not indicate whether any or all of the Unicode code points represented by the character string are assigned to characters.
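
The distinction in the note can be illustrated outside COBOL. The following is a minimal Python sketch, not part of the compiler documentation; it assumes Python's standard `unicodedata` module and uses U+FFFF, a Unicode noncharacter, as a code point that is not assigned to any character. Its UTF-8 encoding x'EFBFBF' is well formed under the byte rules for UTF-8, so a well-formedness check alone would not flag it.

```python
import unicodedata

# x'EFBFBF' is the UTF-8 encoding of U+FFFF, a Unicode noncharacter.
raw = bytes.fromhex("EFBFBF")

# Decoding succeeds because the byte sequence is well formed.
text = raw.decode("utf-8")

# General category 'Cn' means the code point is not assigned to a character.
category = unicodedata.category(text)

print(f"U+{ord(text):04X} is well-formed UTF-8, general category {category}")
```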

For UTF-8 data, the validity of a byte varies according to its range as listed in the table:

Table 1. Byte validity for UTF-8 data

Value Range	Dependency	Validity
x'00' - x'7F'	None	Valid
x'80' - x'C1'	None	Invalid
x'C2' - x'DF'	Followed by another byte that is in the range x'80' to x'BF'	Valid
x'E0' - x'EF'	If the first byte is x'E0', followed by two more bytes: the second byte is in the range x'A0' to x'BF', and the third byte is in the range x'80' to x'BF'	Valid
	If the first byte is in the range x'E1' to x'EC', followed by two more bytes: both the second and third bytes are in the range x'80' to x'BF'	Valid
	If the first byte is x'ED', followed by two more bytes: the second byte is in the range x'80' to x'9F', and the third byte is in the range x'80' to x'BF'	Valid
	If the first byte is in the range x'EE' to x'EF', followed by two more bytes: both the second and third bytes are in the range x'80' to x'BF'	Valid
x'F0' - x'F4'	If the first byte is x'F0', followed by three more bytes: the second byte is in the range x'90' to x'BF', and the third and fourth bytes are in the range x'80' to x'BF'	Valid
	If the first byte is in the range x'F1' to x'F3', followed by three more bytes: the second, third, and fourth bytes are all in the range x'80' to x'BF'	Valid
	If the first byte is x'F4', followed by three more bytes: the second byte is in the range x'80' to x'8F', and the third and fourth bytes are in the range x'80' to x'BF'	Valid
x'F5' - x'FF'	None	Invalid
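
The rules in Table 1 can be modeled as a short scanner. The following Python sketch is only a reference model for reasoning about the table, not the compiler's implementation. The function name `first_invalid_utf8` is hypothetical, and reporting a sequence that is cut off at the end of the data at the position of its first byte is an assumption that the table does not spell out.

```python
def first_invalid_utf8(data: bytes) -> int:
    """Return 0 if data is well-formed UTF-8 per Table 1, otherwise the
    1-based position of the first byte of the first ill-formed sequence."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:                      # single-byte character
            i += 1
            continue
        # Pick the sequence length and the allowed range of the second byte.
        if 0xC2 <= lead <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF
        elif lead == 0xED:
            length, lo, hi = 3, 0x80, 0x9F
        elif 0xE1 <= lead <= 0xEF:            # E1-EC and EE-EF
            length, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF
        elif lead == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F
        elif 0xF1 <= lead <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        else:                                 # x'80'-x'C1' or x'F5'-x'FF'
            return i + 1
        seq = data[i:i + length]
        if len(seq) < length:                 # assumption: truncated tail
            return i + 1
        if not lo <= seq[1] <= hi:            # second byte out of range
            return i + 1
        if any(not 0x80 <= b <= 0xBF for b in seq[2:]):
            return i + 1                      # third or fourth byte out of range
        i += length
    return 0


print(first_invalid_utf8(bytes.fromhex("4BC3A4666572")))  # 0
print(first_invalid_utf8(b"A\xED\xA0\x80"))               # 2
```

The second call returns 2 because x'ED' may only be followed by a second byte in the range x'80' to x'9F'.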

For UTF-16 data, the validity of an encoding unit varies according to its range as listed in the table:

Table 2. Encoding unit validity for UTF-16 data

Value Range	Dependency	Validity	Number of bytes if converted to UTF-8
nx'0000' - nx'007F'	None	Valid	1
nx'0080' - nx'07FF'	None	Valid	2
nx'0800' - nx'D7FF'	None	Valid	3
nx'D800' - nx'DBFF'	Must be followed by a second encoding unit with a value in the range nx'DC00' to nx'DFFF'	Valid	4 (A Unicode surrogate pair)
nx'D800' - nx'DBFF'	Other cases	Invalid	Not applicable
nx'DC00' - nx'DFFF'	Not preceded by an encoding unit in the range nx'D800' to nx'DBFF'	Invalid	Not applicable
nx'E000' - nx'FFFF'	None	Valid	3
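
The UTF-16 rules can be modeled in the same way. This Python sketch is a reference model under stated assumptions, not the compiler's implementation: the function name `first_invalid_utf16` is hypothetical, the input is assumed to be big-endian UTF-16 with an even number of bytes, and a low surrogate that is not preceded by a high surrogate is treated as invalid.

```python
def first_invalid_utf16(data: bytes) -> int:
    """Return 0 if data is well-formed UTF-16 (big-endian), otherwise the
    1-based position of the first invalid 16-bit encoding unit."""
    # Split the bytes into 16-bit encoding units (assumes even length).
    units = [int.from_bytes(data[k:k + 2], "big")
             for k in range(0, len(data), 2)]
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i + 1
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate that does not follow a high surrogate.
            return i + 1
        i += 1
    return 0


print(first_invalid_utf16(bytes.fromhex("005400F6006200750072D858DC6B0073")))  # 0
print(first_invalid_utf16(bytes.fromhex("0054D9C3006200750072D858DC6B0073")))  # 2
```

Positions are counted in encoding units, not bytes, so the invalid unit in the second call is reported as 2 even though it starts at the third byte.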

Example 1

If A is an alphabetic or alphanumeric data item that contains the value x'4BC3A4666572' ('Käfer') in UTF-8 encoding, the returned value from UVALID(A) is 0.

Example 2

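If B is a national data item that contains the value x'005400F6006200750072D858DC6B0073' ('Töbur', followed by the supplementary character U+2606B encoded as the surrogate pair x'D858DC6B', followed by 's') in UTF-16 encoding, the returned value from UVALID(B) is 0.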

Example 3

If C is a national data item that contains the value x'0054D9C3006200750072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(C) is 2 because the high surrogate x'D9C3' in the second encoding unit is not followed by a low surrogate.

Example 4

If D is a national data item that contains the value x'005400F60062DC010072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(D) is 4 because the low surrogate x'DC01' in the fourth encoding unit is not preceded by a high surrogate.
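
The hexadecimal values in the four examples can be cross-checked with any strict Unicode decoder. The following Python sketch uses only the built-in `utf-8` and `utf-16-be` codecs; the helper name `decodes_as_utf16` is hypothetical. It shows only whether each value is well formed, not the position that UVALID would return.

```python
def decodes_as_utf16(hex_value: str) -> bool:
    """Return True if the hex string is well-formed big-endian UTF-16."""
    try:
        bytes.fromhex(hex_value).decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False


# Example 1: well-formed UTF-8.
ex1 = bytes.fromhex("4BC3A4666572").decode("utf-8")

# Example 2: well-formed UTF-16; the surrogate pair decodes to one code point.
ex2 = bytes.fromhex("005400F6006200750072D858DC6B0073").decode("utf-16-be")

# Examples 3 and 4: each contains an unpaired surrogate.
ex3_ok = decodes_as_utf16("0054D9C3006200750072D858DC6B0073")
ex4_ok = decodes_as_utf16("005400F60062DC010072D858DC6B0073")

print(ex1, len(ex2), ex3_ok, ex4_ok)
```

Example 2 occupies eight encoding units but decodes to seven code points, because the pair x'D858DC6B' represents the single code point U+2606B.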