Intrinsic functions - Enterprise COBOL for z/OS 6.3.0
If a character data item contains valid UTF-8 or UTF-16 data, the UVALID function returns the value zero.
If a character data item contains invalid UTF-8 or UTF-16 data, the UVALID function returns the index of the first invalid element.
The function type is integer.
- argument-1
- Must be of class alphabetic, alphanumeric, national, or UTF-8.
The returned value is an integer with 9 digits if LP(32) is in effect or 18 digits if LP(64) is in effect. Its value depends on argument-1:
- If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
- If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
- If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
- If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit
where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.
Note:
The UVALID function indicates whether the character string contains well-formed Unicode UTF-8 or UTF-16 data.
It does not indicate whether any or all of the Unicode code points represented by the character string are assigned to characters.
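The distinction in this note can be illustrated outside COBOL. The following Python sketch is not part of the product; it uses Python's strict UTF-8 decoder as a stand-in for a well-formedness check, which is an assumption about equivalence and not a statement about how UVALID is implemented. The byte sequence x'EFBFBF' satisfies the rules in Table 1, yet the code point it represents (U+FFFF) is a noncharacter that is not assigned to any character.

```python
import unicodedata

# x'EFBFBF' is a three-byte sequence whose first byte is in the range
# x'EE' to x'EF' and whose second and third bytes are in x'80' to x'BF'.
raw = bytes.fromhex("EFBFBF")

# Strict decoding succeeds, so the bytes are well-formed UTF-8.
text = raw.decode("utf-8")
code_point = ord(text)

# General category "Cn" means the code point is not assigned to a character.
category = unicodedata.category(text)

print(f"U+{code_point:04X} well-formed, category {category}")
```

By the rules in Table 1, data like this is well-formed even though it carries no assigned character.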
For UTF-8 data, the validity of a byte varies according to its range as listed in the table:
Table 1. Byte validity for UTF-8 data
| Value Range | Dependency | Validity |
|---|---|---|
| x'00' - x'7F' | None | Valid |
| x'80' - x'C1' | None | Invalid |
| x'C2' - x'DF' | Followed by another byte that is in the range x'80' to x'BF' | Valid |
| x'E0' - x'EF' | If the first byte is x'E0', followed by two more bytes: the second byte is in the range x'A0' to x'BF' and the third byte is in the range x'80' to x'BF' | Valid |
| x'E0' - x'EF' | If the first byte is in the range x'E1' to x'EC', both the second and third bytes are in the range x'80' to x'BF' | Valid |
| x'E0' - x'EF' | If the first byte is x'ED', followed by two more bytes: the second byte is in the range x'80' to x'9F' and the third byte is in the range x'80' to x'BF' | Valid |
| x'E0' - x'EF' | If the first byte is in the range x'EE' to x'EF', both the second and third bytes are in the range x'80' to x'BF' | Valid |
| x'F0' - x'F4' | If the first byte is x'F0', followed by three more bytes: the second byte is in the range x'90' to x'BF', and the third and fourth bytes are in the range x'80' to x'BF' | Valid |
| x'F0' - x'F4' | If the first byte is in the range x'F1' to x'F3', the second, third, and fourth bytes are all in the range x'80' to x'BF' | Valid |
| x'F0' - x'F4' | If the first byte is x'F4', followed by three more bytes: the second byte is in the range x'80' to x'8F', and the third and fourth bytes are in the range x'80' to x'BF' | Valid |
| x'F5' - x'FF' | None | Invalid |
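The rules in Table 1 can be modelled as a small lookup, sketched here in Python for illustration only. The names `UTF8_RULES` and `first_invalid_utf8` are hypothetical, and the sketch assumes that the reported position is the 1-based position of the lead byte of the first sequence that breaks a rule, and that a sequence cut short by the end of the data counts as invalid. It is a model of the table, not the UVALID implementation.

```python
# Each entry: (lowest lead byte, highest lead byte, allowed ranges of the
# bytes that must follow). The entries transcribe Table 1; lead bytes that
# match no entry (x'80'-x'C1' and x'F5'-x'FF') are invalid.
CONT = (0x80, 0xBF)
UTF8_RULES = [
    (0x00, 0x7F, []),
    (0xC2, 0xDF, [CONT]),
    (0xE0, 0xE0, [(0xA0, 0xBF), CONT]),
    (0xE1, 0xEC, [CONT, CONT]),
    (0xED, 0xED, [(0x80, 0x9F), CONT]),
    (0xEE, 0xEF, [CONT, CONT]),
    (0xF0, 0xF0, [(0x90, 0xBF), CONT, CONT]),
    (0xF1, 0xF3, [CONT, CONT, CONT]),
    (0xF4, 0xF4, [(0x80, 0x8F), CONT, CONT]),
]


def first_invalid_utf8(data: bytes) -> int:
    """Return 0 if data follows Table 1, else the 1-based position of the
    lead byte of the first sequence that does not (an assumed convention)."""
    pos = 0
    while pos < len(data):
        lead = data[pos]
        trailing = next((t for lo, hi, t in UTF8_RULES if lo <= lead <= hi), None)
        if trailing is None:
            return pos + 1  # lead byte is in an invalid range
        following = data[pos + 1 : pos + 1 + len(trailing)]
        if len(following) < len(trailing):
            return pos + 1  # sequence is truncated by the end of the data
        if any(not lo <= b <= hi for b, (lo, hi) in zip(following, trailing)):
            return pos + 1  # a following byte is outside its required range
        pos += 1 + len(trailing)
    return 0


print(first_invalid_utf8(bytes.fromhex("4BC3A4666572")))  # 'Käfer' in UTF-8
```

Keeping the ranges in a table mirrors the layout of Table 1, so each row can be checked against the code by eye.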
For UTF-16 data, the validity of an encoding unit varies according to its range as listed in the table:
Table 2. Encoding unit validity for UTF-16 data
| Value Range | Dependency | Validity | Number of bytes if converted to UTF-8 |
|---|---|---|---|
| nx'0000' - nx'007F' | None | Valid | 1 |
| nx'0080' - nx'07FF' | None | Valid | 2 |
| nx'0800' - nx'D7FF' | None | Valid | 3 |
| nx'D800' - nx'DBFF' | Must be followed by a second encoding unit with a value in the range nx'DC00' to nx'DFFF' | Valid | 4 (A Unicode surrogate pair) |
| nx'D800' - nx'DBFF' | Other cases | Invalid | Not applicable |
| nx'E000' - nx'FFFF' | None | Valid | 3 |
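The surrogate rules in Table 2 can be sketched in the same way. This Python model is illustrative only: the name `first_invalid_utf16` is hypothetical, the input is assumed to be big-endian UTF-16 with an even number of bytes, and an encoding unit in the range nx'DC00' to nx'DFFF' that does not follow a high surrogate is treated as invalid. The returned position is one plus the number of well-formed encoding units that precede the invalid data.

```python
def first_invalid_utf16(data: bytes) -> int:
    """Return 0 if data is well-formed UTF-16 (big-endian assumed), else the
    1-based position of the first invalid 16-bit encoding unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")

    # Split the bytes into 16-bit encoding units.
    units = [int.from_bytes(data[i : i + 2], "big") for i in range(0, len(data), 2)]

    pos = 0
    while pos < len(units):
        unit = units[pos]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if pos + 1 < len(units) and 0xDC00 <= units[pos + 1] <= 0xDFFF:
                pos += 2
                continue
            return pos + 1
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate that is not preceded by a high surrogate.
            return pos + 1
        pos += 1
    return 0


# A well-formed string containing one surrogate pair (x'D858DC6B').
print(first_invalid_utf16(bytes.fromhex("005400F6006200750072D858DC6B0073")))
```

Positions are counted in encoding units, not bytes, so a surrogate pair advances the position by two.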
Example 1
If A is an alphabetic or alphanumeric data item that contains the value x'4BC3A4666572' ('Käfer') in UTF-8 encoding, the returned value from UVALID(A) is 0.
Example 2
If B is a national data item that contains the value x'005400F6006200750072D858DC6B0073' in UTF-16 encoding ('Töbur', then the supplementary character U+2606B encoded as the surrogate pair x'D858DC6B', then 's'), the returned value from UVALID(B) is 0.
Example 3
If C is a national data item that contains the value x'0054D9C3006200750072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(C) is 2 because the high surrogate x'D9C3' is not followed by a low surrogate.
Example 4
If D is a national data item that contains the value x'005400F60062DC010072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(D) is 4 because the low surrogate x'DC01' is not preceded by a high surrogate.
© Copyright IBM Corp.