@NovGetRight wrote: Sorry, I don't get it. I ran your code in SAS UTF-8 and SAS EN, and the result is the same. I am not sure whether I expressed my question clearly: I mean that I hope to have a list of characters whose hex values differ between UTF-8 and WLATIN1, so that I can use a macro to deal with all such cases.
There are only 256 possible characters in a single byte encoding system like WLATIN1.
Of those, only the normal 7-bit ASCII characters, the ones with codes below 128, are guaranteed to be encoded exactly the same in UTF-8.
It is practically impossible to test all of the possible UTF-8 characters.
So instead just work out the mapping of the 128 high-order WLATIN1 character codes (128 to 255).
data char_check;
  length decimal 8 different 8 hex $2 hexutf8 $8 utf8len 8 char $1 utf8char $4;
  do decimal=128 to 255;
    /* Build the single WLATIN1 byte from its hex code */
    hex=put(decimal,hex2.);
    char=input(hex,$hex2.);
    /* Transcode that byte to UTF-8 (up to 3 bytes for these codes) */
    utf8char = kcvt(char,'wlatin1','utf-8');
    different = char ne utf8char;
    /* LENGTHN returns 0 for a blank, so count a blank as one byte */
    utf8len=lengthn(utf8char)+(char=' ');
    /* Hex-dump exactly utf8len bytes, e.g. $hex6. for a 3-byte sequence */
    hexutf8=putc(utf8char,cats('$hex',2*utf8len,'.'));
    output;
  end;
  format char $hex2. utf8char $hex8.;
run;
data _null_;
  set char_check;
  put hex $2. '->' hexutf8 $8. ' ' @;
  if mod(_n_+1,8)=1 then put;
run;
80->E282AC 81->C281 82->E2809A 83->C692 84->E2809E 85->E280A6 86->E280A0 87->E280A1
88->CB86 89->E280B0 8A->C5A0 8B->E280B9 8C->C592 8D->C28D 8E->C5BD 8F->C28F
90->C290 91->E28098 92->E28099 93->E2809C 94->E2809D 95->E280A2 96->E28093 97->E28094
98->CB9C 99->E284A2 9A->C5A1 9B->E280BA 9C->C593 9D->C29D 9E->C5BE 9F->C5B8
A0->C2A0 A1->C2A1 A2->C2A2 A3->C2A3 A4->C2A4 A5->C2A5 A6->C2A6 A7->C2A7
A8->C2A8 A9->C2A9 AA->C2AA AB->C2AB AC->C2AC AD->C2AD AE->C2AE AF->C2AF
B0->C2B0 B1->C2B1 B2->C2B2 B3->C2B3 B4->C2B4 B5->C2B5 B6->C2B6 B7->C2B7
B8->C2B8 B9->C2B9 BA->C2BA BB->C2BB BC->C2BC BD->C2BD BE->C2BE BF->C2BF
C0->C380 C1->C381 C2->C382 C3->C383 C4->C384 C5->C385 C6->C386 C7->C387
C8->C388 C9->C389 CA->C38A CB->C38B CC->C38C CD->C38D CE->C38E CF->C38F
D0->C390 D1->C391 D2->C392 D3->C393 D4->C394 D5->C395 D6->C396 D7->C397
D8->C398 D9->C399 DA->C39A DB->C39B DC->C39C DD->C39D DE->C39E DF->C39F
E0->C3A0 E1->C3A1 E2->C3A2 E3->C3A3 E4->C3A4 E5->C3A5 E6->C3A6 E7->C3A7
E8->C3A8 E9->C3A9 EA->C3AA EB->C3AB EC->C3AC ED->C3AD EE->C3AE EF->C3AF
F0->C3B0 F1->C3B1 F2->C3B2 F3->C3B3 F4->C3B4 F5->C3B5 F6->C3B6 F7->C3B7
F8->C3B8 F9->C3B9 FA->C3BA FB->C3BB FC->C3BC FD->C3BD FE->C3BE FF->C3BF
NOTE: There were 128 observations read from the data set WORK.CHAR_CHECK.
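One pattern in that listing is worth checking. From A0 through FF, WLATIN1 agrees with ISO 8859-1, so the byte value is also the Unicode code point and the UTF-8 form should simply be the standard two-byte encoding of that number. Only the 80 to 9F rows would then need a real lookup. The step below is a minimal sketch of that check, assuming the CHAR_CHECK data set from above still exists; the names rule_check, expected, and matches are just illustrative.

```sas
/* Sketch: compare the A0-FF rows with the generic two-byte UTF-8 rule. */
/* rule_check, expected and matches are illustrative names only.        */
data rule_check;
  set char_check;
  where decimal >= 160;                 /* keep only codes A0-FF */
  length expected $8 matches 8;
  /* Lead byte  = 110xxxxx: 'C0'x OR the bits above the low six.   */
  /* Trail byte = 10xxxxxx: '80'x OR the low six bits of the code. */
  expected = cats(put(bor(192, brshift(decimal, 6)), hex2.),
                  put(bor(128, band(decimal, 63)), hex2.));
  matches = (expected = hexutf8);       /* 1 when the rule agrees with KCVT */
run;

/* Summarize how many rows follow the rule */
proc freq data=rule_check;
  tables matches;
run;
```

If the rule holds for your session, every one of the 96 rows should show matches=1, which would let a macro handle A0 to FF by arithmetic and keep a small lookup only for 80 to 9F.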