Hi everyone,
I am getting an error message when trying to run the code below with ENCODING=WLATIN1 in SAS Studio. Can you please suggest what might be wrong?
Note: the same code runs fine with ENCODING=UTF-8.
DATA T1;
var1="aábcčdď";
run;
Any suggestions would be a great help.
If you can get the Czech characters to work OK using UTF-8 encoding, then what is your question? From my understanding, Latin-1 does not include the Czech characters, but Latin-2 does. See Wikipedia: https://en.m.wikipedia.org/wiki/ISO/IEC_8859-2
I suggest you open a track with SAS Technical Support to confirm which SAS encodings support Czech characters.
Use a UTF-8 session to see what bytes SAS is actually storing for that string:
73   data T1;
74     var1='aábcčdď';
75     put var1= / var1 $hex.;
76   run;

var1=aábcčdď
61C3A16263C48D64C48F
Two of those characters do not exist in the LATIN1 encoding.
U+010D | č | c4 8d | LATIN SMALL LETTER C WITH CARON |
U+010F | ď | c4 8f | LATIN SMALL LETTER D WITH CARON |
One of them does exist in LATIN1, but would use only one byte there instead of two.
U+00E1 | á | c3 a1 | LATIN SMALL LETTER A WITH ACUTE |
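Outside SAS, the same byte-level facts can be checked in any language. A minimal sketch in Python (used here only as a neutral way to inspect Unicode encodings; it is not part of the SAS workflow):

```python
# Inspect how each character of the test string encodes in UTF-8 vs Latin-1.
s = "aábcčdď"

for ch in s:
    utf8 = ch.encode("utf-8").hex().upper()
    try:
        latin1 = ch.encode("latin-1").hex().upper()
    except UnicodeEncodeError:
        latin1 = "(not representable)"
    print(f"U+{ord(ch):04X} {ch}  utf-8={utf8}  latin-1={latin1}")

# The whole string as UTF-8 bytes matches the SAS $hex. output above:
print(s.encode("utf-8").hex().upper())  # 61C3A16263C48D64C48F
```

The loop shows that č (U+010D) and ď (U+010F) raise an encoding error for Latin-1, while á (U+00E1) encodes to the single byte E1.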
How do you propose to enter those characters into your program when using an encoding that does not support them? How do you propose to print them?
In theory, you could write the hex codes for those characters instead.
73   data T1;
74     var1='aábcčdď';
75     var2='a'||'c3a1'x||'bc'||'c48d'x||'d'||'c48f'x;
76     put var1= / var2= / var1=$hex. / var2=$hex.;
77   run;

var1=aábcčdď
var2=aábcčdď
var1=61C3A16263C48D64C48F
var2=61C3A16263C48D64C48F
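For comparison, the same hex-literal technique can be sketched in Python: the SAS 'c3a1'x literal corresponds to the raw UTF-8 byte pair C3 A1, so concatenating raw bytes and decoding reproduces the string:

```python
# Build the string from raw UTF-8 bytes, mirroring the SAS ||-with-'..'x approach.
raw = b"a" + b"\xc3\xa1" + b"bc" + b"\xc4\x8d" + b"d" + b"\xc4\x8f"
var2 = raw.decode("utf-8")

print(var2)               # aábcčdď
print(raw.hex().upper())  # 61C3A16263C48D64C48F
```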
Thanks, everyone, for the replies.
Actually, this is not the original test case; let me explain a bit.
We have integrated SAS Studio with SharePoint, so that we can import/export data between SAS Studio and SharePoint.
As you know, this connection is based on Microsoft's APIs; from SAS we get the response and push the file to SharePoint via PROC HTTP.
When we try to get the JSON response from SharePoint in SAS Studio (with PROC HTTP), the JSON is not read correctly in a WLATIN1 session (UTF-8 is all OK) when the SharePoint data contains Czech characters (some names have special characters).
For example, if a SharePoint column value contains Czech characters, the JSON shows the display name as "displayName":"Veronika Va\u0161t\u00edkov\u00e1" because the first and last name contain Czech characters (Veronika Vaštíková). Because of this, we are getting the error message below.
MPRINT(IMPORTBATCH.REFRESH): ;
MPRINT(IMPORTBATCH): ;
MPRINT(IMPORTBATCH): filename resp temp;
MPRINT(IMPORTBATCH): proc http url="https://graph.microsoft.com/v1.0/sites/abcdgroup.sharepoint.com:/sites/12abc:/drive"
oauth_bearer="." out = resp;
MPRINT(IMPORTBATCH): run;
NOTE: 401 Unauthorized
NOTE: PROCEDURE HTTP used (Total process time):
real time 0.18 seconds
cpu time 0.01 seconds
So why not just use UTF-8 SAS sessions?
Why do you need (or want) WLATIN1 sessions if you have to deal with UTF-8 characters?
Well, they (the end users) are getting data from a mainframe, which is WLATIN1 by default.
They have a UAC scheduler tool, where you can issue a command to get the data from the mainframe with an encoding=en/u8 parameter.
So they have written one SAS program with this logic:
1) Connect to the mainframe from the SAS Studio server.
2) Get the data as WLATIN1 from the mainframe to the SAS Studio server.
3) Push the same data/file to SharePoint.
When they tried it in one go through the UAC scheduler with encoding U8, it ran fine, but there are issues with EN (WLATIN1), as the Czech characters in the JSON cannot be converted.
Reading data FROM a source that is in WLATIN1 into a SAS session that is using UTF-8 should not be a problem. What is the issue they are having with this?
Reading a dataset written with encoding=WLATIN1 into a dataset using encoding=UTF-8 should also not be a problem.
The only thing you need to guard against: if your WLATIN1 source data has non-7-bit-ASCII characters (characters with accents, en-dashes, or "smart" quotes, for example), then you might need to make the character variables in the target SAS dataset longer, since some UTF-8 characters take more than one byte of storage.
SAS has a tool (the CVP libname engine) to automatically adjust the lengths of character variables. Basically, you give a multiplication factor and it expands every character variable by that factor.
Or you could use the ENCODING=ANY dataset option on the input dataset and write your own logic using KCVT() and other functions to calculate the length you need for each variable based on the data that is actually in the dataset.
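The length issue can be illustrated outside SAS as well. WLATIN1 corresponds closely to Windows code page 1252 (one byte per character), so a sketch in Python shows why a variable sized for WLATIN1 can be too short after transcoding to UTF-8:

```python
# WLATIN1 is essentially Windows code page 1252: one byte per character,
# including accents, en-dashes, and "smart" quotes.
s = "déjà vu – “quoted”"

wlatin1_bytes = s.encode("cp1252")
utf8_bytes = s.encode("utf-8")

print(len(wlatin1_bytes))  # one byte per character
print(len(utf8_bytes))     # more: accents take 2 bytes, en-dash/quotes take 3

# A rough per-string expansion factor, analogous to the multiplier
# you would give the CVP engine:
print(round(len(utf8_bytes) / len(wlatin1_bytes), 2))
```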
You have lost me.
I thought you said it worked when using a SAS session with UTF-8 encoding?
Is the UTF-8 SAS session the one that is writing the JSON file with the non-ASCII characters? Or is it the WLATIN1 SAS session that is doing that? If the latter, why do you continue to use the WLATIN1 sessions? Is there some other process that breaks when using SAS UTF-8 sessions?