BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASdevAnneMarie
Barite | Level 11

Hello Experts,

 

I woud like to import the attached file, but I don't know how to choose the encoding option. When I try to convert this file to csv I have the same encoding problem. My code is : 

filename mon_f temp;

	data _null_;
		infile "XXXXX\data_act.xlsx" dsd truncover lrecl=32767 encoding=wlatin1;
		file mon_f lrecl=1000000;
		input ligne :$char32767.;

		if notdigit(scan(ligne,1,',')) and length(scan(ligne,1,','))>1 then
			put '|' @;
		else if length(scan(ligne,1,','))>1 then
			do;
				put;
			end;

		len=lengthn(ligne);
		put ligne $varying32767. len @;
	run;

	proc import datafile=mon_f dbms=csv out=donnee replace;
		getnames=no;
	run;

 

Thank you very much ! 

1 ACCEPTED SOLUTION

Accepted Solutions
japelin
Rhodochrosite | Level 12

Files with the extension xlsx are binary files, so I think it is difficult to process them with the file statement.

I tried the following code, and it worked somewhat.

I checked it with SAS9.4(Unicode support).

 

 

filename mon_f temp encoding='utf-8';
libname x xlsx "XXXXX\data_act.xlsx";

data _null_;
  set x.TWEET_ACTIVITY_METRICS_AG2RLMP_;
  file mon_f lrecl=1000000;

  if notdigit(scan(Identifiant_du_Tweet__Permalien,1,',')) and length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    put '|' @;
    else if length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    do; 
      put;
	end;
    len=lengthn(Identifiant_du_Tweet__Permalien);
	put Identifiant_du_Tweet__Permalien $varying32767. len @;
run;

proc import datafile=mon_f dbms=csv out=donnee replace;
	getnames=no;
run;
libname x;

 

 

If you can handle the newline in the red frame correctly, you may be able to import the file using proc import.

2021-02-15_18h19_09.png

 

The SAS dataset display is garbled, but I think that's just a problem of not having a font in my SAS that can display emoji.

2021-02-15_18h25_12.png

View solution in original post

9 REPLIES 9
japelin
Rhodochrosite | Level 12

Files with the extension xlsx are binary files, so I think it is difficult to process them with the file statement.

I tried the following code, and it worked somewhat.

I checked it with SAS9.4(Unicode support).

 

 

filename mon_f temp encoding='utf-8';
libname x xlsx "XXXXX\data_act.xlsx";

data _null_;
  set x.TWEET_ACTIVITY_METRICS_AG2RLMP_;
  file mon_f lrecl=1000000;

  if notdigit(scan(Identifiant_du_Tweet__Permalien,1,',')) and length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    put '|' @;
    else if length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    do; 
      put;
	end;
    len=lengthn(Identifiant_du_Tweet__Permalien);
	put Identifiant_du_Tweet__Permalien $varying32767. len @;
run;

proc import datafile=mon_f dbms=csv out=donnee replace;
	getnames=no;
run;
libname x;

 

 

If you can handle the newline in the red frame correctly, you may be able to import the file using proc import.

2021-02-15_18h19_09.png

 

The SAS dataset display is garbled, but I think that's just a problem of not having a font in my SAS that can display emoji.

2021-02-15_18h25_12.png

SASdevAnneMarie
Barite | Level 11
Thank you, Kawakami
japelin
Rhodochrosite | Level 12

If you get garbled characters, try using wlatin1.

 

filename mon_f temp encoding='wlatin1';

japelin
Rhodochrosite | Level 12

Please try this code.

I can get Emoji.

 

filename mon_f 'temp.dat' encoding='wlatin1';
libname x xlsx "C:\Users\xyzab\Downloads\data_act.xlsx";

data _null_;
  set x.TWEET_ACTIVITY_METRICS_AG2RLMP_;
  file mon_f lrecl=1000000;

  if notdigit(scan(Identifiant_du_Tweet__Permalien,1,',')) and length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    put '|' @;
    else if length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    do; 
      put;
	end;
    len=lengthn(Identifiant_du_Tweet__Permalien);
	put Identifiant_du_Tweet__Permalien $varying32767. len @;
run;
filename mon_f;
filename mon_f 'temp.dat' encoding='utf-8' nobom;
proc import datafile=mon_f dbms=csv out=donnee replace;
	getnames=no;
run;
filename mon_f;
libname x;

2021-02-16_10h11_35.png

 

SASdevAnneMarie
Barite | Level 11
Thank you very much Kawakami ! For importing the emoji, do you use firstly the wlatin1 and after the utf-8 ?

I don't have the authorization for temp.dat, could I use another .dat ? NOTE: Variable Identifiant_du_Tweet__Permalien is uninitialized.
ERROR: Insufficient authorization to access C:\Applications\SAS\SASConfig\Lev1\SASApp\temp.dat.
NOTE: The SAS System stopped processing this step because of errors.

Thank you very much !
japelin
Rhodochrosite | Level 12

please try this code.

I think the Excel file was created on a PC with wlatin1 character encoding.
So, the encoding of the first filename needs to be wlatin1.
The encoding of the next filename needs to be utf-8 so that emoji can be imported into SAS.

You can use any file name you want, even temp.dat.
The following code will create the file in the work library.

 

filename mon_f "%sysfunc(pathname(work))\temp.dat" encoding='wlatin1';
libname x xlsx "XXXXX\data_act.xlsx";

data _null_;
  set x.TWEET_ACTIVITY_METRICS_AG2RLMP_;
  file mon_f lrecl=1000000;

  if notdigit(scan(Identifiant_du_Tweet__Permalien,1,',')) and length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    put '|' @;
    else if length(scan(Identifiant_du_Tweet__Permalien,1,','))>1 then
    do; 
      put;
	end;
    len=lengthn(Identifiant_du_Tweet__Permalien);
	put Identifiant_du_Tweet__Permalien $varying32767. len @;
run;
filename mon_f;
filename mon_f "%sysfunc(pathname(work))\temp.dat" encoding='utf-8' nobom;
proc import datafile=mon_f dbms=csv out=donnee replace;
	getnames=no;
run;
filename mon_f;
libname x;
SASdevAnneMarie
Barite | Level 11

Hello Kawakami,

 

Thank you very much ! Unfortunately, I have this error :

MarieT_0-1613734637935.pngMarieT_1-1613734666049.png

I can't get the expected result 😞

Kurt_Bremser
Super User

Maxim 3: Know Your Data.

Your variable is numeric, so you can't use a character format with it.

Please do ALWAYS post the COMPLETE log of your steps, and do ALWAYS COPY/PASTE the text into a window opened with the </> button. Do NOT post logs as pictures.

japelin
Rhodochrosite | Level 12

Are you running the SAS nls (National Language Support/Local language) version?

Can you run SAS by "SAS(Unicode support)" short cut?

The Excel file you are trying to work with contains unicode strings, so you should need SAS that can handle unicode, i.e. SAS unicode support.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4757 views
  • 3 likes
  • 3 in conversation