Hi SAS Unsers,
This morning when I try to adjust my code, I face a noticeable problem as below.
My code for running through all file in a directory is:
filename mydir 'C:\Users\pnguyen\Desktop\New folder';
data _null_;
did = dopen('mydir');
do i = 1 to dnum(did);
  fname = dread(did,i);
  /*fname: AustriaEUR*/
  length short_fn $29 currency $3 ;
  short_fn= cats(substr(fname, 1,length(fname)-3),'_');
  currency=substr(fname,length(fname)-2);
  cmd=cats('%ImportAndTranspose(File=C:\Users\pnguyen\Desktop\New folder\',
      strip(fname),
      ',cur=',currency,
      ',outf=',short_fn,'sheet,startsheet=1,endsheet=45);');
	  /*strip(fname) to link 
	 File=C:\Users\pnguyen\Desktop\New folder\AustriaEUR*/
        call execute(cmd);
end;
keep fname;
run;A part of my macro is
%macro ImportAndTranspose(
      File=
	  , cur=
	  , outf=
      , StartSheet=
      , EndSheet=
   );
   %local i;
   
   %do i = &StartSheet. %to &EndSheet.;
    %if &i=1 %then %do;
	
    proc import datafile= "&File." 
                  out= &outf.&i.
                  dbms= xlsx 
                  replace;
          range= "Sheet1$A:X";
          getnames= yes;
      run;
      
      proc sort data= &outf.&i.;
      by Type;
      run;
	%end;So, I test with one file named "AustriaEUR" in the folder "New Folder". I have the PROC MPRINT as below, which is quite strange to me
MPRINT(IMPORTANDTRANSPOSE):   proc import datafile= "C:\Users\pnguyen\Desktop\New folder\AustriaEUR.xlsx" out= AustriaEUR.x_sheet1 
dbms= xlsx replace;
MPRINT(IMPORTANDTRANSPOSE):   range= "Sheet1$A:X";
MPRINT(IMPORTANDTRANSPOSE):   getnames= yes;
MPRINT(IMPORTANDTRANSPOSE):   run;
MPRINT(IMPORTANDTRANSPOSE):   proc sort data= AustriaEUR.x_sheet1;
MPRINT(IMPORTANDTRANSPOSE):   by Type;
MPRINT(IMPORTANDTRANSPOSE):   run;The MPRINT as above means that outf = AustriaEUR.x_sheet
However, if we substitute to the initial code, we have:
fname=AustriaEUR
then
short_fn= cats(substr(fname, 1,length(fname)-3),'_');So short_fn=Austria_
And
',outf=',short_fn,'sheetmeans that outf=Austria_sheet which is much different from the one shown from OPTION MPRINT (outf = AustriaEUR.x_sheet)
Can you please tell me where I did wrongly.
Warm regards!
@Phil_NZ wrote:
But why don't you put any delimiter in the condition of scan function? In case we don't put the delimiter, what is the default one?
And I am looking for a further comprehensive solution because other than Austria, I have other files named with two words like United Kingdom or United States (you can see the blank in the middle of the name). And in these cases, it seems that the scan function can not work properly.
In this case, of course, you must specify the delimiter because the list of default delimiters includes the blank:
fname = scan(dread(did,i),1,'.');
And, yes, I should have specified it in the first place to provide more robust code. Sorry for the omission.
One way to debug this is to change data _null_; to data null; and this creates a SAS data set named NULL. Then you can look into this data set and see if the text strings are being created properly.
Hi @PaigeMiller !
I replaced
data _null_;by
data null;However, there is no output popping up so far with some new warnings
WARNING: The data set WORK.NULL may be incomplete.  When this step was stopped there were 1 observations and 1 variables.
NOTE: Compressing data set WORK.NULL decreased size by 0.00 percent. 
      Compressed is 1 pages; un-compressed would require 1 pages.
WARNING: Data set WORK.NULL was not replaced because this step was stopped.Thank you!
@Phil_NZ wrote:
Hi @PaigeMiller !
I replaced
data _null_;by
data null;However, there is no output popping up so far with some new warnings
WARNING: The data set WORK.NULL may be incomplete. When this step was stopped there were 1 observations and 1 variables. NOTE: Compressing data set WORK.NULL decreased size by 0.00 percent. Compressed is 1 pages; un-compressed would require 1 pages. WARNING: Data set WORK.NULL was not replaced because this step was stopped.Thank you!
Although you have an answer, I point out that there is not supposed to be "output popping up" here. The point here was for you to LOOK AT (with your own eyes) the data set created called NULL to see if the text strings were being properly created. This is a valuable debugging method when you are creating strings to be placed into a CALL EXECUTE.
Thank you @PaigeMiller , it is a valuable lesson for me, thank you!
And is there any document talking about the comparison between NULL and _NULL_, can I ask? I google it but no result so far!
Many thanks and warm regards!
data _null_; does not create a data set you can look at. This is fine if you have worked out the bugs in your program and you don't need an actual data set.
data null; (or any valid data set name except _null_) creates a data set you can look at.
Hi @PaigeMiller
Thank you for your dedicated answer, I really appreciate it. From people like you, I believe that there are still a lot of great people all over the world, motivating me to be the next great person!
Many thanks and warmest regards.
Hi @Phil_NZ,
The DREAD function returns the filename including the suffix, probably .xlsx, so your attempt to cut off "EUR" did in fact cut off "lsx".
Try if
fname = scan(dread(did,i),1);
resolves the issue.
It looks like that you are right, the MPRINT now is more reasonable
MPRINT(IMPORTANDTRANSPOSE):   proc import datafile= "C:\Users\pnguyen\Desktop\New folder\AustriaEUR" out= Austria_sheet1 dbms= xlsx 
replace;Many thanks for your solution.
But why don't you put any delimiter in the condition of scan function? In case we don't put the delimiter, what is the default one?
And I am looking for a further comprehensive solution because other than Austria, I have other files named with two words like United Kingdom or United States (you can see the blank in the middle of the name). And in these cases, it seems that the scan function can not work properly.
Many thanks and warm regards.
@Phil_NZ wrote:
But why don't you put any delimiter in the condition of scan function? In case we don't put the delimiter, what is the default one?
And I am looking for a further comprehensive solution because other than Austria, I have other files named with two words like United Kingdom or United States (you can see the blank in the middle of the name). And in these cases, it seems that the scan function can not work properly.
In this case, of course, you must specify the delimiter because the list of default delimiters includes the blank:
fname = scan(dread(did,i),1,'.');
And, yes, I should have specified it in the first place to provide more robust code. Sorry for the omission.
And apart from that, it is interesting to me, because when the filename is Austria_
when I run the code
filename mydir 'C:\Users\pnguyen\Desktop\New folder';
data _null_;
did = dopen('mydir');
do i = 1 to dnum(did);
  fname = dread(did,i);
  short_fn= cats(substr(fname, 1,3),'_');
  cmd=cats('%ImportAndTranspose(File=C:\Users\pnguyen\Desktop\New folder\',
      strip(fname),',outf=',short_fn,'sheet,startsheet=1,endsheet=45);');
  call execute(cmd);
end;
keep fname;
run;with the macro
%macro ImportAndTranspose(
      File=
	  , outf=
      , StartSheet=
      , EndSheet=
   );
   %local i;
   
   %do i = &StartSheet. %to &EndSheet.;
    %if &i=1 %then %do;
	
    proc import datafile= "&File." 
                  out= &outf.&i.
                  dbms= xlsx 
                  replace;
          range= "Sheet1$A:X";
          getnames= yes;
      run;
      
      proc sort data= &outf.&i.;
      by Type;
      run;
	%end;The outf seems not to include the suffix xlsx because it works smoothly
MPRINT(IMPORTANDTRANSPOSE):   proc import datafile= "C:\Users\pnguyen\Desktop\New folder\Austria_.xlsx" out= Aus_sheet1 dbms= xlsx 
replace;It is quite an interesting part to me when I compare this workable code and the problematic code in this post that you explain because of the existence of the suffix.
I am wondering if there is any reasonable explanation for this difference?
@Phil_NZ wrote:
And apart from that, it is interesting to me, because when the filename is Austria_
when I run the code
filename mydir 'C:\Users\pnguyen\Desktop\New folder'; data _null_; did = dopen('mydir'); do i = 1 to dnum(did); fname = dread(did,i); short_fn= cats(substr(fname, 1,3),'_'); ...The outf seems not to include the suffix xlsx because it works smoothly (...)
I am wondering if there is any reasonable explanation for this difference?
Well, the above code is different in that it cuts the value of fname after the first three characters rather than before the last three. That way the suffix isn't involved.
Thank you for your straight answer, one last further question is
In case I want to have EUR, I can still keep the same code without changing anything, am I correct?
currency=substr(fname,length(fname)-2);TIA.
@Phil_NZ wrote:
In case I want to have EUR, I can still keep the same code without changing anything, am I correct?
currency=substr(fname,length(fname)-2);
Yes, once the suffix has been removed (by the SCAN function), fname ends in the currency code and the above statement will retrieve that code. Of course, this assumes that the currency code consists of three characters and is not followed by other name characters (e.g. an underscore).
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
