BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Users,

As we know Proc import causing quite a bit of noise. However, it is my way of importing excel files into SAS at the initial stage, afterwards, I can use libname to refer to the file.

And one of the issues is that the length and data types are inconsistent.

I am wondering if you can have a look to make the data consistent regarding Type and length in my code as below:

 

options compress=yes reuse=yes;
options mergenoby=error;
OPTIONS MPRINT;
/*options validvarname=any; in case transform to SAS Base*/
%macro ImportAndTranspose(
      File=
	  , cur=
	  , outf=
      , StartSheet=
      , EndSheet=
   );
   %local i;
   
   %do i = &StartSheet. %to &EndSheet.;

    %if &i=1 %then %do;
	
    proc import datafile= "&File." 
                  out= &outf.&i.
                  dbms= xlsx 
                  replace;
          range= "Sheet1$A:X";
          getnames= yes;
      run;
      
      proc sort data= &outf.&i.;
      by Type;
      run;
	%end;

	  %else %if &i=34 %then %do;
      proc import datafile= "&File." 
                  out= &outf.&i.
                  dbms= xlsx 
                  replace;
          range= "Sheet&i.$A:AG";
          getnames= yes;
      run;
      
      proc sort data= &outf.&i.;
      by Type;
      run;

      proc transpose data= &outf.&i. 
            out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
         by Type;
         VAR '1988'N - '2019'N;
      run;
	  proc print data=&outf.&i._out;
	  run;

	  data &outf.&i._outx;
	  set &outf.&i._out;
      if s&i. in: ('NA', '$$', '') then s&i. = " ";/********/
	  run;

	  %end;
    
   
    %else %if (&i ne 1) and (&i ne 34) %then %do;
      proc import datafile= "&File." 
                  out= &outf.&i.
                  dbms= xlsx 
                  replace;
          range= "Sheet&i.$A:AG";
          getnames= yes;
      run;
      
      proc sort data= &outf.&i.;
      by Type;
      run;

      proc transpose data= &outf.&i. 
            out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
         by Type;
         VAR '1988'N - '2019'N;
      run;
	  proc print data=&outf.&i._out;
	  run;

      data &outf.&i._outx;
      set &outf.&i._out;
	  if input(s&i., ?? 32.) = . then s&i.2=.;
      else s&i.2=input(s&i., 32.);
	 
	  drop s&i.;
	  rename s&i.2=s&i.;
      run;
	  %end;
	%end;
 /* Merging*/
%let outfm = %substr(&outf,1,%length(&outf)-5);
  data &outfm.merge1;
     merge
	  %do i =&StartSheet.+1 %to &EndSheet.;
	    &outf.&i._outx(keep= type year s&i.)
	  %end;
	;
	by type year;
  run;


 data &outfm.merge2;
     merge
	  &outf.&StartSheet.
	  &outfm.merge1 
	 ;
	 by type;
 run;

%mend;

/*Replicate all files in one folder*/

filename mydir 'C:\Users\pnguyen\Desktop\New folder';
data _null_;
did = dopen('mydir');
do i = 1 to dnum(did);
  fname = scan(dread(did,i),1,'.');
  /*filename:ARGENTINAARS*/
  length short_fn $29 currency $3 ;
  short_fn= cats(substr(fname, 1,length(fname)-3),'_');
  currency=substr(fname,length(fname)-2);

  cmd=cats('%ImportAndTranspose(File=C:\Users\pnguyen\Desktop\New folder\',
      strip(fname),
      ',cur=',currency,
      ',outf=',short_fn,'sheet,startsheet=1,endsheet=45);');

  call execute(cmd);
end;
keep fname;
run;

Many thanks and warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Can you clarify the plan for combining these and the reason for the variability? 

Depending on the answers I would recommend different strategies of help in automating the combination.

 

If you truly have no idea what is in each xlsx file then using a LIBNAME with XLSX engine will let you get access to information on all of the sheets in the files.

libname src xlsx "&file";
proc contents data=src._all_ noprint out=contents;
run;

Note this works very well when the sheets are actually structured like a dataset.  Header row with varaible names. Data starts in second row. No "footers" or "titles". One type of value per column.  But if you are dealing with messy spreadsheets this will not work that well.

 

If you have a well defined structure for that data (even if there is some flexibility) then you can easily use the PROC CONTENTS output to check for things like wrong data type,  wrong length of character variables. duplicate variable names.  So that you could use it to generate the code to "fix" the issues so that the data can be combined.

 

For example your step that is trying to use PROC TRANSPOSE to get the YEAR value out of the column header (variable name) and into an actual variable.  That should work fine if the only variability in the sheets is how many "year" columns there are.  You could easily handle some sheets with years from 1988 to 2019 and others that have different range of years.

 

View solution in original post

8 REPLIES 8
Shmuel
Garnet | Level 18

If I understand correctly you are importing sheets from one excel file.

Is a sheet an expand of the other sheets (different variables except identifier(s))

or all sheets are of same format (same variables) ?

 

Is the target to concatenate/merge all imported sheets into one sas data set?

 

Supposing all sheets are of the same format, same variables, then copy the generated code from the log and adapt it to serve for all sheets by applying:

1) max length of character variables

2) appropriate informat and format of numeric variables depending on the specific variable.

3) wrap the code with a macro and execute it for all sheets in a loop, similar to what you have done.

 

Reeza
Super User
You need to "manually" check it after the import or convert to cSV and import from that.

By manually, I mean you have to check each variable type and format and do the conversions if necessary and name alignment. At one point in time I considered writing a macro that would convert everything based on a specification file, this would be a good use case for the development of something like that.
Phil_NZ
Barite | Level 11

Hi @Shmuel  and @Reeza 

 

Thank you for your helpful comments. I am wondering if I just import them by using proc import like that. And after all, I will concatenate them, before concatenating them, I will set max length (by using length statement) and play some tricks to make sure all the variables' types are consistent through all dataset (because after all, most of the variables are consistent in Type but inconsistent in length (just one or two useless variables inconsistent in Types).

 

Is it a reasonable solution?

 

Many thanks.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Shmuel
Garnet | Level 18

It seems to me that you are doing hard work again and again.

By adapting the generated code in advance you may save a lot of time in the future.

You just need to know your data and assign appropriate length, informat and format.

You even can add more code, may be, to eliminate some later steps of adapting the data set for you needs. 

Reeza
Super User
If the length was truncated in the import you may have lost information and not know it.
Kurt_Bremser
Super User

This is why I NEVER import Excel files directly. When I receive data in this format, I always save it to a csv file and import that with a simple data step. Since that step never changes, the datasets will have a consistent structure.

Tom
Super User Tom
Super User

Can you clarify the plan for combining these and the reason for the variability? 

Depending on the answers I would recommend different strategies of help in automating the combination.

 

If you truly have no idea what is in each xlsx file then using a LIBNAME with XLSX engine will let you get access to information on all of the sheets in the files.

libname src xlsx "&file";
proc contents data=src._all_ noprint out=contents;
run;

Note this works very well when the sheets are actually structured like a dataset.  Header row with varaible names. Data starts in second row. No "footers" or "titles". One type of value per column.  But if you are dealing with messy spreadsheets this will not work that well.

 

If you have a well defined structure for that data (even if there is some flexibility) then you can easily use the PROC CONTENTS output to check for things like wrong data type,  wrong length of character variables. duplicate variable names.  So that you could use it to generate the code to "fix" the issues so that the data can be combined.

 

For example your step that is trying to use PROC TRANSPOSE to get the YEAR value out of the column header (variable name) and into an actual variable.  That should work fine if the only variability in the sheets is how many "year" columns there are.  You could easily handle some sheets with years from 1988 to 2019 and others that have different range of years.

 

Phil_NZ
Barite | Level 11

Hi @Tom 

 

I intend to let the Type and Year consistent because other experts recommend that I should change from xlsx to csv type of data and then import afterwards or I should make the data being consistent regarding type of data and length of data that I can avoid a lot of issues afterwards.

 

When I tried to learn how to transform from csv file to xlsx file, I found some other posts but the code is quite complex to me so far...

 

And as you said, my data is like that, yes

Note this works very well when the sheets are actually structured like a dataset.  Header row with varaible names. Data starts in second row. No "footers" or "titles". One type of value per column.  But if you are dealing with messy spreadsheets this will not work that well.

 

 

 

 

Other experts also help me regarding spotting out the errors and else, I just try to optimize the code to make sure all length and charatcer type be consistent since importing because I did not understand the code to change from xlsx to csv

 

Best regards,

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1512 views
  • 7 likes
  • 5 in conversation