BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
windy
Quartz | Level 8

Hello everyone, 

 

I need to import a lot of excel files to SAS, but the file names are defined by year and state code (like CA, FL, TX,...) so I don't know how to set up a macro for proc import. An example of the excel filename I have is 98zpal.xlsx (98 is year 1998, zp stands for zipcode and al is state code of Alabama). I have data from 1998 to 2018, and each year, I have 51 files for 51 states (more than 1000 files)

I only know how to do macro where files are named with the numbers, so I really don't know how to do in this case.

Could you please help me out with this proc import?

 

Thank you so much in advance.

Best regards, 

Windy.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Are these files in the same folder or multiple folders?

 

Following is an example of if all of them are in the same folder any building the list from the information: years, state abbreviations. Then use a data step to write a text file of the code. That way you can see what code is generated and test some without executing 1000's of import calls that fail.

data names;
 do st="AK","AL","AR","AZ","CA","CO","CT","DC","DE","FL","GA","HI","IA","ID","IL","IN","KS",
       "KY","LA","MA","MD","ME","MI","MN","MO","MS","MT","NC","ND","NE","NH","NJ","NM",
       "NV","NY","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VA","VT","WA","WI","WV","WY";

      do yr=1998 to 2018;
         year=put(mod(yr,100),z2.);
         filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx');
         dsname= cats('yourlib.',st,put(yr,4.),'zip');
         output;
      end;
 end;
run;

data _null_;
   set names;
   file print;
   put 'Proc import datafile=' filename;
   put 'out=' dsname ;
   put 'dbms= xlsx replace;';
   put 'run;';
run;

Some caveats: 1) Assumes all of the files are in one folder because you did not provide any information about folders which is critical to use proc import.

2) The statecodes above are uppercase. If running in a Unix /Linux environment where names are case sensitive you may need to make them lower case.

3) Replace the <yourpath goes here> with a valid path ending with the path separator character for your OS, \ or / as needed.

4) Create a library to keep these data sets before running the data _null_ and use that name in place of YOURLIB. Note that the dot needs to be there to create the proper libname.datasetname syntax.

5) As written the File Print will send the generated code to the results window so you can see if the proc import steps are written correctly. You could copy one or two from there into the Editor and submit to see if they work. If so, either save the entire set of code to a text file with the .sas extension and submit or modify the File statement to write to a text file directly and then %include it.

6) the created data set names start with the State to avoid all of the obnoxious name literals to deal with if you try to start with a number. The years are in 4 digits so they sort properly if needed and because in this day using 2 digit years is poor at best. See the hoop with the MOD function to get the 2 digits. If the year had been 4 digits in the file name such a step wouldn't have been needed.

 

7) don't come back complaining that when you attempt to combine any of these data sets that you have problems because variables don't have the same properties in all the sets and can't be combined. That is a result of spreadsheets not imposing any standard properties for columns.

 

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

The easiest situation to handle (IMO) would be if all of these Excel files were in a single folder, and had consistent naming. If that's the case, you can read all of the Excel files into SAS, parse the file names and you're done. There are examples of code to read all of the Excel files in the folder here in this forum.

 

But if the above conditions do not hold, then to write code would be much more complicated.

--
Paige Miller
ballardw
Super User

Are these files in the same folder or multiple folders?

 

Following is an example of if all of them are in the same folder any building the list from the information: years, state abbreviations. Then use a data step to write a text file of the code. That way you can see what code is generated and test some without executing 1000's of import calls that fail.

data names;
 do st="AK","AL","AR","AZ","CA","CO","CT","DC","DE","FL","GA","HI","IA","ID","IL","IN","KS",
       "KY","LA","MA","MD","ME","MI","MN","MO","MS","MT","NC","ND","NE","NH","NJ","NM",
       "NV","NY","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VA","VT","WA","WI","WV","WY";

      do yr=1998 to 2018;
         year=put(mod(yr,100),z2.);
         filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx');
         dsname= cats('yourlib.',st,put(yr,4.),'zip');
         output;
      end;
 end;
run;

data _null_;
   set names;
   file print;
   put 'Proc import datafile=' filename;
   put 'out=' dsname ;
   put 'dbms= xlsx replace;';
   put 'run;';
run;

Some caveats: 1) Assumes all of the files are in one folder because you did not provide any information about folders which is critical to use proc import.

2) The statecodes above are uppercase. If running in a Unix /Linux environment where names are case sensitive you may need to make them lower case.

3) Replace the <yourpath goes here> with a valid path ending with the path separator character for your OS, \ or / as needed.

4) Create a library to keep these data sets before running the data _null_ and use that name in place of YOURLIB. Note that the dot needs to be there to create the proper libname.datasetname syntax.

5) As written the File Print will send the generated code to the results window so you can see if the proc import steps are written correctly. You could copy one or two from there into the Editor and submit to see if they work. If so, either save the entire set of code to a text file with the .sas extension and submit or modify the File statement to write to a text file directly and then %include it.

6) the created data set names start with the State to avoid all of the obnoxious name literals to deal with if you try to start with a number. The years are in 4 digits so they sort properly if needed and because in this day using 2 digit years is poor at best. See the hoop with the MOD function to get the 2 digits. If the year had been 4 digits in the file name such a step wouldn't have been needed.

 

7) don't come back complaining that when you attempt to combine any of these data sets that you have problems because variables don't have the same properties in all the sets and can't be combined. That is a result of spreadsheets not imposing any standard properties for columns.

 

windy
Quartz | Level 8

Hi @ballardw

Thanks for helping me out. I really appreciate this. I managed to change the year to 4 digits and put all files in the same folder. Thanks to your advice, I learn how to write codes into an external file for the first time.

There's a small problem with the proc import. The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?

 

Tom
Super User Tom
Super User

@windy wrote:

... The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?

 


   put 'Proc import datafile=' filename :$quote. ;
ballardw
Super User

@windy wrote:

Hi @ballardw

Thanks for helping me out. I really appreciate this. I managed to change the year to 4 digits and put all files in the same folder. Thanks to your advice, I learn how to write codes into an external file for the first time.

There's a small problem with the proc import. The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?

 


Sorry about that. I so seldom use Proc Import some things will slip. Either @Tom's solution or change

  filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx');

 to

  filename=Quote( cats('<yourpath goes here>',year,'zp',st,'.xlsx') );
windy
Quartz | Level 8

Thank you so much for your help @ballardw 

I got what I need now. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1099 views
  • 4 likes
  • 4 in conversation