Hello everyone,
I need to import a lot of excel files to SAS, but the file names are defined by year and state code (like CA, FL, TX,...) so I don't know how to set up a macro for proc import. An example of the excel filename I have is 98zpal.xlsx (98 is year 1998, zp stands for zipcode and al is state code of Alabama). I have data from 1998 to 2018, and each year, I have 51 files for 51 states (more than 1000 files)
I only know how to do macro where files are named with the numbers, so I really don't know how to do in this case.
Could you please help me out with this proc import?
Thank you so much in advance.
Best regards,
Windy.
Are these files in the same folder or multiple folders?
Following is an example of if all of them are in the same folder any building the list from the information: years, state abbreviations. Then use a data step to write a text file of the code. That way you can see what code is generated and test some without executing 1000's of import calls that fail.
data names; do st="AK","AL","AR","AZ","CA","CO","CT","DC","DE","FL","GA","HI","IA","ID","IL","IN","KS", "KY","LA","MA","MD","ME","MI","MN","MO","MS","MT","NC","ND","NE","NH","NJ","NM", "NV","NY","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VA","VT","WA","WI","WV","WY"; do yr=1998 to 2018; year=put(mod(yr,100),z2.); filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx'); dsname= cats('yourlib.',st,put(yr,4.),'zip'); output; end; end; run; data _null_; set names; file print; put 'Proc import datafile=' filename; put 'out=' dsname ; put 'dbms= xlsx replace;'; put 'run;'; run;
Some caveats: 1) Assumes all of the files are in one folder because you did not provide any information about folders which is critical to use proc import.
2) The statecodes above are uppercase. If running in a Unix /Linux environment where names are case sensitive you may need to make them lower case.
3) Replace the <yourpath goes here> with a valid path ending with the path separator character for your OS, \ or / as needed.
4) Create a library to keep these data sets before running the data _null_ and use that name in place of YOURLIB. Note that the dot needs to be there to create the proper libname.datasetname syntax.
5) As written the File Print will send the generated code to the results window so you can see if the proc import steps are written correctly. You could copy one or two from there into the Editor and submit to see if they work. If so, either save the entire set of code to a text file with the .sas extension and submit or modify the File statement to write to a text file directly and then %include it.
6) the created data set names start with the State to avoid all of the obnoxious name literals to deal with if you try to start with a number. The years are in 4 digits so they sort properly if needed and because in this day using 2 digit years is poor at best. See the hoop with the MOD function to get the 2 digits. If the year had been 4 digits in the file name such a step wouldn't have been needed.
7) don't come back complaining that when you attempt to combine any of these data sets that you have problems because variables don't have the same properties in all the sets and can't be combined. That is a result of spreadsheets not imposing any standard properties for columns.
The easiest situation to handle (IMO) would be if all of these Excel files were in a single folder, and had consistent naming. If that's the case, you can read all of the Excel files into SAS, parse the file names and you're done. There are examples of code to read all of the Excel files in the folder here in this forum.
But if the above conditions do not hold, then to write code would be much more complicated.
Are these files in the same folder or multiple folders?
Following is an example of if all of them are in the same folder any building the list from the information: years, state abbreviations. Then use a data step to write a text file of the code. That way you can see what code is generated and test some without executing 1000's of import calls that fail.
data names; do st="AK","AL","AR","AZ","CA","CO","CT","DC","DE","FL","GA","HI","IA","ID","IL","IN","KS", "KY","LA","MA","MD","ME","MI","MN","MO","MS","MT","NC","ND","NE","NH","NJ","NM", "NV","NY","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VA","VT","WA","WI","WV","WY"; do yr=1998 to 2018; year=put(mod(yr,100),z2.); filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx'); dsname= cats('yourlib.',st,put(yr,4.),'zip'); output; end; end; run; data _null_; set names; file print; put 'Proc import datafile=' filename; put 'out=' dsname ; put 'dbms= xlsx replace;'; put 'run;'; run;
Some caveats: 1) Assumes all of the files are in one folder because you did not provide any information about folders which is critical to use proc import.
2) The statecodes above are uppercase. If running in a Unix /Linux environment where names are case sensitive you may need to make them lower case.
3) Replace the <yourpath goes here> with a valid path ending with the path separator character for your OS, \ or / as needed.
4) Create a library to keep these data sets before running the data _null_ and use that name in place of YOURLIB. Note that the dot needs to be there to create the proper libname.datasetname syntax.
5) As written the File Print will send the generated code to the results window so you can see if the proc import steps are written correctly. You could copy one or two from there into the Editor and submit to see if they work. If so, either save the entire set of code to a text file with the .sas extension and submit or modify the File statement to write to a text file directly and then %include it.
6) the created data set names start with the State to avoid all of the obnoxious name literals to deal with if you try to start with a number. The years are in 4 digits so they sort properly if needed and because in this day using 2 digit years is poor at best. See the hoop with the MOD function to get the 2 digits. If the year had been 4 digits in the file name such a step wouldn't have been needed.
7) don't come back complaining that when you attempt to combine any of these data sets that you have problems because variables don't have the same properties in all the sets and can't be combined. That is a result of spreadsheets not imposing any standard properties for columns.
Hi @ballardw
Thanks for helping me out. I really appreciate this. I managed to change the year to 4 digits and put all files in the same folder. Thanks to your advice, I learn how to write codes into an external file for the first time.
There's a small problem with the proc import. The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?
@windy wrote:
... The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?
put 'Proc import datafile=' filename :$quote. ;
@windy wrote:
Hi @ballardw
Thanks for helping me out. I really appreciate this. I managed to change the year to 4 digits and put all files in the same folder. Thanks to your advice, I learn how to write codes into an external file for the first time.
There's a small problem with the proc import. The filename needs to be put in the double quotation "filename". Could you please help me one more step to fix it?
Sorry about that. I so seldom use Proc Import some things will slip. Either @Tom's solution or change
filename=cats('<yourpath goes here>',year,'zp',st,'.xlsx');
to
filename=Quote( cats('<yourpath goes here>',year,'zp',st,'.xlsx') );
Thank you so much for your help @ballardw
I got what I need now.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.