11-01-2016 11:05 AM
Hi, I need to import multiple CSV''s. I have 22 folders named 201409 201412 201502 201503 etc, and within each folder there are 2 x csv's named like this (but year and month will differ)
Both csv's have the same variables - Mort, acc, sys, leg, pool
Can you please advise how I would go about importing these in with the condition - where pool ne blank
I want to create an individual dataset for each month passed in and NOT combine them togther.
11-01-2016 11:57 AM
You need few steps:
1) create a code to run on just one specific file and check it to run as expected
2) enclose that program into a macro program geting the input file name as 1st argument
and libref+dataset name of output as 2nd argument -
adapting the program code to use the arguments as macro variables, instead the fixed names.
3) create a datastep to create 22*2 macro variables as combination of:
file_name = cat(path,folder,filename,suffix);
you can do it in a loop using arrays of: folders and of file prefixes
4) create a macro program to loop across the 44 filenames in a loop
calling previous macro, each time with appropriate arguments
to create desired output.
After having your code, in case of having still difficulties come back to the forum.
11-01-2016 12:00 PM - edited 11-01-2016 12:04 PM
I haven't tried to do this before so I have pieced this togther from Google.
I tried this but doesn't work -
%Macro ReadCSV (infile , outfile );
PROC IMPORT OUT= &ofile
I get this error messgage -
MPRINT(READCSV): PROC IMPORT OUT= &ofile DATAFILE= "O:Rep/Mort/EnData/201608/Pool_Code_S_201608.csv" DBMS=CSV REPLACE;
NOTE: The SAS System stopped processing this step because of errors.
MLOGIC(READCSV): Ending execution.
ERROR 22-322: Expecting a name.
ERROR 200-322: The symbol is not recognized and will be ignored.
29 GOPTIONS NOACCESSIBLE;
30 %LET _CLIENTTASKLABEL=;
31 %LET _CLIENTPROJECTPATH=;
32 %LET _CLIENTPROJECTNAME=;
33 %LET _SASPROGRAMFILE=;
36 ODS _ALL_ CLOSE;
39 QUIT; RUN;
11-01-2016 12:14 PM
In your code, you use a macro variable &ofile. - this is never described, do you mean outfile? Also couple of tips, don't write code all in upper case, it is offensive to the eye. Finish your macro variables with a full stop, they highlight nicer in the editor.
As for your problem, simplest ideda would be to do an OS dir command into a dataset, then call execute() to either call your import code or exeucte the code:
data _null_; set dirlist; call execute(cats('%Imp (infile=',file,',outfile=',outfile,');')); run;
Leaving now so can't elaborate more.
11-01-2016 12:39 PM
in your macro program change a line,
from: PROC IMPORT OUT= &ofile
to PROC IMPORT OUT= &outfile
so you have steps 1 and 2 from my previous post.
Sometimes it is easier to duplicate lines and make light changes instead
developing a mcro with a loop, and it is OK from my point of view.
11-01-2016 04:39 PM
Please note @Shmuel's instruction about the DATASTEP.
If you use multiple Proc import statements you will almost certainly get text variables of different lengths and depending upon your actual data could have some variables change from numeric to text between files. For instance a Zip code that is all 5 digit values in one file may be imported as numeric but the next file with some Zip+4 codes, with a - in the middle, will be read as a 10 character text field.
So when you go to process the data or combine files you end up with incompatible data. Date fields may also be problematic depending on what program build your CSV files. I have some I deal with that create quoted date fields like "10/16/2016" that proc import treats as character, informat $13.
Also you look at the data step generated by proc import to see if the data types match your expectations and if the text variables are long enough to hold the longest expected value.
Then in the data step you have something like:
infile "filepath\name.csv" <infile options>.
<rest of data step>.
Once you have a datastep that works you use that in your macro and have the SETNAME and file information as the macro variables.
11-01-2016 01:46 PM
While the comments you have received are on the money, I'm not sure you know what to do at this point. Here are some "next steps".
First, fix the macro. Wherever it refers to &ofile, change that to &outfile.
Second, test it. Make sure you know:
That won't form a full solution, but it will at least let you participate in some of the more complex steps going forward.
11-01-2016 10:46 PM
Firstly, use OS command to get these file list , and then call execute() or data step to import them. Suppose those 22 folders are all under TEMP folder. filename xx pipe 'dir c:\temp\*.csv /s /b'; data _null_; infile xx; input; put _infile_; call execute(); run;
11-02-2016 07:35 AM
This could do the trick - basefolder is the root folder for the folders containing the csv files.
%macro importcsv(startyear=,endyear=,basefolder=); %local year month; %do year=&startyear %to &endyear; %do month=1 %to 12; %if &month<10 %then /* fix 01 02 etc */ %let month=0&month; proc import out=sas_&year&month.(where=(not missing(pool))) datafile="&basefolder.Pool_Code_C_&year&month..csv" dbms=csv replace; getnames=yes; run; proc import out=sas_&year&month.(where=(not missing(pool))) datafile="&basefolder.Pool_Code_S_&year&month..csv" dbms=csv replace; getnames=yes; run; %end; %end; %mend; options mprint; %importcsv(startyear=2014,endyear=2016,basefolder=h:\csvfiles\);