DATA Step, Macro, Functions and more

Import multiple CSV's

Reply
Frequent Contributor
Posts: 140

Import multiple CSV's

Hi, I need to import multiple CSV''s. I have 22 folders named 201409 201412 201502 201503 etc, and within each folder there are 2 x csv's named like this (but year and month will differ)

Pool_Code_C_201409

Pool_Code_S_201409

 

Both csv's have the same variables - Mort, acc, sys, leg, pool

 

Can you please advise how I would go about importing these in with the condition - where pool ne blank

 

I want to create an individual dataset for each month passed in and NOT combine them togther.

 

Thanks

Trusted Advisor
Posts: 1,400

Re: Import multiple CSV's

You need few steps:

1) create a code to run on just one specific file and check it to run as expected

2) enclose that program into a macro program geting the input file name as 1st argument

    and libref+dataset name of output as 2nd argument -
    adapting the program code to use the arguments as macro variables, instead the fixed names.

3) create a datastep to create 22*2 macro variables as combination of:

         file_name = cat(path,folder,filename,suffix);

    you can do it in a loop using arrays of: folders and of  file prefixes

4) create a macro program to loop across the 44 filenames in a loop

     calling previous macro, each time with appropriate arguments

     to create desired output.

 

After having your code, in case of having still difficulties come back to the forum.

    

Frequent Contributor
Posts: 140

Re: Import multiple CSV's

[ Edited ]

I haven't tried to do this before so I have pieced this togther from Google.

 

I tried this but doesn't work -

 

%Macro ReadCSV (infile , outfile );

PROC IMPORT OUT= &ofile

DATAFILE= &infile

DBMS=CSV REPLACE;

GETNAMES=YES;

RUN;

%Mend ReadCSV;

%ReadCSV("O:Rep/Mort/EnData/201609/Pool_Code_C_201609.csv", work.ofile1);

%ReadCSV("O:Rep/Mort/EnData/201609/Pool_Code_S_201609.csv", work.ofile);

%ReadCSV("O:Rep/Mort/EnData/201608/Pool_Code_C_201608.csv", work.ofile1);

%ReadCSV("O:Rep/Mort/EnData/201608/Pool_Code_S_201608.csv", work.ofile);

 

I get this error messgage -

 

MPRINT(READCSV): PROC IMPORT OUT= &ofile DATAFILE= "O:Rep/Mort/EnData/201608/Pool_Code_S_201608.csv" DBMS=CSV REPLACE;

NOTE: The SAS System stopped processing this step because of errors.

MPRINT(READCSV): GETNAMES=YES;

MPRINT(READCSV): DATAROW=2;

MPRINT(READCSV): RUN;

MLOGIC(READCSV): Ending execution.

ERROR 22-322: Expecting a name.

ERROR 200-322: The symbol is not recognized and will be ignored.

28

29 GOPTIONS NOACCESSIBLE;

30 %LET _CLIENTTASKLABEL=;

31 %LET _CLIENTPROJECTPATH=;

32 %LET _CLIENTPROJECTNAME=;

33 %LET _SASPROGRAMFILE=;

34

35 ;*';*";*/;quit;run;

36 ODS _ALL_ CLOSE;

37

38

39 QUIT; RUN;

40

 

Super User
Super User
Posts: 7,413

Re: Import multiple CSV's

In your code, you use a macro variable &ofile. - this is never described, do you mean outfile?  Also couple of tips, don't write code all in upper case, it is offensive to the eye.  Finish your macro variables with a full stop, they highlight nicer in the editor.

As for your problem, simplest ideda would be to do an OS dir command into a dataset, then call execute() to either call your import code or exeucte the code:

data _null_;
  set dirlist;
  call execute(cats('%Imp (infile=',file,',outfile=',outfile,');'));
run;

Leaving now so can't elaborate more.

Trusted Advisor
Posts: 1,400

Re: Import multiple CSV's

in your macro program change a line,

  from: PROC IMPORT OUT= &ofile

   to     PROC IMPORT OUT= &outfile

 

so you have steps 1 and 2 from my previous post.

 

Sometimes it is easier to duplicate lines and make light changes instead

developing a mcro with a loop, and it is OK from my point of view.

Super User
Posts: 10,538

Re: Import multiple CSV's

Please note @Shmuel's instruction about the DATASTEP.

If you use multiple Proc import statements you will almost certainly get text variables of different lengths and depending upon your actual data could have some variables change from numeric to text between files. For instance a Zip code that is all 5 digit values in one file may be imported as numeric but the next file with some Zip+4 codes, with a - in the middle, will be read as a 10 character text field.

So when you go to process the data or combine files you end up with incompatible data. Date fields may also be problematic depending on what program build your CSV files. I have some I deal with that create quoted date fields like "10/16/2016" that proc import treats as character, informat $13.

 

Also you look at the data step generated by proc import to see if the data types match your expectations and if the text variables are long enough to hold the longest expected value.

Then in the data step you have something like:

 

data setname;

    infile "filepath\name.csv" <infile options>.

<rest of data step>.

 

Once you have a datastep that works you use that in your macro and have the SETNAME and file information as the macro variables.

Super User
Posts: 5,092

Re: Import multiple CSV's

Brandon,

 

While the comments you have received are on the money, I'm not sure you know what to do at this point.  Here are some "next steps".

 

First, fix the macro.  Wherever it refers to &ofile, change that to &outfile. 

 

Second, test it.  Make sure you know:

 

  • What is its output?  Does it contain what you need, or are further changes required?
  • If the output file already exists when the macro runs, what will happen to it?  (error message? replace? append?)

That won't form a full solution, but it will at least let you participate in some of the more complex steps going forward.

Super User
Posts: 9,687

Re: Import multiple CSV's

Firstly, use OS command to get these file list , and then call execute() or data step to import them.
Suppose those 22 folders are all under TEMP folder.

filename xx pipe 'dir c:\temp\*.csv  /s /b';
data _null_;
 infile xx;
input;
put _infile_;
call execute();
run;

Occasional Contributor
Posts: 8

Re: Import multiple CSV's

This could do the trick - basefolder is the root folder for the folders containing the csv files.

%macro importcsv(startyear=,endyear=,basefolder=);
   %local year month;
   %do year=&startyear %to &endyear;
      %do month=1 %to 12;
         %if &month<10 %then /* fix 01 02 etc */
            %let month=0&month;
            proc import out=sas_&year&month.(where=(not missing(pool))) 
               datafile="&basefolder.Pool_Code_C_&year&month..csv"
               dbms=csv replace;
               getnames=yes;
            run;
            proc import out=sas_&year&month.(where=(not missing(pool))) 
               datafile="&basefolder.Pool_Code_S_&year&month..csv"
               dbms=csv replace;
               getnames=yes;
            run;
      %end;
   %end;
%mend;
options mprint;
%importcsv(startyear=2014,endyear=2016,basefolder=h:\csvfiles\);
Ask a Question
Discussion stats
  • 8 replies
  • 366 views
  • 0 likes
  • 7 in conversation