Append CSV files

Fox01 · Posted 09-13-2021 09:01 AM

Hello,

I want to load csv files into sas and append them , but the issue is that the number of files is not fixe ( it’s could be 2, 3 …).

So to load them I’m using this macro :

%macro drive(dir,ext);

%local cnt filrf rc did memcnt name;

%let cnt=0;

%let filrf=mydir;

%let rc=%sysfunc(filename(filrf,&dir));

%let did=%sysfunc(dopen(&filrf));

%if &did ne 0 %then %do;

%let memcnt=%sysfunc(dnum(&did));

%do i=1 %to &memcnt;

%let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);

%if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;

%if %superq(ext) = %superq(name) %then %do;

%let cnt=%eval(&cnt+1);

%put %qsysfunc(dread(&did,&i));

proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=casuser.tbl_&cnt

dbms=csv replace;

run;

%end;

%else %put &dir cannot be open.;

%let rc=%sysfunc(dclose(&did));

%mend drive;

%drive(c:\fldr,csv)

But I don’t know where I should add the append in this code

Any help please ?

Thank you

Tom · Posted 09-13-2021 09:29 AM

I am not sure what you are asking. It looks like you already have code that is looking for all files. Do you want to exclude some of them? What is the logic for which ones to exclude?

Are you trying to append the data from the CSV files into the same SAS dataset?

Are the CSV files supposed to have the exact same variables in the same order in all of the CSV files?

If so DO NOT use PROC IMPORT to convert them individually into SAS datasets and then append the individual SAS datasets together. You will find that PROC IMPORT will create different data structures for each file since it is only looking at the data in that single file when it makes its guesses on how to define the structure of the SAS dataset.

If the CSV files should all have the same structure the best solution is to write your own data step to read them. That way you have control over how the variables are defined.

Fox01 · Posted 09-13-2021 10:09 AM

The Numbers of files in the c:\fldr is not fixed , the directory can contain 2,3 or 4 files, it's changes every month.

I have to automate my code so that it loads and append files regardless of their number each month.

My macro for loading files works but I need to append automatically csv files in the same sas dataset .

The files have the same structure but I can’t use the data step to append them

In the data step I should specify the name of table wich changes every month

Exempl : in my directory I have 4 files this month

The macro wil provide Tbl_1 Tbl_ 2 Tbl_3 Tbl_4

To append them I can use

Data test ;

Set Tbl_1 Tbl_ 2 Tbl_3 Tbl_4 ;

Run ;

The next month I suppose that I have juste 2 files in mi directory c:\fldr

The macro will provide Tbl_1 Tbl_2 and in this case my data step will not work because I have juste Tbl_1 Tbl_ 2 .

Tom · Posted 09-13-2021 10:34 AM

You cannot depend on PROC IMPORT generating TBL_1 and TBL_2 in compatible structures.

So don't do it that way.

If you really do not know what the files contain you can use PROC IMPORT, but first combine the text files and import all of the data at once.

You can use a wildcard filename used in the INFILE statement to read all of the CSV files in a directory in one step. You can use the FILENAME= option on the INFILE statement to be able to know when the step starts reading a new file to skip the extra header rows.

You could even remember the first header so that you can make sure the header rows are the same for all of the files.

%let dir=c:\fldr ;
filename all temp;
data _null_;
  length fname $256 ;
  infile "&dir\*.csv" filename=fname ;
  file all ;
  input ;
  if _n_=1 then do;
    put _infile_;
    first_header=_infile_;
    first_file=fname;
    retain first_header first_file ;
  end;
  if fname ne lag(fname) then do;
    if first_header ne _infile_ then do;
      put 'ERROR: The header rows do not match. ' ;
      put 'ERROR: ' first_file= :$quote. fname= :$quote.;
      put 'ERROR: ' first_header= :$quote. 'header=' _infile_ :$quote.;
      stop; 
    end;
  end;
  else put _infile_;
run;

proc import datafile=all dbms=dlm out=test replace;
  delimiter=';';
  guessingrows=max;
run;

Fox01 · Posted 09-13-2021 11:37 AM

Great !

Just out of curiosity, if the structure of files are different but are some common filed

Can I use keep or something similar in this code ?

Tom · Posted 09-13-2021 11:47 AM

If you want to consistently read CSV files where the structure varies then you will need to do a lot more work.

Is the extra work worth it? It depends on how often you need to read such files. For a one off PROC IMPORT works fine. But if you expect to run this day after day on varying inputs then the extra work to get the program smart enough to consistently read the same variable in the same way across different CSV files might be work the effort.

It might be that the best approach is the automate the process that is generating the files. Create a standard definition and make sure whoever is generating the CSV files is following that standard. You could even code steps to check the files before reading them to make sure they are consistent with the standard.

You could build some metadata or template dataset that defines the type/length and other attributes for the variables you know about. Then check just the header row to determine which variables to read from this specific file.

ballardw · Posted 09-13-2021 11:58 AM

@Fox01 wrote:

Great !

Just out of curiosity, if the structure of files are different but are some common filed

Can I use keep or something similar in this code ?

Can you describe how the files will differ?

This could be quite easy if you want the first X number of columns and the ones after that are not wanted. But if the column positions change that want from file to file I would strongly suggest reaching out to whoever is making the files in the first place and working with a common structure, at least for the data you want.

If you need to get management involved you may be able to get some attention to the costs involved with changing data file layouts for ALL users, not just your part of the process.

When I worked as a contractor dealing some data extracts and the asked why we were routinely charging them for programming costs. The files they sent, 12 to 15 per month, would have different column orders, column headers and such. We explained that changing file structures meant the programs to read them had to change. For almost every file. They went to their data department and finally specified a standard file structure that saved them thousands of dollars over the life of the project.

You may something similar for you to point out to your management team.

Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Registration is open

Call for Content EXTENDED

Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Re: Append CSV files

Registration is open

Call for Content EXTENDED

SAS Training: Just a Click Away