BookmarkSubscribeRSS Feed
Fox01
Fluorite | Level 6

Hello,

I want to load csv files into sas and append them , but the issue is that the number of files is not fixe ( it’s could be 2, 3 …).

So to load them I’m using this macro :

 

%macro drive(dir,ext);

   %local cnt filrf rc did memcnt name;

   %let cnt=0;         

   %let filrf=mydir;   

   %let rc=%sysfunc(filename(filrf,&dir));

   %let did=%sysfunc(dopen(&filrf));

    %if &did ne 0 %then %do;  

   %let memcnt=%sysfunc(dnum(&did));   

    %do i=1 %to &memcnt;             

                   

      %let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);                   

                    

      %if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;

       %if %superq(ext) = %superq(name) %then %do;                        

          %let cnt=%eval(&cnt+1);      

          %put %qsysfunc(dread(&did,&i)); 

          proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=casuser.tbl_&cnt

           dbms=csv replace;           

          run;         

       %end;

      %end; 

 

    %end;

      %end;

  %else %put &dir cannot be open.;

  %let rc=%sysfunc(dclose(&did));     

            

 %mend drive;

 

%drive(c:\fldr,csv)

 

But I don’t know where I  should add the append in this code

Any help please ?

Thank you

6 REPLIES 6
Tom
Super User Tom
Super User

I am not sure what you are asking. It looks like you already have code that is looking for all files.  Do you want to exclude some of them? What is the logic for which ones to exclude?

 

Are you trying to append the data from the CSV files into the same SAS dataset?

Are the CSV files supposed to have the exact same variables in the same order in all of the CSV files?

If so DO NOT use PROC IMPORT to convert them individually into SAS datasets and then append the individual SAS datasets together.  You will find that PROC IMPORT will create different data structures for each file since it is only looking at the data in that single file when it makes its guesses on how to define the structure of the SAS dataset.

If the CSV files should all have the same structure the best solution is to write your own data step to read them. That way you have control over how the variables are defined.

Fox01
Fluorite | Level 6

The Numbers of files in the c:\fldr is not fixed , the directory can contain 2,3 or 4 files, it's changes every month.

 I have to automate my code so that it loads and append  files regardless of their number each month.

My macro for loading files works  but I need to append automatically csv files in the same sas dataset .

The files have the same structure but I can’t use the data step to append them

In the data step I should specify the name of table wich changes every month

Exempl : in my directory I have 4 files this month

The macro wil provide Tbl_1  Tbl_ 2 Tbl_3 Tbl_4

To append them I can  use

 

Data test ;

Set Tbl_1  Tbl_ 2 Tbl_3 Tbl_4 ;

Run ;

 

The next month I suppose that I have juste 2 files in mi directory c:\fldr

The macro will provide Tbl_1  Tbl_2 and in this case my data step will not work because I have juste Tbl_1  Tbl_ 2 .

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Tom
Super User Tom
Super User

You cannot depend on PROC IMPORT generating TBL_1 and TBL_2 in compatible structures.

So don't do it that way.

If you really do not know what the files contain you can use PROC IMPORT, but first combine the text files and import all of the data at once.

You can use a wildcard filename used in the INFILE statement to read all of the CSV files in a directory in one step.  You can use the FILENAME= option on the INFILE statement to be able to know when the step starts reading a new file to skip the extra header rows.

You could even remember the first header so that you can make sure the header rows are the same for all of the files.

%let dir=c:\fldr ;
filename all temp;
data _null_;
  length fname $256 ;
  infile "&dir\*.csv" filename=fname ;
  file all ;
  input ;
  if _n_=1 then do;
    put _infile_;
    first_header=_infile_;
    first_file=fname;
    retain first_header first_file ;
  end;
  if fname ne lag(fname) then do;
    if first_header ne _infile_ then do;
      put 'ERROR: The header rows do not match. ' ;
      put 'ERROR: ' first_file= :$quote. fname= :$quote.;
      put 'ERROR: ' first_header= :$quote. 'header=' _infile_ :$quote.;
      stop; 
    end;
  end;
  else put _infile_;
run;

proc import datafile=all dbms=dlm out=test replace;
  delimiter=';';
  guessingrows=max;
run; 

 

Fox01
Fluorite | Level 6

Great !

Just out of curiosity, if the structure of files are different but are some common filed

Can I use keep or something similar in this code ?

 

 

Tom
Super User Tom
Super User

If you want to consistently read CSV files where the structure varies then you will need to do a lot more work.

Is the extra work worth it?  It depends on how often you need to read such files.  For a one off PROC IMPORT works fine.  But if you expect to run this day after day on varying inputs then the extra work to get the program smart enough to consistently read the same variable in the same way across different CSV files might be work the effort.

It might be that the best approach is the automate the process that is generating the files.  Create a standard definition and make sure whoever is generating the CSV files is following that standard.  You could even code steps to check the files before reading them to make sure they are consistent with the standard.

 

You could build some metadata or template dataset that defines the type/length and other attributes for the variables you know about.  Then check just the header row to determine which variables to read from this specific file.

ballardw
Super User

@Fox01 wrote:

Great !

Just out of curiosity, if the structure of files are different but are some common filed

Can I use keep or something similar in this code ?

 

 


Can you describe how the files will differ?

This could be quite easy if you want the first X number of columns and the ones after that are not wanted. But if the column positions change that want from file to file I would strongly suggest reaching out to whoever is making the files in the first place and working with a common structure, at least for the data you want.

 

If you need to get management involved you may be able to get some attention to the costs involved with changing data file layouts for ALL users, not just your part of the process.

When I worked as a contractor dealing some data extracts and the asked why we were routinely charging them for programming costs. The files they sent, 12 to 15 per month, would have different column orders, column headers and such. We explained that changing file structures meant the programs to read them had to change. For almost every file. They went to their data department and finally specified a standard file structure that saved them thousands of dollars over the life of the project.

You may something similar for you to point out to your management team.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2436 views
  • 1 like
  • 3 in conversation