BookmarkSubscribeRSS Feed
RADAGBE9
Fluorite | Level 6

I have a folder located at C:\MyDataFolder

This folder contains multiple and identical excel .csv files and each file has the structure  as shown in table below. My goal is to combine (append) all these multiple files into one big file with the same structure and single header. The number of files in the folder continues to grow each day. 

VAR1VAR2VAR3
364125
484429
295033
423733
222322
393542
284234
324824
302123

 

 

12 REPLIES 12
novinosrin
Tourmaline | Level 20

Hi @RADAGBE9 , Courtesy @Reeza  gave me this link

 

http://support.sas.com/kb/45/805.html

that lists all files in a directory

Then use FILEVAR= to read those files

Use this structure

  Filename filelist pipe "dir /b /s c:\temp\*.sas"; /*your directory location*/
                                                                                   
   data want;                                        
     Infile filelist truncover;
     Input filename $300.;
    
     infile dummy filevar=filename dsd end=z;

     do until (z);
  
        input . . .;

        output;

     end;

  
   Run; 

RADAGBE9
Fluorite | Level 6

Thank you  I'm trying to read .CSV files  from my folder:  /home/username/

I'm unable to modify your code to read that.  New .csv files are added daily but I would not like to read all of the same files everyday. I want it to import only the newly added .csv files and append to the one BigFile.sas.

 

Thanks

 

Tom
Super User Tom
Super User

You don't need to actually make any SAS datasets?  That is you just want to copy the lines of text from those files into another text file?  Where are you writing the new file? For simplicity let's assume it is in a different folder.

 

%let inpath=C:\MyDataFolder\ ;
%let outfile=C:\MyDataFolder2\bigfile.csv;
data _null_;
  length fname $256 ;
  infile "&inpath\*.csv" filename=fname ;
  file "&outfile";
  input;
  if _n_=1 or (fname=lag(fname) ) then put _infile_;
run;

Do you want to re-copy everything every day?  or do you need a more complex process that keeps track of which files you have already copied?  Or perhaps just checks the timestamps on the files and appends the new files?

RADAGBE9
Fluorite | Level 6

Thanks for your response.

I want to actually combine all the .CSV files into one sas dataset for further manipulation.

The new data set would be written into a library.

My goal is to recopy the new files (for efficiency). So YES I'd not want to recopy and combine all the files everyday.

Also, the filesnames are timestamped and unique.

 

So for example the files are:

C:\MyDataFolder\File.10.1.2019.csv

C:\MyDataFolder\File.10.2.2019.csv

C:\MyDataFolder\File.10.3.2019.csv

...etc

 

And the final should be

WORK.bigfile.sas7bdat

 

 

 

Tom
Super User Tom
Super User

Not hard to read all of the files at once.

 

%let inpath=C:\MyDataFolder\ ;

data bigfile;
  length fname $256 ;
  infile "&inpath\*.csv" filename=fname truncover;
  input @;
  if _n_=1 or (fname=lag(fname) ) then delete;
  input var1 var2 var3;
run;

You might want to add code to parse the date out of the filename (if the date is not already in one of the fields in the actual data).  You might need to add options like DSD to the INFILE statement and clear up the INPUT statement to properly read the variables you have.

 

 

If you want to only read files after a particular date it might be better to generate the list of filenames first and filter that, then use the list to drive the reading.  (Note this type of filtering would be easier if the filenames use Y/M/D order of the date string in the filename since then the filenames would sort lexicographically into chronological order.)

%let inpath=C:\MyDataFolder\ ;
data filelist ;
  infile "dir ""&inpath.*.csv"" /b" pipe truncover;
  input fname $256. ;
run;

data bigfile;
  set filelist ;
  where '01JAN2019'd <=
mdy(input(scan(filename,-4,'.'),32.)
   ,input(scan(filename,-3,'.'),32.)
   ,input(scan(filename,-2,'.'),32.)
  );
  infile TXT filevar=fname truncover end=eof firstobs=2;
  do while (not eof);
    input var1 var2 var3;
    output;
  end;
run;
RADAGBE9
Fluorite | Level 6

Hi Tom,

I run the code below but it returned no data. Actually the variables in the data are 20. Do I need to list all of them in place of Var1, Var2 & Var3? Also what does fname represent?

 

%let inpath=C:\MyDataFolder\ ;

data bigfile;
  length fname $256 ;
  infile "&inpath\*.csv" filename=fname truncover;
  input @;
  if _n_=1 or (fname=lag(fname) ) then delete;
  input var1 var2 var3;
run;

 

 

Tom
Super User Tom
Super User

Try reading just one file first to get the proper INPUT statement syntax for reading your 20 variables.  You might also want to add other statements like FORMAT, LABEL and possible INFORMAT.

 

By using the FILENAME= option on the INFILE statement the FNAME variable named there is set to the name of the file that is currently being read. 

The FNAME=LAG(FNAME) is testing whether you are currently reading the 2nd,3rd, etc line of that particular file.  For this you code you need to reverse that to FNAME ne LAG(FNAME) since you are deleting the records that match that condition.  The DELETE will let you skip the header row in each file.

 

%let inpath=C:\MyDataFolder;

data bigfile;
  length fname $256 ;
  infile "&inpath\*.csv" filename=fname truncover;
  input @;
  if fname ne lag(fname) then delete;
  input var1-var20;
run;
RADAGBE9
Fluorite | Level 6

Thanks Tom,

 

I'm using SAS EG, so I'm able to import one .CSV into SAS successfully, which generates its own LENGTH, FORMAT & INFORMAT statements. After that I added your code below and replace  " input var1-var20" with the INPUT from the SAS EG generated code. I also replaced the first inpath with the correct path. When I run this code, it creates an empty dataset instead of the combined dataset that I'm looking for. I appreciate your assistance.  Thanks

 

%let inpath=C:\MyDataFolder;

data bigfile;
  length fname $256 ;
  infile "&inpath\*.csv" filename=fname truncover;
  input @;
  if fname ne lag(fname) then delete;
  input var1-var20;
run;

Tom
Super User Tom
Super User

You need to show the actual log from SAS to see if it is finding any of the files.

You mentioned Enterprise Guide.  That is a program that runs on Windows and can connect to SAS running somewhere else.  It might connect to SAS running on the same machine where EG is running, but it might not.  Only you know how your setup is working.  

For the posted code to work the folder with the CSV files in it needs to be visible on the machine where SAS is running.  So you might need to copy the files or place them in a shared folder of some type. If the server running SAS is using Unix then the path might look a lot different than it looks on your PC that is running EG.  Also on Unix filenames are case sensitive.  So x.csv is a different file than x.CSV. 

RADAGBE9
Fluorite | Level 6

Thanks for your response.

 

I successfully used the copy files task in SAS EG to copy *.csv  from  a shared folder on our network. So I can see all the CSV files have been loaded into my home directory in SAS EG,  which is visible to the to the SAS Server. Also our SAS runs on UNIX so the INFILE generated by File-->Import Data looks a bit different but it loads one .csv  all right. Also to add, I would not like the files to keep on loading all the files in the folder everyday. I want it to load the new ones that have been added and append to the one big SAS dataset having all the daily files loaded.

 

 

Tom
Super User Tom
Super User

Sounds like you can now find the files and read them.

Now to only read the new files and append them you need some method for (1) getting the list of files (2) knowing which files have been read.  With those two pieces of information you can determine which files need to be read. Once you have read the new files can then append the data to end of your "big" dataset.

 

One way to get the list of files it to read the output of your operating system command to list the file.

data filelist;
  infile "ls &path/*.csv" pipe truncover ;
  input filename $256.;
run;

If that doesn't work you can use the DOPEN() and DREAD() functions.  That question has been answered many times on this site.

 

One way to keep track is to store the name of the file into a variable on the big dataset and then just query to find the list. 

proc sql ;
  create already_read_list as
  select distinct filename from sasfiles.big
  order by filename
  ;
quit;

Another way is to maintain some separate file with just the list of files.  If the file names follow a pattern (such as having a date as part of the name) then you might not need to store the full filename, just the part that varies.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 6163 views
  • 0 likes
  • 4 in conversation