- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a folder located at C:\MyDataFolder
This folder contains multiple and identical excel .csv files and each file has the structure as shown in table below. My goal is to combine (append) all these multiple files into one big file with the same structure and single header. The number of files in the folder continues to grow each day.
VAR1 | VAR2 | VAR3 |
36 | 41 | 25 |
48 | 44 | 29 |
29 | 50 | 33 |
42 | 37 | 33 |
22 | 23 | 22 |
39 | 35 | 42 |
28 | 42 | 34 |
32 | 48 | 24 |
30 | 21 | 23 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks like you're trying to do this:
-unison
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @RADAGBE9 , Courtesy @Reeza gave me this link
http://support.sas.com/kb/45/805.html
that lists all files in a directory
Then use FILEVAR= to read those files
Use this structure
Filename filelist pipe "dir /b /s c:\temp\*.sas"; /*your directory location*/
data want;
Infile filelist truncover;
Input filename $300.;
infile dummy filevar=filename dsd end=z;
do until (z);
input . . .;
output;
end;
Run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you novinosrin. I'm trying to read .CSV files from my folder: /home/username/
I'm unable to modify your code to read that. New .csv files are added daily but I would not like to read all of the same files everyday. I want it to import only the newly added .csv files and append to the one BigFile.sas.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You don't need to actually make any SAS datasets? That is you just want to copy the lines of text from those files into another text file? Where are you writing the new file? For simplicity let's assume it is in a different folder.
%let inpath=C:\MyDataFolder\ ;
%let outfile=C:\MyDataFolder2\bigfile.csv;
data _null_;
length fname $256 ;
infile "&inpath\*.csv" filename=fname ;
file "&outfile";
input;
if _n_=1 or (fname=lag(fname) ) then put _infile_;
run;
Do you want to re-copy everything every day? or do you need a more complex process that keeps track of which files you have already copied? Or perhaps just checks the timestamps on the files and appends the new files?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your response.
I want to actually combine all the .CSV files into one sas dataset for further manipulation.
The new data set would be written into a library.
My goal is to recopy the new files (for efficiency). So YES I'd not want to recopy and combine all the files everyday.
Also, the filesnames are timestamped and unique.
So for example the files are:
C:\MyDataFolder\File.10.1.2019.csv
C:\MyDataFolder\File.10.2.2019.csv
C:\MyDataFolder\File.10.3.2019.csv
...etc
And the final should be
WORK.bigfile.sas7bdat
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not hard to read all of the files at once.
%let inpath=C:\MyDataFolder\ ;
data bigfile;
length fname $256 ;
infile "&inpath\*.csv" filename=fname truncover;
input @;
if _n_=1 or (fname=lag(fname) ) then delete;
input var1 var2 var3;
run;
You might want to add code to parse the date out of the filename (if the date is not already in one of the fields in the actual data). You might need to add options like DSD to the INFILE statement and clear up the INPUT statement to properly read the variables you have.
If you want to only read files after a particular date it might be better to generate the list of filenames first and filter that, then use the list to drive the reading. (Note this type of filtering would be easier if the filenames use Y/M/D order of the date string in the filename since then the filenames would sort lexicographically into chronological order.)
%let inpath=C:\MyDataFolder\ ;
data filelist ;
infile "dir ""&inpath.*.csv"" /b" pipe truncover;
input fname $256. ;
run;
data bigfile;
set filelist ;
where '01JAN2019'd <=
mdy(input(scan(filename,-4,'.'),32.)
,input(scan(filename,-3,'.'),32.)
,input(scan(filename,-2,'.'),32.)
);
infile TXT filevar=fname truncover end=eof firstobs=2;
do while (not eof);
input var1 var2 var3;
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tom,
I run the code below but it returned no data. Actually the variables in the data are 20. Do I need to list all of them in place of Var1, Var2 & Var3? Also what does fname represent?
%let inpath=C:\MyDataFolder\ ;
data bigfile;
length fname $256 ;
infile "&inpath\*.csv" filename=fname truncover;
input @;
if _n_=1 or (fname=lag(fname) ) then delete;
input var1 var2 var3;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try reading just one file first to get the proper INPUT statement syntax for reading your 20 variables. You might also want to add other statements like FORMAT, LABEL and possible INFORMAT.
By using the FILENAME= option on the INFILE statement the FNAME variable named there is set to the name of the file that is currently being read.
The FNAME=LAG(FNAME) is testing whether you are currently reading the 2nd,3rd, etc line of that particular file. For this you code you need to reverse that to FNAME ne LAG(FNAME) since you are deleting the records that match that condition. The DELETE will let you skip the header row in each file.
%let inpath=C:\MyDataFolder;
data bigfile;
length fname $256 ;
infile "&inpath\*.csv" filename=fname truncover;
input @;
if fname ne lag(fname) then delete;
input var1-var20;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Tom,
I'm using SAS EG, so I'm able to import one .CSV into SAS successfully, which generates its own LENGTH, FORMAT & INFORMAT statements. After that I added your code below and replace " input var1-var20" with the INPUT from the SAS EG generated code. I also replaced the first inpath with the correct path. When I run this code, it creates an empty dataset instead of the combined dataset that I'm looking for. I appreciate your assistance. Thanks
%let inpath=C:\MyDataFolder;
data bigfile;
length fname $256 ;
infile "&inpath\*.csv" filename=fname truncover;
input @;
if fname ne lag(fname) then delete;
input var1-var20;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You need to show the actual log from SAS to see if it is finding any of the files.
You mentioned Enterprise Guide. That is a program that runs on Windows and can connect to SAS running somewhere else. It might connect to SAS running on the same machine where EG is running, but it might not. Only you know how your setup is working.
For the posted code to work the folder with the CSV files in it needs to be visible on the machine where SAS is running. So you might need to copy the files or place them in a shared folder of some type. If the server running SAS is using Unix then the path might look a lot different than it looks on your PC that is running EG. Also on Unix filenames are case sensitive. So x.csv is a different file than x.CSV.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your response.
I successfully used the copy files task in SAS EG to copy *.csv from a shared folder on our network. So I can see all the CSV files have been loaded into my home directory in SAS EG, which is visible to the to the SAS Server. Also our SAS runs on UNIX so the INFILE generated by File-->Import Data looks a bit different but it loads one .csv all right. Also to add, I would not like the files to keep on loading all the files in the folder everyday. I want it to load the new ones that have been added and append to the one big SAS dataset having all the daily files loaded.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sounds like you can now find the files and read them.
Now to only read the new files and append them you need some method for (1) getting the list of files (2) knowing which files have been read. With those two pieces of information you can determine which files need to be read. Once you have read the new files can then append the data to end of your "big" dataset.
One way to get the list of files it to read the output of your operating system command to list the file.
data filelist;
infile "ls &path/*.csv" pipe truncover ;
input filename $256.;
run;
If that doesn't work you can use the DOPEN() and DREAD() functions. That question has been answered many times on this site.
One way to keep track is to store the name of the file into a variable on the big dataset and then just query to find the list.
proc sql ;
create already_read_list as
select distinct filename from sasfiles.big
order by filename
;
quit;
Another way is to maintain some separate file with just the list of files. If the file names follow a pattern (such as having a date as part of the name) then you might not need to store the full filename, just the part that varies.