Hi,
I'm trying to use data step to import & merge many similarly structured CSV files into one, where their first few lines are like:
Title
Retrieved on xxx
.
Var1,Var2,Var3
1,2,3...
Apparently the actual data begins at row 5, so I tried first:
data Want;
infile '&path:\*.csv' dlm=',' missover dsd firstobs=5;
input var1 var2 var3;
run;
But the outcome turned out to be that the entering four rows were skipped for the first file, but not for the rest. Namely I had:
Var1,Var2,Var3
1,2,3
Title,.,.
Retrieved on xxx,.,.
.,.,.
4,5,6...
I then tried the infile option evo= to skip lines for every files:
data Want;
infile '&path:\*.csv' dlm=',' missover dsd evo=evo firstobs=5;
input @;
if evo then input;
input var1 var2 var3;
evo=0;
run;
But it didn't work. I have concrete confidence that the above would work for files with row 1 being its header and data beginning at row 2 when you set firstobs=2; However it seemed the same would not work for my scenario. Any help would be appreciated!
Drop the "firstobs=" approach.
Instead you can use the FILENAME= option of the infile statement. Comparing the current value of the filename= variable to its lagged value will let you know when you're starting a new csv file. Then, whenever that is the case, skip 4 records using "input #4".
data want;
length fnam $30;
infile 'c:\temp\*.csv' dlm=',' filename=fnam;
input @; /* Set pointer to current record */
if fnam^=lag(fnam) then input #4; /* New file?, then skip 4 records */
input v1 v2 ;
run;
In your example, you have
infile '&path:\*.csv' ...
which would not work, since the macro processor would not resolve the macrovar &path, because it is in single quotes. And I guess since you are appending a ":" to &path, that macrovar path is nothing more than a drive letter.
For some reason it's not working on my end. My environment is SAS Enterprise Guide 7.1 running on a Window 10. In fact, the effect of your code is exactly the same as running:
data want;
infile 'c:\temp\*.csv' dlm=',';
input v1 v2 ;
run;
So I guess it's the examining the current file name with the lagged file name part that is not working. Thanks for your reply anyway.
It seems to work for me. Simplify your problem until you can pinpoint the issue.
data _null_;
file "&wdir\file1.txt";
put 'Title';
put 'Retrieved on xxx ';
put '. ';
put 'Var1,Var2,Var3 ';
put '1,2,3 ';
file "&wdir\file2.txt";
put 'Title';
put 'Retrieved on xxx ';
put '. ';
put 'Var1,Var2,Var3 ';
put '4,5,6 ';
run;
data _null_;
do I=1,2;
FILEVAR=cats("&wdir\file",I,".txt");
infile DUMMY filevar=FILEVAR firstobs=5;
input ;
putlog _infile_;
end;
stop;
run;
1,2,3
4,5,6
Hi! Your solution seems to work, but why is it only taking the first row of each file? The result is like:
filename Var1 Var2 Var3
file1 file1(1,1) file1(1,2) file1(1,3)
file2 file2(1,1) file2(1,2) file2(1,3)
file3 file3(1,1) file3(1,2) file3(1,3)
I hope you can get it, and thank you for your time.
This is an enhanced version of the solution posted by @ChrisNZ automatically processing all files of interest. Variables most likely not relevant for the created dataset are prefixed with an underscore.
data want;
length
v1-v3 8
_filename $ 200
;
keep v1-v3;
_rc = filename("mydir", "YOUR_PATH");
_did = dopen("mydir");
do _i = 1 to dnum(_did);
_filename = dread(_did, _i);
if prxmatch("/.*\.txt/i", trim(_filename)) then do;
_fileVar = cats("YOUR_PATH\", _filename);
infile dummy delimiter=',' filevar= _fileVar firstobs= 5;
input v1--v3;
output;
end;
end;
_rc = dclose(_did);
stop;
run;
In an effort to be more specific about my question, the following are the data sets I use:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.