- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I'm trying to use data step to import & merge many similarly structured CSV files into one, where their first few lines are like:
Title
Retrieved on xxx
.
Var1,Var2,Var3
1,2,3...
Apparently the actual data begins at row 5, so I tried first:
data Want;infile '&path:\*.csv' dlm=',' missover dsd firstobs=5;input var1 var2 var3;run;
But the outcome turned out to be that the entering four rows were skipped for the first file, but not for the rest. Namely I had:
Var1,Var2,Var3
1,2,3
Title,.,.
Retrieved on xxx,.,.
.,.,.
4,5,6...
I then tried the infile option evo= to skip lines for every files:
data Want;infile '&path:\*.csv' dlm=',' missover dsd evo=evo firstobs=5;input @;if evo then input;input var1 var2 var3;evo=0;run;
But it didn't work. I have concrete confidence that the above would work for files with row 1 being its header and data beginning at row 2 when you set firstobs=2; However it seemed the same would not work for my scenario. Any help would be appreciated!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Drop the "firstobs=" approach.
Instead you can use the FILENAME= option of the infile statement. Comparing the current value of the filename= variable to its lagged value will let you know when you're starting a new csv file. Then, whenever that is the case, skip 4 records using "input #4".
data want;
length fnam $30;
infile 'c:\temp\*.csv' dlm=',' filename=fnam;
input @; /* Set pointer to current record */
if fnam^=lag(fnam) then input #4; /* New file?, then skip 4 records */
input v1 v2 ;
run;
In your example, you have
infile '&path:\*.csv' ...
which would not work, since the macro processor would not resolve the macrovar &path, because it is in single quotes. And I guess since you are appending a ":" to &path, that macrovar path is nothing more than a drive letter.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For some reason it's not working on my end. My environment is SAS Enterprise Guide 7.1 running on a Window 10. In fact, the effect of your code is exactly the same as running:
data want;
infile 'c:\temp\*.csv' dlm=',';
input v1 v2 ;
run;
So I guess it's the examining the current file name with the lagged file name part that is not working. Thanks for your reply anyway.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems to work for me. Simplify your problem until you can pinpoint the issue.
data _null_;
file "&wdir\file1.txt";
put 'Title';
put 'Retrieved on xxx ';
put '. ';
put 'Var1,Var2,Var3 ';
put '1,2,3 ';
file "&wdir\file2.txt";
put 'Title';
put 'Retrieved on xxx ';
put '. ';
put 'Var1,Var2,Var3 ';
put '4,5,6 ';
run;
data _null_;
do I=1,2;
FILEVAR=cats("&wdir\file",I,".txt");
infile DUMMY filevar=FILEVAR firstobs=5;
input ;
putlog _infile_;
end;
stop;
run;
1,2,3
4,5,6
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi! Your solution seems to work, but why is it only taking the first row of each file? The result is like:
filename Var1 Var2 Var3
file1 file1(1,1) file1(1,2) file1(1,3)
file2 file2(1,1) file2(1,2) file2(1,3)
file3 file3(1,1) file3(1,2) file3(1,3)
I hope you can get it, and thank you for your time.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is an enhanced version of the solution posted by @ChrisNZ automatically processing all files of interest. Variables most likely not relevant for the created dataset are prefixed with an underscore.
data want;
length
v1-v3 8
_filename $ 200
;
keep v1-v3;
_rc = filename("mydir", "YOUR_PATH");
_did = dopen("mydir");
do _i = 1 to dnum(_did);
_filename = dread(_did, _i);
if prxmatch("/.*\.txt/i", trim(_filename)) then do;
_fileVar = cats("YOUR_PATH\", _filename);
infile dummy delimiter=',' filevar= _fileVar firstobs= 5;
input v1--v3;
output;
end;
end;
_rc = dclose(_did);
stop;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In an effort to be more specific about my question, the following are the data sets I use: