03-03-2017 11:10 AM - edited 03-06-2017 02:28 AM
I would like to import all the rows of a csv file, except the rows satisfyng some conditions together with the previous and/or following rows. For example, in the following example I would like to import all rows, except those
- in which var1 = paris, and the previous row.
- in which var1 = madrid, and both the previous and the following row.
data want; infile datalines; input var1 $ var2 var3; datalines; abc 73 985 abc 132 163 paris 220 193
gft 685 741 cde 195 163 paris 220 198 cde 173 155 abe 135 116 madrid 735 161 " 173 155; run;
It follows that the dataset want only contains the observations "abc 73 985" and "gft 685 741".
Any help is appreciated.
03-06-2017 02:45 AM
Ideally, I would like to do the subsetting while importing.
The reason is that the rows that I would like to leave out contain invalid data, and SAS throws errors in the log when I import them.
03-03-2017 12:54 PM
As shown the "gft 685 741" can't match your requirement as it is on the same row as 'paris'.
Please correct you datalines section.
Also if the " is supposed to represent a blank/missing value that is incorrect. Use a single period instead. The double quote becomes the value of a variable as does two single quotes.
03-06-2017 02:38 AM
Thank you for spotting that. For some reason I cannot write it in the correct way as a SAS code. But yes, it should be:
input var1 $ var2 var3;
abc 73 985
abc 132 163
paris 220 193
gft 685 741
cde 195 163
paris 220 198
cde 173 155
abe 135 116
madrid 735 161
" 173 155;
The " could be the value of a variable, but if creates confusion I could replace it with anything else.
On the other hand, there could also be blank\missing values.
03-07-2017 02:53 PM
The following may be overkill, but it does let you read the data twice, get rid of the undesired records, yet result it no errors in the log:
data recs (keep=_recnum); infile '/folders/myfolders/datahave.txt' end=eof; do until (eof); input; rec_count+1; if scan(_infile_,1) eq 'paris' then do; _recnum=rec_count; output; _recnum=rec_count-1; output; end; else if scan(_infile_,1) eq 'madrid' then do; _recnum=rec_count; output; _recnum=rec_count-1; output; _recnum=rec_count+1; output; end; end; run; data want (drop=_:); array _drops (999999) _temporary_; do until (eof1); set recs end=eof1; _rec_num+1; _drops(_rec_num)=_recnum; end; _rec_num=0; infile '/folders/myfolders/datahave.txt' end=eof2; do until (eof2); _rec_num+1; input @; if _rec_num not in _drops then do; input var1$ var2 var3; output; end; else input; end; run;
Art, CEO, AnalystFinder.com