Import rows of csv file except those satisfying conditions plus the previous/following ones

Reply
Occasional Contributor
Posts: 11

Import rows of csv file except those satisfying conditions plus the previous/following ones

[ Edited ]

I would like to import all the rows of a csv file, except the rows satisfyng some conditions together with the previous and/or following rows. For example, in the following example I would like to import all rows, except those 

- in which var1 = paris, and the previous row.

- in which var1 = madrid, and both the previous and the following row.

 

 data want;
 
 infile datalines;
 input 
   var1 $
   var2
   var3;
 datalines;
 abc 73 985
 abc 132  163
 paris 220  193 
gft 685 741 cde 195 163 paris 220 198 cde 173 155 abe 135 116 madrid 735 161 " 173 155; run;

 

It follows that the dataset want only contains the observations "abc 73 985" and "gft 685 741".

 

Any help is appreciated.

 

Thanks!!

PROC Star
Posts: 288

Re: Import rows of csv file except those satisfying conditions plus the previous/following ones

Are you opposed to importing first and then doing the subsetting?

Occasional Contributor
Posts: 11

Re: Import rows of csv file except those satisfying conditions plus the previous/following ones

Ideally, I would like to do the subsetting while importing.

The reason is that the rows that I would like to leave out contain invalid data, and SAS throws errors in the log when I import them.

 

Thanks.

Super User
Posts: 10,500

Re: Import rows of csv file except those satisfying conditions plus the previous/following ones

As shown the "gft 685 741" can't match your requirement as it is on the same row as 'paris'.

Please correct you datalines section.

 

Also if the " is supposed to represent a blank/missing value that is incorrect. Use a single period instead. The double quote becomes the value of a variable as does two single quotes.

Occasional Contributor
Posts: 11

Re: Import rows of csv file except those satisfying conditions plus the previous/following ones

Thank you for spotting that. For some reason I cannot write it in the correct way as a SAS code. But yes, it should be:

 

data want;

infile datalines;

input var1 $ var2 var3;

datalines;

abc 73 985

abc 132 163

paris 220 193
gft 685 741

cde 195 163

paris 220 198

cde 173 155

abe 135 116

madrid 735 161

" 173 155;

run;

 

The " could be the value of a variable, but if creates confusion I could replace it with anything else.

On the other hand, there could also be blank\missing values.

PROC Star
Posts: 7,363

Re: Import rows of csv file except those satisfying conditions plus the previous/following ones

The following may be overkill, but it does let you read the data twice, get rid of the undesired records, yet result it no errors in the log:

 

data recs (keep=_recnum);
  infile '/folders/myfolders/datahave.txt' end=eof;
  do until (eof);
    input;
    rec_count+1;
    if scan(_infile_,1) eq 'paris' then do;
      _recnum=rec_count;
      output;
      _recnum=rec_count-1;
      output;
    end;
    else if scan(_infile_,1) eq 'madrid' then do;
      _recnum=rec_count;
      output;
      _recnum=rec_count-1;
      output;
      _recnum=rec_count+1;
      output;
    end;
  end;
run;
data want (drop=_:);
  array _drops (999999) _temporary_;
  do until (eof1);
    set recs end=eof1;
    _rec_num+1;
    _drops(_rec_num)=_recnum;
  end;
  _rec_num=0;
  infile '/folders/myfolders/datahave.txt' end=eof2;
  do until (eof2);
    _rec_num+1;
    input @;
    if _rec_num not in _drops then do;
      input var1$ var2 var3;
      output;
    end;
    else input;
  end;
run;

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 5 replies
  • 164 views
  • 0 likes
  • 4 in conversation