Hi folks,
I am working on a dataset (have) in which each id has multiple rows. I want to delete every id for which at least one value of the variable diffdate is less than 60. I am struggling to write the proper condition. The following are the sample "have" and "want" datasets. Thanks in advance.
Accepted Solutions
Here's an easy way, with a forgivable flaw (explained below):
data want;
  merge have (in=delete_me where=(. < diffdate < 60)) have;
  by id;
  if delete_me then delete;
run;
Naturally, the data must be sorted by ID. In addition, the log may well contain a note that the MERGE statement has more than one data set with repeats of BY values. Usually that note should trigger an investigation to see whether the step is working properly. In this case, the note is expected and can be ignored.
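The two practical points above (sort order and checking the result) can be sketched as follows. This is only an illustration, assuming the datasets are named have and want as in the question. The sanity-check query is my own addition, not part of the accepted solution.

```sas
/* Sort once so that BY ID processing in the MERGE step is valid */
proc sort data=have;
  by id;
run;

/* Optional sanity check, to be run after WANT has been created:   */
/* it should list no rows, because every id with a non-missing     */
/* diffdate below 60 is supposed to have been removed entirely.    */
proc sql;
  select id, diffdate
  from want
  where diffdate is not missing and diffdate < 60;
quit;
```

If the check does list rows, the expected log note deserves the usual investigation after all.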
Many Thanks
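proc sql;
  create table want as
  select * from have
  where id not in (
    /* ids with at least one non-missing diffdate below 60;        */
    /* a bare diffdate<60 would also match missing values, because */
    /* SAS treats missing as smaller than any number               */
    select distinct id from have
    where diffdate is not missing and diffdate < 60);
quit;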
data have;
input id $ diffdate;
datalines;
T10008 .
T10008 25
T10008 125
T10064 .
T10064 100
T10079 .
T10079 253
T10079 36
T10096 .
T10096 58
T10096 32
T10096 39
T10135 .
T10135 147
T10139 .
T10139 98
T10139 80
;
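data _null_;
  /* define id and diffdate in the PDV without reading any rows */
  if 0 then set have;
  if _n_=1 then do;
    /* load every row of HAVE, allowing several rows per id */
    declare hash h(dataset: 'have', multidata: 'y', ordered: 'y');
    h.definekey('id');
    h.definedata('id', 'diffdate');
    h.definedone();
  end;
  /* read only the offending rows and drop all items for their id */
  set have(where=(. < diffdate < 60)) end=last;
  if h.check()=0 then h.remove();
  /* note: WANT is written only if at least one row matches the filter */
  if last then h.output(dataset: 'want');
run;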