I want to only keep the group of observations (by 'id') if the value of 'gestage' is not missing or </= 20 or >/=43. Delete any group as long as one observation has a missing value in 'gestage' or an implausible value in 'gestage'.
Thanks for helping!
data test;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 19
4 41
5 35
5 36
5 36
;
run;
Hi,
What is the desired output.
The desired output is:
data want
input id gestage;
datalines;
1 36
1 38
5 35
5 36
5 36
;
run;
This </= 20 or >/=43 does not cover the above output. Are you looking for >=20 or <=43?
If any 'gestage' is less than 20 or greater than 43 (or missing), I want to delete the entire group.
I didn't have the 'greater than 43' in my example, but 'id'=4 has 'gestage'=19, so I want to delete all observations where 'id'=4.
proc sql;
create table want as
select * from test where id not in (select distinct id from test where gestage not between 20 and 43) ;
quit;
Another way.
proc sort data=test out=have;
by id gestage;
run;
data want(drop=flag);
set have;
by id;
if first.id then flag=0;
if gestage=. or gestage<=20 or gestage>=43 then flag+1;
if flag=0;
run;
data test; input id gestage; datalines; 1 36 1 38 2 . 2 41 3 30 3 . 3 32 4 19 4 41 5 35 5 36 5 36 ; run; proc sql; create table want as select * from test group by id having nmiss(gestage)=0 and min(gestage) ge 20 and max(gestage) le 43; quit;
Xia Keshan
Hello,
data have;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 65
4 41
5 35
5 36
5 36
;
run;
proc sort data=have;
by id gestage;
run;
data want;
merge have (in=h1)
have(rename=(gestage=gestage_drop)
where=(missing(gestage_drop) or gestage_drop lt 20 or gestage_drop gt 43) in=h2);
by id;
if h1 and not h2;
drop gestage_drop;
run;
Thank you for the helpful replies. As a novice SAS user, I learn so much from these forums!
I'm having trouble applying the suggested code to another variable in my data set. I want to delete any group (by ID) as long as one observation has an implausible value for FIRSTBIRTH. That is, delete any group where FIRSTBIRTH=1 and BIRTHNUM ne 1 for any observation. This represents inaccurately reported data: a birth is noted as being a woman's first birth, but it's not consistent with the actual order of her births.
data have;
input id firstbirth birthnum;
datalines
1 1 1
1 1 2
2 1 1
2 0 2
2 0 3
3 1 1
3 1 2
3 1 3
4 1 1
4 0 2
4 0 3;
run;
This is what I want:
data want;
input id firstbirth birthnum;
datalines;
2 1 1
2 0 2
2 0 3
4 1 1
4 0 2
4 0 3;
run;
I used the following code:
proc sort data=have;
by id birthnum;
run;
data want (drop=flag);
set have;
by id;
if first.id then flag=0;
if FIRSTBIR=1 and BIRTHNUM ne 1 then flag+1;
if flag=0;
run;
What I get as output:
data want;
input id firstbirth birthnum;
datalines
1 1 1
2 1 1
2 0 2
2 0 3
3 1 1
4 1 1
4 0 2
4 0 3;
run;
When I run the code, only observations where FIRSTBIR=1 and BIRTHNUM ne 1 are deleted; not the entire group by ID where this occurs. Can someone please help me? Thank you.
Please try
proc sort data=have;
by id descending birthnum firstbirth;
run;
data want;
set have;
by id descending birthnum firstbirth ;
retain firstbirth2;
if first.id then firstbirth2=firstbirth;
if firstbirth2=0;
run;
proc sort data=want;
by id birthnum firstbirth;
run;
Thanks,
Jag
The SQL solutions might be the easiest way. This would be the right approach for a DATA step:
data want;
wanted='Y';
do until (last.id);
set have;
by id;
if firstbirth=1 and birthnum ne 1 then wanted='N';
end;
do until (last.id);
set have;
by id;
if wanted='Y' then output;
end;
drop wanted;
run;
Thank you for the replies! @Astounding the approach you posted worked great.
An alternative approach is to use a hash table lookup.
You first load the hash table with the id's matching the condition you don't want. You then delete all records who have a match with the hash table over id.
data have;
input id firstbirth birthnum;
datalines;
1 1 1
1 1 2
2 1 1
2 0 2
2 0 3
3 1 1
3 1 2
3 1 3
4 1 1
4 0 2
4 0 3
;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(FIRSTBIRTH=1 and BIRTHNUM ne 1))');
h1.defineKey('id');
h1.defineDone();
end;
set have;
if h1.check()=0 then delete;
run;
proc print data=want;
run;
Using hash tables has the big advantage that your source data doesn't need to be sorted.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.