I want to only keep the group of observations (by 'id') if the value of 'gestage' is not missing or </= 20 or >/=43. Delete any group as long as one observation has a missing value in 'gestage' or an implausible value in 'gestage'.
Thanks for helping!
data test;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 19
4 41
5 35
5 36
5 36
;
run;
Hi,
What is the desired output.
The desired output is:
data want
input id gestage;
datalines;
1 36
1 38
5 35
5 36
5 36
;
run;
This </= 20 or >/=43 does not cover the above output. Are you looking for >=20 or <=43?
If any 'gestage' is less than 20 or greater than 43 (or missing), I want to delete the entire group.
I didn't have the 'greater than 43' in my example, but 'id'=4 has 'gestage'=19, so I want to delete all observations where 'id'=4.
proc sql;
create table want as
select * from test where id not in (select distinct id from test where gestage not between 20 and 43) ;
quit;
Another way.
proc sort data=test out=have;
by id gestage;
run;
data want(drop=flag);
set have;
by id;
if first.id then flag=0;
if gestage=. or gestage<=20 or gestage>=43 then flag+1;
if flag=0;
run;
data test; input id gestage; datalines; 1 36 1 38 2 . 2 41 3 30 3 . 3 32 4 19 4 41 5 35 5 36 5 36 ; run; proc sql; create table want as select * from test group by id having nmiss(gestage)=0 and min(gestage) ge 20 and max(gestage) le 43; quit;
Xia Keshan
Hello,
data have;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 65
4 41
5 35
5 36
5 36
;
run;
proc sort data=have;
by id gestage;
run;
data want;
merge have (in=h1)
have(rename=(gestage=gestage_drop)
where=(missing(gestage_drop) or gestage_drop lt 20 or gestage_drop gt 43) in=h2);
by id;
if h1 and not h2;
drop gestage_drop;
run;
Thank you for the helpful replies. As a novice SAS user, I learn so much from these forums!
I'm having trouble applying the suggested code to another variable in my data set. I want to delete any group (by ID) as long as one observation has an implausible value for FIRSTBIRTH. That is, delete any group where FIRSTBIRTH=1 and BIRTHNUM ne 1 for any observation. This represents inaccurately reported data: a birth is noted as being a woman's first birth, but it's not consistent with the actual order of her births.
data have;
input id firstbirth birthnum;
datalines
1 1 1
1 1 2
2 1 1
2 0 2
2 0 3
3 1 1
3 1 2
3 1 3
4 1 1
4 0 2
4 0 3;
run;
This is what I want:
data want;
input id firstbirth birthnum;
datalines;
2 1 1
2 0 2
2 0 3
4 1 1
4 0 2
4 0 3;
run;
I used the following code:
proc sort data=have;
by id birthnum;
run;
data want (drop=flag);
set have;
by id;
if first.id then flag=0;
if FIRSTBIR=1 and BIRTHNUM ne 1 then flag+1;
if flag=0;
run;
What I get as output:
data want;
input id firstbirth birthnum;
datalines
1 1 1
2 1 1
2 0 2
2 0 3
3 1 1
4 1 1
4 0 2
4 0 3;
run;
When I run the code, only observations where FIRSTBIR=1 and BIRTHNUM ne 1 are deleted; not the entire group by ID where this occurs. Can someone please help me? Thank you.
Please try
proc sort data=have;
by id descending birthnum firstbirth;
run;
data want;
set have;
by id descending birthnum firstbirth ;
retain firstbirth2;
if first.id then firstbirth2=firstbirth;
if firstbirth2=0;
run;
proc sort data=want;
by id birthnum firstbirth;
run;
Thanks,
Jag
The SQL solutions might be the easiest way. This would be the right approach for a DATA step:
data want;
wanted='Y';
do until (last.id);
set have;
by id;
if firstbirth=1 and birthnum ne 1 then wanted='N';
end;
do until (last.id);
set have;
by id;
if wanted='Y' then output;
end;
drop wanted;
run;
Thank you for the replies! @Astounding the approach you posted worked great.
An alternative approach is to use a hash table lookup.
You first load the hash table with the id's matching the condition you don't want. You then delete all records who have a match with the hash table over id.
data have;
input id firstbirth birthnum;
datalines;
1 1 1
1 1 2
2 1 1
2 0 2
2 0 3
3 1 1
3 1 2
3 1 3
4 1 1
4 0 2
4 0 3
;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(FIRSTBIRTH=1 and BIRTHNUM ne 1))');
h1.defineKey('id');
h1.defineDone();
end;
set have;
if h1.check()=0 then delete;
run;
proc print data=want;
run;
Using hash tables has the big advantage that your source data doesn't need to be sorted.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.