DATA Step, Macro, Functions and more

How to remove a group of observations based on conditions

Reply
Occasional Contributor
Posts: 18

How to remove a group of observations based on conditions

I want to only keep the group of observations (by 'id') if the value of 'gestage' is not missing or </= 20 or >/=43. Delete any group as long as one observation has a missing value in 'gestage' or an implausible value in 'gestage'.

Thanks for helping!

data test;

input id gestage;

datalines;

1 36

1 38

2 .

2 41

3 30

3 .

3 32

4 19

4 41

5 35

5 36

5 36

;

run;

Trusted Advisor
Posts: 1,204

Re: How to remove a group of observations based on conditions

Hi,

What is the desired output.

Occasional Contributor
Posts: 18

Re: How to remove a group of observations based on conditions

The desired output is:

data want

input id gestage;

datalines;

1 36

1 38

5 35

5 36

5 36

;

run;

Trusted Advisor
Posts: 1,204

Re: How to remove a group of observations based on conditions

This </= 20 or >/=43 does not cover the above output. Are you looking for >=20 or <=43?

Occasional Contributor
Posts: 18

Re: How to remove a group of observations based on conditions

If any 'gestage' is less than 20 or greater than 43 (or missing), I want to delete the entire group.

I didn't have the 'greater than 43' in my example, but 'id'=4 has 'gestage'=19, so I want to delete all observations where 'id'=4.

Super Contributor
Posts: 275

Re: How to remove a group of observations based on conditions

proc sql;

    create table want as

  select * from test where id not in (select distinct id  from test  where gestage not between 20 and 43) ;

  quit;

Trusted Advisor
Posts: 1,204

Re: How to remove a group of observations based on conditions

Another way.

proc sort data=test out=have;

by id gestage;

run;

data want(drop=flag);

set have;

by id;

if first.id then flag=0;

if gestage=. or gestage<=20 or gestage>=43 then flag+1;

if flag=0;

run;

Super User
Posts: 9,691

Re: How to remove a group of observations based on conditions


data test;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 19
4 41
5 35
5 36
5 36
;
run;
proc sql;
 create table want as
  select *
   from test
    group by id
      having nmiss(gestage)=0 and min(gestage) ge 20 and max(gestage) le 43;
quit;

Xia Keshan

Super Contributor
Posts: 305

Re: How to remove a group of observations based on conditions

Hello,

data have;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 65
4 41
5 35
5 36
5 36
;
run;

proc sort data=have;
by id gestage;
run;

data want;
merge have (in=h1)
have(rename=(gestage=gestage_drop)
where=(missing(gestage_drop) or gestage_drop lt 20 or gestage_drop gt 43) in=h2);

by id;
if h1 and not h2;

drop gestage_drop;

run;

Occasional Contributor
Posts: 18

Re: How to remove a group of observations based on conditions

Thank you for the helpful replies. As a novice SAS user, I learn so much from these forums!

Occasional Contributor
Posts: 18

Re: How to remove a group of observations based on conditions

I'm having trouble applying the suggested code to another variable in my data set. I want to delete any group (by ID) as long as one observation has an implausible value for FIRSTBIRTH. That is, delete any group where FIRSTBIRTH=1 and BIRTHNUM ne 1 for any observation. This represents inaccurately reported data: a birth is noted as being a woman's first birth, but it's not consistent with the actual order of her births.

data have;

input id firstbirth birthnum;

datalines

1 1 1

1 1 2

2 1 1

2 0 2

2 0 3

3 1 1

3 1 2

3 1 3

4 1 1

4 0 2

4 0 3;

run;

This is what I want:

data want;

input id firstbirth birthnum;

datalines;

2 1 1

2 0 2

2 0 3

4 1 1

4 0 2

4 0 3;

run;

I used the following code:

proc sort data=have;

by id birthnum;

run;

data want (drop=flag);

set have;

by id;

if first.id then flag=0;

if FIRSTBIR=1 and BIRTHNUM ne 1 then flag+1;

if flag=0;

run;

What I get as output:

data want;

input id firstbirth birthnum;

datalines

1 1 1

2 1 1

2 0 2

2 0 3

3 1 1

4 1 1

4 0 2

4 0 3;

run;

When I run the code, only observations where FIRSTBIR=1 and BIRTHNUM ne 1 are deleted; not the entire group by ID where this occurs. Can someone please help me? Thank you.

Trusted Advisor
Posts: 1,131

Re: How to remove a group of observations based on conditions

Please try

proc sort data=have;

by id descending birthnum firstbirth;

run;

data want;

set have;

by  id descending birthnum firstbirth ;

retain firstbirth2;

if first.id then firstbirth2=firstbirth;

if firstbirth2=0;

run;

proc sort data=want;

by  id  birthnum firstbirth;

run;

Thanks,

Jag

Thanks,
Jag
Super User
Posts: 5,099

Re: How to remove a group of observations based on conditions

The SQL solutions might be the easiest way.  This would be the right approach for a DATA step:

data want;

  wanted='Y';

  do until (last.id);

     set have;

     by id;

     if firstbirth=1 and birthnum ne 1 then wanted='N';

  end;

  do until (last.id);

     set have;

     by id;

     if wanted='Y' then output;

  end;

  drop wanted;

run;

Occasional Contributor
Posts: 18

Re: How to remove a group of observations based on conditions

Thank you for the replies! @Astounding the approach you posted worked great.

Ask a Question
Discussion stats
  • 13 replies
  • 3159 views
  • 8 likes
  • 7 in conversation