BookmarkSubscribeRSS Feed
burtsm
Calcite | Level 5

I want to only keep the group of observations (by 'id') if the value of 'gestage' is not missing or </= 20 or >/=43. Delete any group as long as one observation has a missing value in 'gestage' or an implausible value in 'gestage'.

Thanks for helping!

data test;

input id gestage;

datalines;

1 36

1 38

2 .

2 41

3 30

3 .

3 32

4 19

4 41

5 35

5 36

5 36

;

run;

15 REPLIES 15
stat_sas
Ammonite | Level 13

Hi,

What is the desired output.

burtsm
Calcite | Level 5

The desired output is:

data want

input id gestage;

datalines;

1 36

1 38

5 35

5 36

5 36

;

run;

stat_sas
Ammonite | Level 13

This </= 20 or >/=43 does not cover the above output. Are you looking for >=20 or <=43?

burtsm
Calcite | Level 5

If any 'gestage' is less than 20 or greater than 43 (or missing), I want to delete the entire group.

I didn't have the 'greater than 43' in my example, but 'id'=4 has 'gestage'=19, so I want to delete all observations where 'id'=4.

slchen
Lapis Lazuli | Level 10

proc sql;

    create table want as

  select * from test where id not in (select distinct id  from test  where gestage not between 20 and 43) ;

  quit;

stat_sas
Ammonite | Level 13

Another way.

proc sort data=test out=have;

by id gestage;

run;

data want(drop=flag);

set have;

by id;

if first.id then flag=0;

if gestage=. or gestage<=20 or gestage>=43 then flag+1;

if flag=0;

run;

Ksharp
Super User

data test;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 19
4 41
5 35
5 36
5 36
;
run;
proc sql;
 create table want as
  select *
   from test
    group by id
      having nmiss(gestage)=0 and min(gestage) ge 20 and max(gestage) le 43;
quit;

Xia Keshan

Loko
Barite | Level 11

Hello,

data have;
input id gestage;
datalines;
1 36
1 38
2 .
2 41
3 30
3 .
3 32
4 65
4 41
5 35
5 36
5 36
;
run;

proc sort data=have;
by id gestage;
run;

data want;
merge have (in=h1)
have(rename=(gestage=gestage_drop)
where=(missing(gestage_drop) or gestage_drop lt 20 or gestage_drop gt 43) in=h2);

by id;
if h1 and not h2;

drop gestage_drop;

run;

burtsm
Calcite | Level 5

Thank you for the helpful replies. As a novice SAS user, I learn so much from these forums!

burtsm
Calcite | Level 5

I'm having trouble applying the suggested code to another variable in my data set. I want to delete any group (by ID) as long as one observation has an implausible value for FIRSTBIRTH. That is, delete any group where FIRSTBIRTH=1 and BIRTHNUM ne 1 for any observation. This represents inaccurately reported data: a birth is noted as being a woman's first birth, but it's not consistent with the actual order of her births.

data have;

input id firstbirth birthnum;

datalines

1 1 1

1 1 2

2 1 1

2 0 2

2 0 3

3 1 1

3 1 2

3 1 3

4 1 1

4 0 2

4 0 3;

run;

This is what I want:

data want;

input id firstbirth birthnum;

datalines;

2 1 1

2 0 2

2 0 3

4 1 1

4 0 2

4 0 3;

run;

I used the following code:

proc sort data=have;

by id birthnum;

run;

data want (drop=flag);

set have;

by id;

if first.id then flag=0;

if FIRSTBIR=1 and BIRTHNUM ne 1 then flag+1;

if flag=0;

run;

What I get as output:

data want;

input id firstbirth birthnum;

datalines

1 1 1

2 1 1

2 0 2

2 0 3

3 1 1

4 1 1

4 0 2

4 0 3;

run;

When I run the code, only observations where FIRSTBIR=1 and BIRTHNUM ne 1 are deleted; not the entire group by ID where this occurs. Can someone please help me? Thank you.

Jagadishkatam
Amethyst | Level 16

Please try

proc sort data=have;

by id descending birthnum firstbirth;

run;

data want;

set have;

by  id descending birthnum firstbirth ;

retain firstbirth2;

if first.id then firstbirth2=firstbirth;

if firstbirth2=0;

run;

proc sort data=want;

by  id  birthnum firstbirth;

run;

Thanks,

Jag

Thanks,
Jag
Astounding
PROC Star

The SQL solutions might be the easiest way.  This would be the right approach for a DATA step:

data want;

  wanted='Y';

  do until (last.id);

     set have;

     by id;

     if firstbirth=1 and birthnum ne 1 then wanted='N';

  end;

  do until (last.id);

     set have;

     by id;

     if wanted='Y' then output;

  end;

  drop wanted;

run;

burtsm
Calcite | Level 5

Thank you for the replies! @Astounding the approach you posted worked great.

Patrick
Opal | Level 21

@burtsm 

An alternative approach is to use a hash table lookup.

You first load the hash table with the id's matching the condition you don't want. You then delete all records who have a match with the hash table over id.

data have;
  input id firstbirth birthnum;
  datalines;
1 1 1
1 1 2
2 1 1
2 0 2
2 0 3
3 1 1
3 1 2
3 1 3
4 1 1
4 0 2
4 0 3
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(FIRSTBIRTH=1 and BIRTHNUM ne 1))');
      h1.defineKey('id');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0 then delete;
run;

proc print data=want;
run;

Using hash tables has the big advantage that your source data doesn't need to be sorted.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 14300 views
  • 10 likes
  • 8 in conversation