BookmarkSubscribeRSS Feed
JaneNYC
Calcite | Level 5

Hi all. I’m reasonably new to base SAS and SQL and I am trying to learn how to simplify the programs I write. I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version. 

Basically the program below identifies cases which contain individuals that are 21 and older in the case. I am looking to delete from the final data set all the cases which contain only one case and that age of that person in greater than 21 BUT if the case contains more than one individual and all the individuals’ age is greater than or equal 21 that case is deleted also.

This is what I’m trying to do:

case_no

age

S6034840

21

Delete Case

S5439846

22

Delete Case

S5324014

17

Keep Case

S5324014

21

S5324014

22

S5324014

13

S5286132

21

Delete Case

S5270236

22

Delete Case

S5268581

18

Keep Case

S5268581

14

S5268581

14

S5232142

17

Keep Case

S5232142

13

S5171614

21

Delete Case

S5034350

18

Keep Case

S5028771

17

Keep Case

S5028771

18

S5023333

21

Delete Case

S5023333

22

1. This first step brings in DOB (date of birth) into a previously created dataset and then calculates the age at the time on a certain data (DTDUE).

proc sql;

create table want  as

select

  1. a.* ,
  2. b.DOB,

%age(DTDUE,dob)as age

from  have1 A LEFT JOIN have2 B

on a.id=b.id

order by case_no;

quit;

2. This step just pulls all the observations where the age is greater or equal to 21.

data GE_21;

set agytable;

where age ge 21;run;

3. Then is step goes back to the want dataset and pulls all the cases which have individuals that are 21 or older in them.

proc sql;

create table GE_21_cases as

select

  1. a.*

from want a

where CaseNo in (select CaseNo from GE_21)

order by CaseNo, age desc;

quit;

4. This step deletes cases where there is only one person in the case and that person’s age is greater than or equal two 21.

data GE_21_cases;

set Ge_21_cases ;

by caseno;

if first.caseno;

if last.caseno;

run;

5.This step goes back the original data set and deletes the cases. 

Proc sql;

create table want as

select *

from want a

where CaseNo not in (select CaseNo from GE_21_cases);

quit;

Again, I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version. And if there is anything I’m missing your advice is appreciated with that also.

Thanks!! 

8 REPLIES 8
data_null__
Jade | Level 19

To me it sounds like your criteria is "IF MINAGE LT 21 THEN KEEP CASE"

proc summary data=case nway;
  
class case_no;
   var age;
   output out=minage(where=(min lt 21) index=(case_no))
     
n=n min=min;
  
run;
data keepcase;
   set case;
   set minage(keep=case_no) key=case_no/unique;
  
if _error_ then do;
      _error_ =
0;
     
delete;
     
end;
  
run;
JaneNYC
Calcite | Level 5

Wow thanks so much Data_Null_. I am not familiar with some of the staments in your code (still learning base sas) like nway,index,key,unique and what is happeneing in this step:

if _error_ then do;
_error_ =
0;

Your feedback is greatly appreciated. Again, still learning. Smiley Happy

Thanks!!

data_null__
Jade | Level 19

Rather than have me try to explain each of those statements and options, SAS has online help that can answer all your questions much better and more completely.  There is no shame in reading the documentation, I keep SUPPORT.SAS.COM open to the help pages everyday and I've been learning SAS for 33 years.

INDEX and KEY are linked so you might start with the section on indexes.

It should be noted that the SQL suggest by Jeroen is a much simplier version of my proc summary/data step.

Jeroen
SAS Employee

To go further on the min age idea, a simple SQL would work as well:


proc sql;

     create table want as

          select   case_no,  

                     age

         from     case

         group by case_no

         having   min(age) < 21 ;

quit;

MikeZdeb
Rhodochrosite | Level 12

hi ... I think this works ... looks as if your rule of "delete 1 person in a group with age greater than 21"  is really " ...greater than or equal to 21"

data x;

input case_no :$8. age @@;

datalines;

S5023333   21  S5023333   22

S5028771   17  S5028771   18

S5034350   18  S5171614   21

S5232142   13  S5232142   17

S5268581   14  S5268581   14

S5268581   18  S5270236   22

S5286132   21  S5324014   13

S5324014   17  S5324014   21

S5324014   22  S5439846   22

S6034840   21 

;

proc sql;

create data want as

select * from x

group by case_no

having (count(*) eq 1 and age lt 21) or (count(*) gt 1 and min(age) lt 21);

quit;

case_no        age

S5028771        17

S5028771        18

S5034350        18

S5232142        13

S5232142        17

S5268581        18

S5268581        14

S5268581        14

S5324014        17

S5324014        13

S5324014        22

S5324014        21

Haikuo
Onyx | Level 15

Mike,

The having statement can also be like:

having(sum(age<21)>0);

Regards,

Haikuo


Scott_A_Miller
Calcite | Level 5

It's a bit difficult to answer your request because you provide a table of sample data, but after reviewing your code, it's clear that the actual source data looks much different. I understand the actual data is likely to be sensitive, and contain a bunch of variables outside the scope of the question, but providing a simplified version of this data would make it easier to provide good ansewrs.

I took a bit of a stab at recreating what I'm guessing your underlying data looks like. I figured theres probably 3 datasets: a dataset of cases (case_info), a dataset of people (people_info), and a dataset to contain the many-to-many relationship between those two datasets (case_to_people).

I discovered this cool %age macro on the SAS site -- perhaps that is what you are using.

So with the above data and the macro, the solution is quite simple -- it can be done in a single PROC SQL quite easily. It could also be done with one or more datasteps without much trouble as well, especially if the data is small enough be quickly joined with a couple of sort/sort/merges.For small amounts of data, I think that things like index lookups and hash tables are more trouble than they are worth.

So take a look at this and see if this helps.

%macro age(date,birth);
%*  this macro obtained from "Calculating Age with Only One Line of Code" by William Kreuter  *;
%*  http://support.sas.com/kb/24/808.html                                                     *;
floor ((intck('month',&birth,&date)
- (day(&date) < day(&birth))) / 12)
%mend age;


data case_info;
   input @1 case_no $8.  @11 dtdue date9.;
   format dtdue date9.;
datalines;
S6034840  05mar2012
S5439846  15mar2012
S5324014  20mar2012
;
run;

data case_to_people;
   input @1 case_no $8.  @11 id 5.;
datalines;
S6034840  00001
S6034840  00006
S5439846  00002
S5324014  00003
S5324014  00004
S5324014  00005
;
run;

data people_info;
   input @1 id 5. @8 DOB date9.;
   format DOB date9.;
datalines;
00001  15jun1989
00002  13jun1997
00003  17jun1988
00004  21jun1990
00005  22jun1993
00006  19jun1988
;
run;

*  This dataset is for purposes of human-review of the data / verifying the solution.  *;
proc sql;
   create table cases_and_ages as
   select a.*, c.*, %age (a.dtdue, c.DOB) as age
   from case_info a
        inner join case_to_people b on a.case_no = b.case_no
  inner join people_info c on b.id = c.id
   order by a.case_no, c.DOB
   ;
quit;

*  This is the actual solution  *;
proc sql;
   create table cases_with_under_21 as
   select distinct a.case_no
   from case_info a
        inner join case_to_people b on a.case_no = b.case_no
  inner join people_info c on b.id = c.id
   where %age (a.dtdue, c.DOB) < 21
   order by a.case_no
   ;
quit;

Ksharp
Super User

Another version is DOW.

data x;
input case_no :$8. age @@;
datalines;
S5023333   21  S5023333   22
S5028771   17  S5028771   18
S5034350   18  S5171614   21
S5232142   13  S5232142   17
S5268581   14  S5268581   14
S5268581   18  S5270236   22
S5286132   21  S5324014   13
S5324014   17  S5324014   21
S5324014   22  S5439846   22
S6034840   21 
;
run;
data want;
 do until(last.case_no);
  set x ;
  by case_no;
  if age lt 21 then found=1;
 end;
 do until(last.case_no);
  set x ;
  by case_no;
  if found then output;
 end;
run;

Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1671 views
  • 0 likes
  • 7 in conversation