Condense and Simplify program

JaneNYC · Posted 10-04-2012 09:05 AM

Hi all. I’m reasonably new to base SAS and SQL and I am trying to learn how to simplify the programs I write. I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version.

Basically the program below identifies cases which contain individuals that are 21 and older in the case. I am looking to delete from the final data set all the cases which contain only one case and that age of that person in greater than 21 BUT if the case contains more than one individual and all the individuals’ age is greater than or equal 21 that case is deleted also.

This is what I’m trying to do:

case_no	age
S6034840	21	Delete Case
S5439846	22	Delete Case
S5324014	17	Keep Case
S5324014	21
S5324014	22
S5324014	13
S5286132	21	Delete Case
S5270236	22	Delete Case
S5268581	18	Keep Case
S5268581	14
S5268581	14
S5232142	17	Keep Case
S5232142	13	Keep Case
S5171614	21	Delete Case
S5034350	18	Keep Case
S5028771	17	Keep Case
S5028771	18	Keep Case
S5023333	21	Delete Case
S5023333	22	Delete Case

1. This first step brings in DOB (date of birth) into a previously created dataset and then calculates the age at the time on a certain data (DTDUE).

proc sql;

create table want as

select

a.* ,
b.DOB,

%age(DTDUE,dob)as age

from have1 A LEFT JOIN have2 B

on a.id=b.id

order by case_no;

quit;

2. This step just pulls all the observations where the age is greater or equal to 21.

data GE_21;

set agytable;

where age ge 21;run;

3. Then is step goes back to the want dataset and pulls all the cases which have individuals that are 21 or older in them.

proc sql;

create table GE_21_cases as

select

a.*

from want a

where CaseNo in (select CaseNo from GE_21)

order by CaseNo, age desc;

quit;

4. This step deletes cases where there is only one person in the case and that person’s age is greater than or equal two 21.

data GE_21_cases;

set Ge_21_cases ;

by caseno;

if first.caseno;

if last.caseno;

run;

5.This step goes back the original data set and deletes the cases.

Proc sql;

create table want as

select *

from want a

where CaseNo not in (select CaseNo from GE_21_cases);

quit;

Again, I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version. And if there is anything I’m missing your advice is appreciated with that also.

Thanks!!

data_null__ · Posted 10-04-2012 09:24 AM

To me it sounds like your criteria is "IF MINAGE LT 21 THEN KEEP CASE"

proc summary data=case nway; 
   class case_no;
   var age;
   output out=minage(where=(min lt 21) index=(case_no))
      n=n min=min; 
   run; 
data keepcase;
   set case;
   set minage(keep=case_no) key=case_no/unique; 
   if _error_ then do; 
      _error_ = 0; 
      delete; 
      end; 
   run;

JaneNYC · Posted 10-04-2012 09:43 AM

Wow thanks so much Data_Null_. I am not familiar with some of the staments in your code (still learning base sas) like nway,index,key,unique and what is happeneing in this step:

if _error_ then do;
_error_ = 0;

Your feedback is greatly appreciated. Again, still learning.

Thanks!!

data_null__ · Posted 10-04-2012 10:58 AM

Rather than have me try to explain each of those statements and options, SAS has online help that can answer all your questions much better and more completely. There is no shame in reading the documentation, I keep SUPPORT.SAS.COM open to the help pages everyday and I've been learning SAS for 33 years.

INDEX and KEY are linked so you might start with the section on indexes.

It should be noted that the SQL suggest by Jeroen is a much simplier version of my proc summary/data step.

Jeroen · Posted 10-04-2012 10:28 AM

To go further on the min age idea, a simple SQL would work as well:

proc sql;

create table want as

select case_no,

age

from case

group by case_no

having min(age) < 21 ;

quit;

MikeZdeb · Posted 10-04-2012 09:43 AM

hi ... I think this works ... looks as if your rule of "delete 1 person in a group with age greater than 21" is really " ...greater than or equal to 21"

data x;

input case_no :$8. age @@;

datalines;

S5023333 21 S5023333 22

S5028771 17 S5028771 18

S5034350 18 S5171614 21

S5232142 13 S5232142 17

S5268581 14 S5268581 14

S5268581 18 S5270236 22

S5286132 21 S5324014 13

S5324014 17 S5324014 21

S5324014 22 S5439846 22

S6034840 21

;

proc sql;

create data want as

select * from x

group by case_no

having (count(*) eq 1 and age lt 21) or (count(*) gt 1 and min(age) lt 21);

quit;

case_no age

S5028771 17

S5028771 18

S5034350 18

S5232142 13

S5232142 17

S5268581 18

S5268581 14

S5324014 17

S5324014 13

S5324014 22

S5324014 21

Haikuo · Posted 10-04-2012 10:23 AM

Mike,

The having statement can also be like:

having(sum(age<21)>0);

Regards,

Haikuo

Scott_A_Miller · Posted 10-04-2012 12:21 PM

It's a bit difficult to answer your request because you provide a table of sample data, but after reviewing your code, it's clear that the actual source data looks much different. I understand the actual data is likely to be sensitive, and contain a bunch of variables outside the scope of the question, but providing a simplified version of this data would make it easier to provide good ansewrs.

I took a bit of a stab at recreating what I'm guessing your underlying data looks like. I figured theres probably 3 datasets: a dataset of cases (case_info), a dataset of people (people_info), and a dataset to contain the many-to-many relationship between those two datasets (case_to_people).

I discovered this cool %age macro on the SAS site -- perhaps that is what you are using.

So with the above data and the macro, the solution is quite simple -- it can be done in a single PROC SQL quite easily. It could also be done with one or more datasteps without much trouble as well, especially if the data is small enough be quickly joined with a couple of sort/sort/merges.For small amounts of data, I think that things like index lookups and hash tables are more trouble than they are worth.

So take a look at this and see if this helps.

%macro age(date,birth);
%* this macro obtained from "Calculating Age with Only One Line of Code" by William Kreuter *;
%* http://support.sas.com/kb/24/808.html *;
floor ((intck('month',&birth,&date)
- (day(&date) < day(&birth))) / 12)
%mend age;

data case_info;
input @1 case_no $8. @11 dtdue date9.;
format dtdue date9.;
datalines;
S6034840 05mar2012
S5439846 15mar2012
S5324014 20mar2012
;
run;

data case_to_people;
input @1 case_no $8. @11 id 5.;
datalines;
S6034840 00001
S6034840 00006
S5439846 00002
S5324014 00003
S5324014 00004
S5324014 00005
;
run;

data people_info;
input @1 id 5. @8 DOB date9.;
format DOB date9.;
datalines;
00001 15jun1989
00002 13jun1997
00003 17jun1988
00004 21jun1990
00005 22jun1993
00006 19jun1988
;
run;

* This dataset is for purposes of human-review of the data / verifying the solution. *;
proc sql;
   create table cases_and_ages as
   select a.*, c.*, %age (a.dtdue, c.DOB) as age
   from case_info a
        inner join case_to_people b on a.case_no = b.case_no
inner join people_info c on b.id = c.id
   order by a.case_no, c.DOB
   ;
quit;

* This is the actual solution *;
proc sql;
   create table cases_with_under_21 as
   select distinct a.case_no
   from case_info a
        inner join case_to_people b on a.case_no = b.case_no
inner join people_info c on b.id = c.id
   where %age (a.dtdue, c.DOB) < 21
   order by a.case_no
   ;
quit;

Ksharp · Posted 10-05-2012 12:34 AM

Another version is DOW.

data x;
input case_no :$8. age @@;
datalines;
S5023333   21  S5023333   22
S5028771   17  S5028771   18
S5034350   18  S5171614   21
S5232142   13  S5232142   17
S5268581   14  S5268581   14
S5268581   18  S5270236   22
S5286132   21  S5324014   13
S5324014   17  S5324014   21
S5324014   22  S5439846   22
S6034840   21 
;
run;
data want;
 do until(last.case_no);
  set x ;
  by case_no;
  if age lt 21 then found=1;
 end;
 do until(last.case_no);
  set x ;
  by case_no;
  if found then output;
 end;
run;

Ksharp

Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

Re: Condense and Simplify program

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away