Hi all. I’m reasonably new to base SAS and SQL and I am trying to learn how to simplify the programs I write. I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version.
Basically the program below identifies cases which contain individuals that are 21 and older in the case. I am looking to delete from the final data set all the cases which contain only one case and that age of that person in greater than 21 BUT if the case contains more than one individual and all the individuals’ age is greater than or equal 21 that case is deleted also.
This is what I’m trying to do:
case_no | age | |
S6034840 | 21 | Delete Case |
S5439846 | 22 | Delete Case |
S5324014 | 17 | Keep Case |
S5324014 | 21 | |
S5324014 | 22 | |
S5324014 | 13 | |
S5286132 | 21 | Delete Case |
S5270236 | 22 | Delete Case |
S5268581 | 18 | Keep Case |
S5268581 | 14 | |
S5268581 | 14 | |
S5232142 | 17 | Keep Case |
S5232142 | 13 | |
S5171614 | 21 | Delete Case |
S5034350 | 18 | Keep Case |
S5028771 | 17 | Keep Case |
S5028771 | 18 | |
S5023333 | 21 | Delete Case |
S5023333 | 22 |
1. This first step brings in DOB (date of birth) into a previously created dataset and then calculates the age at the time on a certain data (DTDUE).
proc sql;
create table want as
select
%age(DTDUE,dob)as age
from have1 A LEFT JOIN have2 B
on a.id=b.id
order by case_no;
quit;
2. This step just pulls all the observations where the age is greater or equal to 21.
data GE_21;
set agytable;
where age ge 21;run;
3. Then is step goes back to the want dataset and pulls all the cases which have individuals that are 21 or older in them.
proc sql;
create table GE_21_cases as
select
from want a
where CaseNo in (select CaseNo from GE_21)
order by CaseNo, age desc;
quit;
4. This step deletes cases where there is only one person in the case and that person’s age is greater than or equal two 21.
data GE_21_cases;
set Ge_21_cases ;
by caseno;
if first.caseno;
if last.caseno;
run;
5.This step goes back the original data set and deletes the cases.
Proc sql;
create table want as
select *
from want a
where CaseNo not in (select CaseNo from GE_21_cases);
quit;
Again, I would like to try and condense the program below down into fewer steps with both a proc SQL and base SAS version. And if there is anything I’m missing your advice is appreciated with that also.
Thanks!!
To me it sounds like your criteria is "IF MINAGE LT 21 THEN KEEP CASE"
Wow thanks so much Data_Null_. I am not familiar with some of the staments in your code (still learning base sas) like nway,index,key,unique and what is happeneing in this step:
if _error_ then do;
_error_ = 0;
Your feedback is greatly appreciated. Again, still learning.
Thanks!!
Rather than have me try to explain each of those statements and options, SAS has online help that can answer all your questions much better and more completely. There is no shame in reading the documentation, I keep SUPPORT.SAS.COM open to the help pages everyday and I've been learning SAS for 33 years.
INDEX and KEY are linked so you might start with the section on indexes.
It should be noted that the SQL suggest by Jeroen is a much simplier version of my proc summary/data step.
To go further on the min age idea, a simple SQL would work as well:
proc sql;
create table want as
select case_no,
age
from case
group by case_no
having min(age) < 21 ;
quit;
hi ... I think this works ... looks as if your rule of "delete 1 person in a group with age greater than 21" is really " ...greater than or equal to 21"
data x;
input case_no :$8. age @@;
datalines;
S5023333 21 S5023333 22
S5028771 17 S5028771 18
S5034350 18 S5171614 21
S5232142 13 S5232142 17
S5268581 14 S5268581 14
S5268581 18 S5270236 22
S5286132 21 S5324014 13
S5324014 17 S5324014 21
S5324014 22 S5439846 22
S6034840 21
;
proc sql;
create data want as
select * from x
group by case_no
having (count(*) eq 1 and age lt 21) or (count(*) gt 1 and min(age) lt 21);
quit;
case_no age
S5028771 17
S5028771 18
S5034350 18
S5232142 13
S5232142 17
S5268581 18
S5268581 14
S5268581 14
S5324014 17
S5324014 13
S5324014 22
S5324014 21
Mike,
The having statement can also be like:
having(sum(age<21)>0);
Regards,
Haikuo
It's a bit difficult to answer your request because you provide a table of sample data, but after reviewing your code, it's clear that the actual source data looks much different. I understand the actual data is likely to be sensitive, and contain a bunch of variables outside the scope of the question, but providing a simplified version of this data would make it easier to provide good ansewrs.
I took a bit of a stab at recreating what I'm guessing your underlying data looks like. I figured theres probably 3 datasets: a dataset of cases (case_info), a dataset of people (people_info), and a dataset to contain the many-to-many relationship between those two datasets (case_to_people).
I discovered this cool %age macro on the SAS site -- perhaps that is what you are using.
So with the above data and the macro, the solution is quite simple -- it can be done in a single PROC SQL quite easily. It could also be done with one or more datasteps without much trouble as well, especially if the data is small enough be quickly joined with a couple of sort/sort/merges.For small amounts of data, I think that things like index lookups and hash tables are more trouble than they are worth.
So take a look at this and see if this helps.
%macro age(date,birth);
%* this macro obtained from "Calculating Age with Only One Line of Code" by William Kreuter *;
%* http://support.sas.com/kb/24/808.html *;
floor ((intck('month',&birth,&date)
- (day(&date) < day(&birth))) / 12)
%mend age;
data case_info;
input @1 case_no $8. @11 dtdue date9.;
format dtdue date9.;
datalines;
S6034840 05mar2012
S5439846 15mar2012
S5324014 20mar2012
;
run;
data case_to_people;
input @1 case_no $8. @11 id 5.;
datalines;
S6034840 00001
S6034840 00006
S5439846 00002
S5324014 00003
S5324014 00004
S5324014 00005
;
run;
data people_info;
input @1 id 5. @8 DOB date9.;
format DOB date9.;
datalines;
00001 15jun1989
00002 13jun1997
00003 17jun1988
00004 21jun1990
00005 22jun1993
00006 19jun1988
;
run;
* This dataset is for purposes of human-review of the data / verifying the solution. *;
proc sql;
create table cases_and_ages as
select a.*, c.*, %age (a.dtdue, c.DOB) as age
from case_info a
inner join case_to_people b on a.case_no = b.case_no
inner join people_info c on b.id = c.id
order by a.case_no, c.DOB
;
quit;
* This is the actual solution *;
proc sql;
create table cases_with_under_21 as
select distinct a.case_no
from case_info a
inner join case_to_people b on a.case_no = b.case_no
inner join people_info c on b.id = c.id
where %age (a.dtdue, c.DOB) < 21
order by a.case_no
;
quit;
Another version is DOW.
data x; input case_no :$8. age @@; datalines; S5023333 21 S5023333 22 S5028771 17 S5028771 18 S5034350 18 S5171614 21 S5232142 13 S5232142 17 S5268581 14 S5268581 14 S5268581 18 S5270236 22 S5286132 21 S5324014 13 S5324014 17 S5324014 21 S5324014 22 S5439846 22 S6034840 21 ; run; data want; do until(last.case_no); set x ; by case_no; if age lt 21 then found=1; end; do until(last.case_no); set x ; by case_no; if found then output; end; run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.