BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
desireatem
Pyrite | Level 9

Hello -,

 

I have this data:

birthdate     Age

6/22/99       18.9

7/6/99          18.9

.                    .

.                    .

.                      .   

9/15/05       12.6      

 

run;

 

This are my question.

 

I want to create data sets where the different in age from the youngest and oldest is not more that 3.5 years using the birthdate. You can use the first birthday as first date

 

Thank you,

DA

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Here is a way to find the largest set of birthdates with a span <= 6 months

 

data have;
input BirthDate :mmddyy. Age;
id = _n_; /* You must have some kind of person id */
format birthdate yymmdd10.;
datalines; 
 6/22/99 18.9 
 7/6/99 18.8 
 6/30/99 18.8 
 7/8/99 18.8 
 7/8/99 18.8 
 6/21/99 18.9 
 7/15/99 18.8 
 7/12/99 18.8 
 4/25/99 19.0 
 8/6/99 18.7 
 7/9/99 18.8 
 8/20/99 18.7 
 7/22/99 18.8 
 3/2/99 19.2 
 7/11/99 18.8 
 8/2/99 18.7 
 7/9/99 18.8 
 8/22/99 18.7 
 6/16/99 18.9 
 7/29/99 18.8 
 8/17/99 18.7 
 8/17/99 18.7 
 1/27/99 19.3 
 5/29/99 18.9 
 7/16/99 18.8 
 8/23/99 18.7 
 9/2/99 18.7 
 2/10/99 19.2 
 8/7/99 18.7 
 1/19/99 19.3 
 9/24/99 18.6 
 7/10/99 18.8 
 5/31/99 18.9 
 9/25/99 18.6 
 6/3/99 18.9 
 5/20/99 19.0 
 6/26/99 18.9 
 8/28/99 18.7 
 9/27/99 18.6 
 9/9/99 18.6 
 7/18/99 18.8 
 8/6/99 18.7 
 10/17/99 18.5 
 4/22/99 19.0 
 1/29/99 19.3 
 10/15/99 18.5 
 10/18/99 18.5 
 7/15/99 18.8 
 10/21/99 18.5 
 7/29/99 18.8 
 8/20/99 18.7 
 8/20/99 18.7 
 5/25/99 18.9 
 9/15/99 18.6 
 8/14/99 18.7 
 9/28/99 18.6 
 10/28/99 18.5 
 5/9/99 19.0 
 10/24/99 18.5 
 8/11/99 18.7 
 5/4/99 19.0 
 10/2/99 18.6 
 11/26/99 18.4 
 11/9/99 18.5 
 4/17/99 19.0 
;

proc sql;
create table groups as
select
    a.id, 
    a.birthdate as startDate, 
    intnx("month",a.birthdate,6,"same") as endDate,
    count(*) as grpSize
from
    have as a inner join
    have as b 
        on b.birthdate between a.birthdate and intnx("month",a.birthdate,6,"same")
group by a.id, startDate, endDate
order by grpSize desc;
quit;

data want;
if _n_ = 1 then set groups;
set have;
if startDate <= birthdate <= endDate;
keep id birthdate age;
run; 

I don't understand your second request.

 

PG

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

I don't get this. What if the birth dates are spread out so that there are like 20 years from the first to the last one? Then two data sets will not cut it if there are not supposed to be more than 3,5 years difference between them?

 

Please be more specific and post some sample data and what you want the desired outcome to look like.

desireatem
Pyrite | Level 9

Below is the birthdate and  age . I want to create a data such that the difference in  birthday between the youngest and oldest person in that data should not be more than 3.5 years. This is just a partial data, the entire data has age difference between the youngest and oldest at 18 years

 

Birth Date Age
 6/22/99 18.9
 7/6/99 18.8
 6/30/99 18.8
 7/8/99 18.8
 7/8/99 18.8
 6/21/99 18.9
 7/15/99 18.8
 7/12/99 18.8
 4/25/99 19.0
 8/6/99 18.7
 7/9/99 18.8
 8/20/99 18.7
 7/22/99 18.8
 3/2/99 19.2
 7/11/99 18.8
 8/2/99 18.7
 7/9/99 18.8
 8/22/99 18.7
 6/16/99 18.9
 7/29/99 18.8
 8/17/99 18.7
 8/17/99 18.7
 1/27/99 19.3
 5/29/99 18.9
 7/16/99 18.8
 8/23/99 18.7
 9/2/99 18.7
 2/10/99 19.2
 8/7/99 18.7
 1/19/99 19.3
 9/24/99 18.6
 7/10/99 18.8
 5/31/99 18.9
 9/25/99 18.6
 6/3/99 18.9
 5/20/99 19.0
 6/26/99 18.9
 8/28/99 18.7
 9/27/99 18.6
 9/9/99 18.6
 7/18/99 18.8
 8/6/99 18.7
 10/17/99 18.5
 4/22/99 19.0
 1/29/99 19.3
 10/15/99 18.5
 10/18/99 18.5
 7/15/99 18.8
 10/21/99 18.5
 7/29/99 18.8
 8/20/99 18.7
 8/20/99 18.7
 5/25/99 18.9
 9/15/99 18.6
 8/14/99 18.7
 9/28/99 18.6
 10/28/99 18.5
 5/9/99 19.0
 10/24/99 18.5
 8/11/99 18.7
 5/4/99 19.0
 10/2/99 18.6
 11/26/99 18.4
 11/9/99 18.5
 4/17/99 19.0

 

ballardw
Super User

What is the actual smallest value of age in your data? The largest?

Are all of you values rounded to one decimal place?

 

Generally almost any process splitting data on values of a single variable are likely to be more difficult.

Suppose your data has the minimum age of 13.4 this time. Then the "first" data set, which you have not given any naming rule for would contain ages of 13.4 to 16.9.

Now suppose you are repeating this process with another data set later where the youngest age is 9.8. Then the "first" data set would have ages 9.8 to 13.3. Which has nothing in common with the first data set from the first process. So every bit of logic would have to change to arrange results differently, select the "correct" data set for any specific task.

 

What is the actual purpose behind splitting the data? Why is 3.5 critical?

desireatem
Pyrite | Level 9

Let make it  easy, let us not split that data. Let just create a data set where the difference between the oldest person and the youngest person is not more than 3.5 years

PGStats
Opal | Level 21

Here is a way to find the largest set of birthdates with a span <= 6 months

 

data have;
input BirthDate :mmddyy. Age;
id = _n_; /* You must have some kind of person id */
format birthdate yymmdd10.;
datalines; 
 6/22/99 18.9 
 7/6/99 18.8 
 6/30/99 18.8 
 7/8/99 18.8 
 7/8/99 18.8 
 6/21/99 18.9 
 7/15/99 18.8 
 7/12/99 18.8 
 4/25/99 19.0 
 8/6/99 18.7 
 7/9/99 18.8 
 8/20/99 18.7 
 7/22/99 18.8 
 3/2/99 19.2 
 7/11/99 18.8 
 8/2/99 18.7 
 7/9/99 18.8 
 8/22/99 18.7 
 6/16/99 18.9 
 7/29/99 18.8 
 8/17/99 18.7 
 8/17/99 18.7 
 1/27/99 19.3 
 5/29/99 18.9 
 7/16/99 18.8 
 8/23/99 18.7 
 9/2/99 18.7 
 2/10/99 19.2 
 8/7/99 18.7 
 1/19/99 19.3 
 9/24/99 18.6 
 7/10/99 18.8 
 5/31/99 18.9 
 9/25/99 18.6 
 6/3/99 18.9 
 5/20/99 19.0 
 6/26/99 18.9 
 8/28/99 18.7 
 9/27/99 18.6 
 9/9/99 18.6 
 7/18/99 18.8 
 8/6/99 18.7 
 10/17/99 18.5 
 4/22/99 19.0 
 1/29/99 19.3 
 10/15/99 18.5 
 10/18/99 18.5 
 7/15/99 18.8 
 10/21/99 18.5 
 7/29/99 18.8 
 8/20/99 18.7 
 8/20/99 18.7 
 5/25/99 18.9 
 9/15/99 18.6 
 8/14/99 18.7 
 9/28/99 18.6 
 10/28/99 18.5 
 5/9/99 19.0 
 10/24/99 18.5 
 8/11/99 18.7 
 5/4/99 19.0 
 10/2/99 18.6 
 11/26/99 18.4 
 11/9/99 18.5 
 4/17/99 19.0 
;

proc sql;
create table groups as
select
    a.id, 
    a.birthdate as startDate, 
    intnx("month",a.birthdate,6,"same") as endDate,
    count(*) as grpSize
from
    have as a inner join
    have as b 
        on b.birthdate between a.birthdate and intnx("month",a.birthdate,6,"same")
group by a.id, startDate, endDate
order by grpSize desc;
quit;

data want;
if _n_ = 1 then set groups;
set have;
if startDate <= birthdate <= endDate;
keep id birthdate age;
run; 

I don't understand your second request.

 

PG
PGStats
Opal | Level 21

Here is a faster way to do the same, without SQL:

 

proc sort data=have out=sortedDates(keep=birthDate); by birthdate; run;

data grps;
set sortedDates(rename=birthDate=startDate);
endDate = intnx("month",startDate,6,"same");
do while(birthDate <= endDate and not done);
    set sortedDates end=done;
    last + 1;
    end;
grpSize = last - _n_ + 1;
output;
if done then stop;
keep startDate endDate grpSize;
run;

proc means data=grps noprint;
output out=maxGrp maxid(grpSize(startDate endDate))=startDate endDate;
run;

data want;
if _n_ = 1 then set maxGrp;
set have;
if startDate <= birthdate <= endDate;
keep id birthdate age;
run; 
PG
desireatem
Pyrite | Level 9

I try to use the faster approach with sql but had issue;

 

proc sort data=DFW out=sortedDates; by birthdate; run;

data grps;
set sortedDates(rename=birthDate=startDate);
endDate = intnx("month",startDate,42,"same");
do while(birthDate <= endDate and not done);
set sortedDates end=done;
last + 1;
end;
grpSize = last - _n_ + 1;
output;
if done then stop;
*keep startDate endDate grpSize;
run;
data grps (rename=(PtMRN=id));
run;

proc means data=grps noprint;
output out=maxGrp maxid(grpSize(startDate endDate))=startDate endDate;
run;

data want;
if _n_ = 1 then set maxGrp;
set have;
if startDate <= birthdate <= endDate;
*keep PtMRN birthdate age;
run;

 

129 data grps (rename=(PtMRN=id));
130 run;

WARNING: The variable PtMRN in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: The data set WORK.GRPS has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


131
132 proc means data=grps noprint;
133 output out=maxGrp maxid(grpSize(startDate endDate))=startDate endDate;
ERROR: Variable GRPSIZE not found.
ERROR: Variable STARTDATE not found.
ERROR: Variable ENDDATE not found.
134 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MAXGRP may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.MAXGRP was not replaced because this step was stopped.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

PGStats
Opal | Level 21

You are not missing by much. Try it this way:

 

proc sort data=DFW out=sortedDates(keep=birthDate); by birthdate; run;

data grps;
set sortedDates(rename=birthDate=startDate);
endDate = intnx("month",startDate,42,"same");
do while(birthDate <= endDate and not done);
 set sortedDates end=done;
 last + 1;
 end;
grpSize = last - _n_ + 1;
output;
if done then stop;
keep startDate endDate grpSize;
run;

proc means data=grps noprint;
output out=maxGrp maxid(grpSize(startDate endDate))=startDate endDate;
run;

data want;
if _n_ = 1 then set maxGrp;
set DFW;
if startDate <= birthdate <= endDate;
keep PtMRN birthdate age;
run;
PG
desireatem
Pyrite | Level 9

Thank you, works well!!!!

desireatem
Pyrite | Level 9

So if I want the largest set of birthday with 3.5 years, i will replace 6 by 42 in this statements:

 

intnx("month",a.birthdate,6,"same") as endDate

 

and

 

on b.birthdate between a.birthdate and intnx("month",a.birthdate,6,"same")

???? 

PGStats
Opal | Level 21

Yes. I used 6 months as an example, since your sample data didn't cover 3.5 years.

PG

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1139 views
  • 2 likes
  • 4 in conversation