Hey everyone,
I have a question, and I am stuck in the final part. I would greatly appreciate it If anyone could help me.
The question is..
f. For each sixth grader, add variables to your data set for the age difference between the sixth grader and their youngest sibling, and the age difference between the sixth grader and their oldest sibling.
The original data (Sorted by family_id ) is ....
Obs Family_id Student_ID School Grade DOB123456789101112131415
90021 | 103699 | 4th | 07/23/2009 | |
90021 | 127945 | RG | 6th | 01/25/2007 |
90021 | 149229 | 2nd | 10/27/2010 | |
90053 | 109831 | RC | 6th | 08/26/2007 |
90053 | 122779 | 5th | 08/26/2008 | |
90053 | 124617 | 8th | 05/06/2005 | |
90097 | 145616 | 4th | 06/05/2009 | |
90097 | 164264 | RC | 6th | 12/19/2006 |
90112 | 147688 | 7th | 10/10/2005 | |
90112 | 171989 | 9th | 06/25/2004 | |
90112 | 197925 | RG | 6th | 02/22/2007 |
90142 | 150175 | RC | 6th | 12/07/2006 |
90181 | 115847 | RG | 6th | 03/08/2007 |
90181 | 192157 | 4th | 05/22/2009 | |
90270 | 150724 | RG | 6th | 01/04/2007 |
Then after coding ...
combined data with age difference
Obs Family_id Student_ID School Grade DOB Student_ID6 School6 Grade6 SixthGDOB Age_diff12345
90021 | 103699 | 4th | 07/23/2009 | 127945 | RG | 6th | 01/25/2007 | 2.49041 | |
90021 | 127945 | RG | 6th | 01/25/2007 | 127945 | RG | 6th | 01/25/2007 | 0.00000 |
90021 | 149229 | 2nd | 10/27/2010 | 127945 | RG | 6th | 01/25/2007 | 3.75342 | |
90053 | 109831 | RC | 6th | 08/26/2007 | 109831 | RC | 6th | 08/26/2007 | 0.00000 |
90053 | 122779 | 5th | 08/26/2008 | 109831 | RC | 6th | 08/26/2007 | 1.00000 |
Obs Family_id Student_ID6 School6 Grade6 SixthGDOB Grade Older_sib Younger_sib12345678910111213
90021 | 127945 | RG | 6th | 01/25/2007 | 2nd | . | 2 |
90053 | 109831 | RC | 6th | 08/26/2007 | 5th | 1 | 1 |
90097 | 164264 | RC | 6th | 12/19/2006 | 4th | . | 1 |
90112 | 197925 | RG | 6th | 02/22/2007 | 7th | 2 | . |
90142 | 150175 | RC | 6th | 12/07/2006 | . | . | |
90181 | 115847 | RG | 6th | 03/08/2007 | 4th | . | 1 |
90270 | 150724 | RG | 6th | 01/04/2007 | . | . | |
90285 | 104967 | GV | 6th | 04/20/2007 | 8th | 2 | . |
90437 | 136364 | GV | 6th | 08/10/2007 | . | . | |
90475 | 161166 | RC | 6th | 08/10/2007 | . | . | |
90497 | 182194 | GV | 6th | 07/08/2007 | 10th | 1 | 2 |
90564 | 184266 | GV | 6th | 01/27/2007 | 11th | 1 | 2 |
90570 | 151125 | GV | 6th | 09/05/2007 | 4th | . | 3 |
What I am thinking is that I should sort it by the school and then use the First. Older_sib and last. Older_sib for each school, but I did not know how can I do it, also if it is correct or not!
it's okay to give me ideas
Thank you in advance for your time.
What I am thinking is that I should sort it by the school and then use the First. Older_sib and last. Older_sib for each school, but I did not know how can I do it, also if it is correct or not!
it's okay to give me ideas
Thank you in advance for your time.
Sort by family and dob. Create new retained variables for the ..._id6 variables and populate them for the row where if grade='6th' is TRUE.
Then for the last row per family: if last.family_id then do; <populate the age diff variable - lookup SAS function yrdif with basis age>; output; call missing(<list of all retained variables>); end;
Just find the MIN() and the MAX() dob. Then calculate the age difference.
data have;
input Family_id Student_ID School $ Grade $ DOB :mmddyy.;
format dob date9.;
cards;
90021 103699 . 4th 07/23/2009
90021 127945 RG 6th 01/25/2007
90021 149229 . 2nd 10/27/2010
90053 109831 RC 6th 08/26/2007
90053 122779 . 5th 08/26/2008
90053 124617 . 8th 05/06/2005
90097 145616 . 4th 06/05/2009
90097 164264 RC 6th 12/19/2006
90112 147688 . 7th 10/10/2005
90112 171989 . 9th 06/25/2004
90112 197925 RG 6th 02/22/2007
90142 150175 RC 6th 12/07/2006
90181 115847 RG 6th 03/08/2007
90181 192157 . 4th 05/22/2009
90270 150724 RG 6th 01/04/2007
;
proc sql ;
create table want as
select distinct
a.*
, count(b.student_id) as n_sibs
, min(b.dob) as min_dob format=date9.
, max(b.dob) as max_dob format=date9.
, yrdif(a.dob,calculated min_dob) as min_age_dif
, yrdif(a.dob,calculated max_dob) as max_age_dif
from have a
left join have b
on a.family_id = b.family_id
and a.student_id ne b.student_id
group by a.family_id,a.student_id
;
quit;
If you only want to see the 6th graders just only print those.
proc print;
where grade='6th';
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: