Hello. I am very new to SAS (a student) and one of the problems I have been assigned has me finding the age difference between siblings using the file schoolsurvey.sas7bdat.
The variables are Family_id, Student_id, School, Grade, and DOB.
For the first part of the problem, I imported the data to a dataset called schoolsurvey.
For the second part of the problem, I created another dataset from schoolsurvey that contains only one observation for each 6th grader in the schoolsurvey set. They are both fine.
The third part of the problem asks me to "Combine the data set from part b) with the original data, and create a variable that is the age difference in years between the sixth grader and the sibling."
I know how to combine the data sets
data both;
set schoolsurvey sixthgraders;
run;
, but I do not know where to begin with finding the age difference.
I would assume, in English, that the logic would be "if student a has the same Family_id as student b, then take the difference of the years of their DOBs and that will be the AgeDifference." But I don't know how to put that into code that doesn't use proc sql; I only want to create a variable out of this.
Thank you.
Thank you for the reply! The siblings are indicated by family_id--if they share the same family id, then they are siblings, and yes, sometimes there are more than one sibling to a sixth grader.
I'm not particularly apt at creating examples
The data is simply imported from the file schoolsurvey.sas7bdat; not data that I created or obtained on my own
it has the variables family_id, student_id, school, grade, and DOB
for example, observation one has these values (for each variable respectively, and I am separating them with commas for this post so they are easier to read):
90021, 103699, (the school is blank), 4th, 07/24/2004
Using this code:
data both;
set schoolsurvey sixthgraders;
run;
proc print data=both;
title 'Data Set Appended';
run;
I got this output:
Data Set Appended
Obs Family_id Student_ID School Grade DOB12345
90021 | 103699 | 4th | 07/24/2004 | |
90021 | 127945 | RG | 6th | 01/26/2002 |
90021 | 149229 | 2nd | 10/28/2005 | |
90053 | 109831 | RC | 6th | 08/27/2002 |
90053 | 122779 | 5th | 08/28/2003 |
Using this code:
data combined;
merge schoolsurvey sixthgraders;
by Family_id;
run;
proc print data=combined;
title 'Data Set Merged';
run;
I got this output:
Data Set Merged
Obs Family_id Student_ID School Grade DOB12345
90021 | 127945 | RG | 6th | 01/26/2002 |
90021 | 127945 | RG | 6th | 01/26/2002 |
90021 | 149229 | 2nd | 10/28/2005 | |
90053 | 109831 | RC | 6th | 08/27/2002 |
90053 | 122779 | 5th | 08/28/2003 |
I'm confused, as merging the data replaces the 4th grade sibling with the family_id of 90021 with a duplicate of the 6th grade sibling.
I do not think we are accounting for twins, triplets, skipped grades, etc. I believe it is as simple as seeing if the family_ID is the same in the combined data set for obs 1 and 2 and taking the difference between year(DOB) for observation 1 vs observation 2, and then looping that. I just don't know how to put that into SAS logic, and was hoping someone could provide me with a bare-bones example of how I would go about approaching this.
I'm uncertain as to what you mean by renaming the variables before the merge, as well.
Thank you, and sorry for my misunderstandings.
If you merge data sets with the variable names the same, they overwrite each other instead of adding on to the column. You want to add a new column with the siblings birthdate with a different name.
If you haven't take the first sas training course, it's probably worth the time.
Looking at the data you've posted there can be more than 2 siblings in a family.
Using below sample:
90021 | 103699 | 4th | 07/24/2004 | |
90021 | 127945 | RG | 6th | 01/26/2002 |
90021 | 149229 | 2nd | 10/28/2005 |
How should your desired result look like?
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.