BookmarkSubscribeRSS Feed
skittles0810
Calcite | Level 5

Hello. I am very new to SAS (a student) and one of the problems I have been assigned has me finding the age difference between siblings using the file schoolsurvey.sas7bdat.

 

The variables are Family_id, Student_id, School, Grade, and DOB.

 

For the first part of the problem, I imported the data to a dataset called schoolsurvey.

For the second part of the problem, I created another dataset from schoolsurvey that contains only one observation for each 6th grader in the schoolsurvey set. They are both fine.

 

The third part of the problem asks me to "Combine the data set from part b) with the original data, and create a variable that is the age difference in years between the sixth grader and the sibling."

 

I know how to combine the data sets

data both;
set schoolsurvey sixthgraders;
run;

, but I do not know where to begin with finding the age difference.

I would assume, in English, that the logic would be "if student a has the same Family_id as student b, then take the difference of the years of their DOBs and that will be the AgeDifference." But I don't know how to put that into code that doesn't use proc sql; I only want to create a variable out of this.

 

Thank you.

6 REPLIES 6
Reeza
Super User
How do you know who is a sibling for each kid?
Can you show an example of how your data is structured - doesn't need to be real, fake is fine. You may also what to consider a MERGE instead of an APPEND (set) as you'll then have the data on the same row which is much easier to work with. Also, what happens if someone has more than one sibling? I have 5 running around somewhere in the world.
skittles0810
Calcite | Level 5

Thank you for the reply! The siblings are indicated by family_id--if they share the same family id, then they are siblings, and yes, sometimes there are more than one sibling to a sixth grader.

 

I'm not particularly apt at creating examples

The data is simply imported from the file schoolsurvey.sas7bdat; not data that I created or obtained on my own

 

it has the variables family_id, student_id, school, grade, and DOB

for example, observation one has these values (for each variable respectively, and I am separating them with commas for this post so they are easier to read):

90021, 103699, (the school is blank), 4th, 07/24/2004

Reeza
Super User
Make some fake data in Excel and post that here then.
Without it's very hard to help. I can give you approaches and guides if that's what you're looking for though.

1. What about twins? triplets? Kids who skipped grades? I was in the same math class as my older sister because I skipped a grade for math.
2. I think you need a merge, how familiar are you with merges?

You're doing match merging, see the examples here.
https://documentation.sas.com/?docsetId=lrcon&docsetTarget=n1tgk0uanvisvon1r26lc036k0w7.htm&docsetVe...

You'll need to rename the variables before the merge and then you can do the differences.

This is one way, there are other ways. If you're taking a course, there should likely be a way with the methods you've been taught recently.
skittles0810
Calcite | Level 5

Using this code:

 

data both;
set schoolsurvey sixthgraders;
run;
proc print data=both;
title 'Data Set Appended';
run;

 

I got this output:

Data Set Appended

 Obs Family_id Student_ID School Grade DOB12345

90021103699 4th07/24/2004
90021127945RG6th01/26/2002
90021149229 2nd10/28/2005
90053109831RC6th08/27/2002
90053122779 5th08/28/2003

 

Using this code:

data combined;
merge schoolsurvey sixthgraders;
by Family_id;
run;
proc print data=combined;
title 'Data Set Merged';
run;

 

I got this output:

Data Set Merged

 Obs Family_id Student_ID School Grade DOB12345

90021127945RG6th01/26/2002
90021127945RG6th01/26/2002
90021149229 2nd10/28/2005
90053109831RC6th08/27/2002
90053122779 5th

08/28/2003

 

I'm confused, as merging the data replaces the 4th grade sibling with the family_id of 90021 with a duplicate of the 6th grade sibling.

 

I do not think we are accounting for twins, triplets, skipped grades, etc. I believe it is as simple as seeing if the family_ID is the same in the combined data set for obs 1 and 2 and taking the difference between year(DOB) for observation 1 vs observation 2, and then looping that. I just don't know how to put that into SAS logic, and was hoping someone could provide me with a bare-bones example of how I would go about approaching this.

 

I'm uncertain as to what you mean by renaming the variables before the merge, as well.

 

Thank you, and sorry for my misunderstandings.

Reeza
Super User

If you merge data sets with the variable names the same, they overwrite each other instead of adding on to the column. You want to add a new column with the siblings birthdate with a different name. 


If you haven't take the first sas training course, it's probably worth the time.

Patrick
Opal | Level 21

Looking at the data you've posted there can be more than 2 siblings in a family.

 

Using below sample:

90021 103699   4th 07/24/2004
90021 127945 RG 6th 01/26/2002
90021 149229   2nd 10/28/2005

 

How should your desired result look like?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1508 views
  • 0 likes
  • 3 in conversation