BookmarkSubscribeRSS Feed
Prat009
Fluorite | Level 6

Dataset description: 

The local school district wants to survey all sixth grade students and their school-aged siblings. There are three different types of surveys: one for the sixth graders, one for their younger siblings, and one for their older siblings. The SAS data set called SCHOOLSURVEY contains data for all sixth graders in the three middle schools in the district (Rachael Carson, Green Valley, and Redwood Grove). The data set also includes data for all their siblings attending schools in the district, which can be linked back to the sixth grader by Family_ID.

 

Fam_id  Sdt_ID  Sch Grade DOB        DOB6th  Age_Diff 
90021   103699      4th 07/24/2004  01/26/2002  2.49
90021   127945  RG  6th 01/26/2002  01/26/2002  0       
90021   149229      2nd 10/28/2005  01/26/2002  3.75
90053   109831  RC  6th 08/27/2002  08/27/2002  0       
90053   122779      5th 08/28/2003  08/27/2002  1
90053   124617      8th 05/07/2000  08/27/2002  -2.31
90097   145616      4th 06/06/2004  12/20/2001  2.46
90097   164264  RC  6th 12/20/2001  12/20/2001  0       
90112   147688      7th 10/11/2000  02/23/2002  -1.37
90112   171989      9th 06/27/1999  02/23/2002  -2.66
90112   197925  RG  6th 02/23/2002  02/23/2002  0       

Here, Fam_id  = Family_id

          Std_id = Student_id

          Sch = School Code

          DOB = Date of birth for all the grade students

          DOB6th = Date of birth of the 6th grade student for a given Family_id

          Agg_Diff = Age difference in years between 6th grade student and his/her siblings for a given Family_id

 

I want to count the number of older siblings and the number of younger siblings of the sixth grader in the family, and add these variables to the data set. It would be great if anyone can help me with this problem.

 

Output should look like this :

 Fam_id   Sdt_ID Sch  Grade   DOB       DOB6th  Age_Diff  Y_sib  O_sib
    90021   103699      4th 07/24/2004  01/26/2002  2.49
    90021   127945  RG  6th 01/26/2002  01/26/2002  0       2       0
    90021   149229      2nd 10/28/2005  01/26/2002  3.75
    90053   109831  RC  6th 08/27/2002  08/27/2002  0       1       1
    90053   122779      5th 08/28/2003  08/27/2002  1
    90053   124617      8th 05/07/2000  08/27/2002  -2.31
    90097   145616      4th 06/06/2004  12/20/2001  2.46
    90097   164264  RC  6th 12/20/2001  12/20/2001  0       1       0
    90112   147688      7th 10/11/2000  02/23/2002  -1.37
    90112   171989      9th 06/27/1999  02/23/2002  -2.66
    90112   197925  RG  6th 02/23/2002  02/23/2002  0       0       2

 where, Y_sib = number of younger siblings of the sixth grader

             O_sib = number of older siblings of the sixth grader

5 REPLIES 5
Patrick
Opal | Level 21

@Prat009

This sounds very much like an assignment where you probably should give it a go first and then only come back here if you get stuck and have detailed questions.

If you get stuck: Also provide the not yet working code which you've so far developed so we can understand where you are at and can take it from there.

Ramesh_165
Obsidian | Level 7

Try this out.

proc sql;
	create table Output as	
		select *, 
			case 
				when Grade = '6th' then 
					select put(count(*),best.) 
					from Input b /*Your input file*/
					where b.Fam_id = a.Fam_id 
						and input(substr(b.Grade,1,1),best.) < 6
				else ""
			end as Y_sib,
 			case 
				when Grade = '6th' then 
					select put(count(*),best.) 
					from Input b /*Your input file*/
					where b.Fam_id = a.Fam_id 
						and input(substr(b.Grade,1,1),best.) > 6
				else ""
			end as O_sib
		from Input a; /*Your input file*/
quit;
mahesh146
Obsidian | Level 7
Hi,
Please check if this helps.

DATA TEMP;
INPUT Fam_id Sdt_ID Sch $ Grade $ DOB DOB6th Age_Diff;
INFORMAT DOB DOB6th mmddyy8.;
FORMAT DOB DOB6th mmddyy10.;
CARDS;
90021 103699 . 4th 07/24/2004 01/26/2002 2.49
90021 127945 RG 6th 01/26/2002 01/26/2002 0
90021 149229 . 2nd 10/28/2005 01/26/2002 3.75
90053 109831 RC 6th 08/27/2002 08/27/2002 0
90053 122779 . 5th 08/28/2003 08/27/2002 1
90053 124617 . 8th 05/07/2000 08/27/2002 -2.31
90097 145616 . 4th 06/06/2004 12/20/2001 2.46
90097 164264 RC 6th 12/20/2001 12/20/2001 0
90112 147688 . 7th 10/11/2000 02/23/2002 -1.37
90112 171989 . 9th 06/27/1999 02/23/2002 -2.66
90112 197925 RG 6th 02/23/2002 02/23/2002 0
;
RUN;


PROC SQL;
CREATE TABLE TEMP_CAL AS
Select
COALESCE(A.Fam_id,B.Fam_id) as Fam_id, /*SAS Functions are also applicable in PROC SQL queries including normal SQL functions*/

CASE WHEN A.Old_Student_Count=. THEN 0
ELSE A.Old_Student_Count
END as Old_Student_Count,

CASE WHEN B.Young_Student_Count=. THEN 0
ELSE B.Young_Student_Count
END as Young_Student_Count
FROM
(
Select
Fam_id,
Count(Distinct Sdt_ID) as Old_Student_Count
FROM
TEMP
WHERE
Age_Diff<0
GROUP BY Fam_id
) A

FULL JOIN

(
Select
Fam_id,
Count(Distinct Sdt_ID) as Young_Student_Count
FROM
TEMP
WHERE
Age_Diff>0
GROUP BY Fam_id
) B

ON A.Fam_id=B.Fam_id
;
QUIT;


DATA TEMP_FINAL;
IF _n_=1 THEN DO;
IF 0 THEN SET TEMP_CAL;

Declare Hash h1(dataset:'TEMP_CAL');
h1.definekey('Fam_id');
h1.definedata('Young_Student_Count','Old_Student_Count');
h1.definedone();
END;

SET TEMP;

h1.find(key:Fam_id);

IF Grade NE '6th' THEN DO;
Young_Student_Count=.;
Old_Student_Count=.;
END;

RUN;

PROC PRINT DATA=TEMP_FINAL;
TITLE 'Final Table';
RUN;
Prat009
Fluorite | Level 6

Thanks a ton !

I had been trying to come up with a solution from past 4 days but I couldn't. Your code works just fine !

Malk020
Fluorite | Level 6

May I ask you a question, please! 

 

How did you calculate the date of birth for only sixth grader ? 

 

Thank you in advance for your time. 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1927 views
  • 1 like
  • 5 in conversation