My thought process in creating the code was that both MomID and DadID would both have missing values at the beginning. Thus, if that is true, no additional sort should be necessary. The only purpose of the sort was to put the mothers and fathers at the end of each residence's records.
Thanks for the help. If I could extend this problem one step further,I need to now figure out how to store mom's degree and assign it to all of her children. The individual's degree is represented by the variable DEGREE. The variable HASMOM indicates whether a child has a mother in the survey. Here's what the dataset (called temp1) looks like:
DUID | PID | DUPERSID | MOMID | DEGREE | MOTHER | HASMOM |
1001 | 101 | 1001101 | 1 | 1 | ||
1001 | 102 | 1001102 | 2 | |||
1001 | 103 | 1001103 | 3 | 1 | ||
1001 | 104 | 1001104 | 1001101 | 8 | 1 | |
1001 | 105 | 1001105 | 1001103 | 8 | 1 | |
1002 | 101 | 1002101 | 1 | |||
1003 | 101 | 1003101 | 1 | |||
1003 | 102 | 1003102 | 2 | |||
1003 | 103 | 1003103 | 1003102 | 8 | 1 | |
1003 | 104 | 1003104 | 1003102 | 8 | 1 |
I basically want DUPERSID 1001104 to have a variable MOMDEGREE that equals 1 (which is that child's mom's degree). DUPERSID 1001105 should have MOMDEGREE = 3, DUPERSID 1003103 and 1003104 should both have MOMDEGREE = 2. Using the code below, I was able to successfully assign MOMDEGREE to children in one-mother dwellings, but I run into problems with the multiple mother dwellings.
proc sort data = temp1;
by duid descending mother;
run;
data temp1_temp; set temp1;
by duid;
retain hold_momdegree;
length hold_momdegree $100.;
if first.DUID then do;
hold_momdegree="";
end;
momdegree = 0;
if mother = 1 then hold_momdegree = degree;
if hasmom = 1 then momdegree = hold_momdegree;
run;
I think that you only need a fairly simple datastep merge or sql join. In a datastep, you could probably do it with something like:
data have;
infile cards truncover;
input (DUID PID DUPERSID MomID DadID) ($);
cards;
1001 101 1001101
1001 102 1001102
1001 103 1001103
1001 104 1001104 1001101 1001102
1001 105 1001105 1001103 1001102
1002 101 1002101
1003 101 1003101
1003 102 1003102
1003 103 1003103 1003102 1003101
1003 104 1003104 1003102 1003101
;
proc sort data=have;
by DUID descending MomID;
run;
data want (drop=Hold:);
set have;
by DUID;
retain Hold_MomID Hold_DadID;
length Hold_MomID Hold_DadID $100;
if first.DUID then do;
Hold_MomID="";
Hold_DadID="";
end;
Mother=0;
Father=0;
if missing(MomID) then do;
if index(Hold_MomID,DUPERSID) gt 0 then Mother=1;
end;
else Hold_MomID=catx(" ",Hold_MomID,MomID);
if missing(DadID) then do;
if index(Hold_DadID,DUPERSID) gt 0 then Father=1;
end;
else Hold_DadID=catx(" ",Hold_DadID,DadID);
run;
data mdegrees;
input (DUID PID DUPERSID MOMID DEGREE) ($) MOTHER HASMOM;
cards;
1001 101 1001101 . 1 1 .
1001 102 1001102 . 2 . .
1001 103 1001103 . 3 1 .
1001 104 1001104 1001101 8 . 1
1001 105 1001105 1001103 8 . 1
1002 101 1002101 . 1 . .
1003 101 1003101 . 1 . .
1003 102 1003102 . 2 . .
1003 103 1003103 1003102 8 . 1
1003 104 1003104 1003102 8 . 1
;
run;
proc sort data=mdegrees (keep=dupersid degree
rename=(degree=mdegree
dupersid=MomID)
)
out=mdegrees;
by MomID mdegree;
run;
proc sort data=want;
by MomID;
run;
data new_want;
merge want (in=a) mdegrees (in=b);
by MomID;
if a;
run;
Thank you - this wored well.
I think a straightforward sql approach would be easier to understand:
data work.detail_data;
input
Dwelling_ID
Person_ID
ID $
Mom_ID
Dad_ID;
datalines;
1001 101 1001_101 . .
1001 102 1001_102 . .
1001 103 1001_103 . .
1001 104 1001_104 101 .
1001 105 1001_105 . 102
;
run;
proc sql;
select
t1.dwelling_id,
t1.person_id,
t1.id,
t1.Mom_ID,
t1.Dad_ID,
case when count(t2.person_id)>0 then 1 else 0 end as IsMom,
case when count(t3.person_id)>0 then 1 else 0 end as IsDad
from
work.detail_data t1
left outer join work.detail_data t2
on t1.Person_ID=t2.Mom_id
left outer join work.detail_data t3
on t1.Person_ID=t3.Dad_ID
group by
t1.dwelling_id,
t1.person_id,
t1.id,
t1.mom_id;
quit;
produces this output
Dwelling_ID | Person_ID | ID | Mom_ID | Dad_ID | IsMom | IsDad |
---|---|---|---|---|---|---|
1001 | 101 | 1001_101 | . | . | 1 | 0 |
1001 | 102 | 1001_102 | . | . | 0 | 1 |
1001 | 103 | 1001_103 | . | . | 0 | 0 |
1001 | 104 | 1001_104 | 101 | . | 0 | 0 |
1001 | 105 | 1001_105 | . | 102 | 0 | 0 |
OK.There are other alternative ways in SAS. Both proc format and Hash Table are best tool for querying question.But Hash Table is faster and need more code.
data one; input dwellingid : $4. personid : $3. id : $8. (momid dadid) (: $3.); datalines; 1001 101 1001_101 N/A N/A 1001 102 1001_102 N/A N/A 1001 103 1001_103 N/A N/A 1001 104 1001_104 101 102 1001 105 1001_105 103 102 ; run; proc sort data=one(keep=momid) out=one1 nodupkey;by momid;run; proc sort data=one(keep=dadid) out=one2 nodupkey;by dadid;run; data monid; set one1(rename=(momid=start) where=(start ne 'N/A')); retain fmtname 'momid' label 'Y' type 'C'; run; data dadid; set one2(rename=(dadid=start) where=(start ne 'N/A')); retain fmtname 'dadid' label 'Y' type 'C'; run; proc format cntlin=monid;run; proc format cntlin=dadid;run; data want; set one; mother=ifn(put(personid,$momid.)='Y',1,0); father=ifn(put(personid,$dadid.)='Y',1,0); run;
Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.