BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas_bo
Calcite | Level 5

Hello,

 

I have a problem with identifying the observations I need in my data (an also a hard time giving this post a proper subject/title). 

 

My data is at household level and I would like to identify the parents to some of the children (in my control group) within a household.

 

My dataset looks like:

 

data have;

Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent

        100                 201                 1                                     .                             .                      0           .

        100                 202                 3                                    1                             2                     1           .

        101                 203                 4                                    1                             2                     0           .

        102                 204                 3                                    1                             .                      0           .

 

And this is what I would like to have:

 

data want;

Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent

        100                 201                 1                                     .                             .                      0          1

        100                 202                 3                                    1                             2                     1          0

        101                 203                 4                                    1                             2                     0          0

        102                 204                 3                                    1                             .                      0          0

 

Basically I have the parents number within the given household, but I need to have the parent's Unique id number (Person_ID).

I do not know with which procedure to start with or search for, so you have any ideas to what I should look at, it would be highly appreciated! Thank you!

 

(If it is relevant - I am using SAS Enterprise Guide 7.1 (64-bit))

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a good situation to use a data step with two hash objects.  One hash object (think "lookup table"), named MOM_IDS, will have the household_id and mom_no_houshold for all records with control=1.   Similarly have a DAD_IDS hash object.

 

The read all the records and lookup its  household_id/person_no_household  with the household_id/mom_no_houshold in MOM_IDS (and similarly with household_id/dad_no_houshold in DAD_IDS).  If the lookup is successful (the MOMIDS.CHECK() method or DADIDS.CHECK() return a zero), then you have a parent of a control.

 

dm 'clear log';
data have;
 input Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent;
datalines;
100 201 1 . . 0 .
100 202 3 1 2 1 .
101 203 4 1 2 0 .
102 204 3 1 . 0 .
run;

data want;
  set have;
  if _n_=1 then do;
    declare hash mom_ids (dataset:'have (keep=household_id mom_no_houshold control where=(control=1))');
      mom_ids.definekey('household_id','mom_no_houshold');
      mom_ids.definedone();
    declare hash dad_ids (dataset:'have (keep=household_id dad_no_houshold control where=(control=1))');
      dad_ids.definekey('household_id','dad_no_houshold');
      dad_ids.definedone();
    mom_ids.output(dataset:'moms');
    dad_ids.output(dataset:'dads');
  end;
  parent=0;
  if mom_ids.check(key:household_id,key:person_no_household)=0 
  or dad_ids.check(key:household_id,key:person_no_household)=0
  then parent=1;
run;

I leave it to others to show the SQL way.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

This is a good situation to use a data step with two hash objects.  One hash object (think "lookup table"), named MOM_IDS, will have the household_id and mom_no_houshold for all records with control=1.   Similarly have a DAD_IDS hash object.

 

The read all the records and lookup its  household_id/person_no_household  with the household_id/mom_no_houshold in MOM_IDS (and similarly with household_id/dad_no_houshold in DAD_IDS).  If the lookup is successful (the MOMIDS.CHECK() method or DADIDS.CHECK() return a zero), then you have a parent of a control.

 

dm 'clear log';
data have;
 input Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent;
datalines;
100 201 1 . . 0 .
100 202 3 1 2 1 .
101 203 4 1 2 0 .
102 204 3 1 . 0 .
run;

data want;
  set have;
  if _n_=1 then do;
    declare hash mom_ids (dataset:'have (keep=household_id mom_no_houshold control where=(control=1))');
      mom_ids.definekey('household_id','mom_no_houshold');
      mom_ids.definedone();
    declare hash dad_ids (dataset:'have (keep=household_id dad_no_houshold control where=(control=1))');
      dad_ids.definekey('household_id','dad_no_houshold');
      dad_ids.definedone();
    mom_ids.output(dataset:'moms');
    dad_ids.output(dataset:'dads');
  end;
  parent=0;
  if mom_ids.check(key:household_id,key:person_no_household)=0 
  or dad_ids.check(key:household_id,key:person_no_household)=0
  then parent=1;
run;

I leave it to others to show the SQL way.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sas_bo
Calcite | Level 5
This worked. Thank you so much!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 375 views
  • 0 likes
  • 2 in conversation