BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas_bo
Calcite | Level 5

Hello,

 

I have a problem with identifying the observations I need in my data (an also a hard time giving this post a proper subject/title). 

 

My data is at household level and I would like to identify the parents to some of the children (in my control group) within a household.

 

My dataset looks like:

 

data have;

Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent

        100                 201                 1                                     .                             .                      0           .

        100                 202                 3                                    1                             2                     1           .

        101                 203                 4                                    1                             2                     0           .

        102                 204                 3                                    1                             .                      0           .

 

And this is what I would like to have:

 

data want;

Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent

        100                 201                 1                                     .                             .                      0          1

        100                 202                 3                                    1                             2                     1          0

        101                 203                 4                                    1                             2                     0          0

        102                 204                 3                                    1                             .                      0          0

 

Basically I have the parents number within the given household, but I need to have the parent's Unique id number (Person_ID).

I do not know with which procedure to start with or search for, so you have any ideas to what I should look at, it would be highly appreciated! Thank you!

 

(If it is relevant - I am using SAS Enterprise Guide 7.1 (64-bit))

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a good situation to use a data step with two hash objects.  One hash object (think "lookup table"), named MOM_IDS, will have the household_id and mom_no_houshold for all records with control=1.   Similarly have a DAD_IDS hash object.

 

The read all the records and lookup its  household_id/person_no_household  with the household_id/mom_no_houshold in MOM_IDS (and similarly with household_id/dad_no_houshold in DAD_IDS).  If the lookup is successful (the MOMIDS.CHECK() method or DADIDS.CHECK() return a zero), then you have a parent of a control.

 

dm 'clear log';
data have;
 input Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent;
datalines;
100 201 1 . . 0 .
100 202 3 1 2 1 .
101 203 4 1 2 0 .
102 204 3 1 . 0 .
run;

data want;
  set have;
  if _n_=1 then do;
    declare hash mom_ids (dataset:'have (keep=household_id mom_no_houshold control where=(control=1))');
      mom_ids.definekey('household_id','mom_no_houshold');
      mom_ids.definedone();
    declare hash dad_ids (dataset:'have (keep=household_id dad_no_houshold control where=(control=1))');
      dad_ids.definekey('household_id','dad_no_houshold');
      dad_ids.definedone();
    mom_ids.output(dataset:'moms');
    dad_ids.output(dataset:'dads');
  end;
  parent=0;
  if mom_ids.check(key:household_id,key:person_no_household)=0 
  or dad_ids.check(key:household_id,key:person_no_household)=0
  then parent=1;
run;

I leave it to others to show the SQL way.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

This is a good situation to use a data step with two hash objects.  One hash object (think "lookup table"), named MOM_IDS, will have the household_id and mom_no_houshold for all records with control=1.   Similarly have a DAD_IDS hash object.

 

The read all the records and lookup its  household_id/person_no_household  with the household_id/mom_no_houshold in MOM_IDS (and similarly with household_id/dad_no_houshold in DAD_IDS).  If the lookup is successful (the MOMIDS.CHECK() method or DADIDS.CHECK() return a zero), then you have a parent of a control.

 

dm 'clear log';
data have;
 input Household_ID Person_ID Person_no_household Mom_no_houshold Dad_no_houshold Control Parent;
datalines;
100 201 1 . . 0 .
100 202 3 1 2 1 .
101 203 4 1 2 0 .
102 204 3 1 . 0 .
run;

data want;
  set have;
  if _n_=1 then do;
    declare hash mom_ids (dataset:'have (keep=household_id mom_no_houshold control where=(control=1))');
      mom_ids.definekey('household_id','mom_no_houshold');
      mom_ids.definedone();
    declare hash dad_ids (dataset:'have (keep=household_id dad_no_houshold control where=(control=1))');
      dad_ids.definekey('household_id','dad_no_houshold');
      dad_ids.definedone();
    mom_ids.output(dataset:'moms');
    dad_ids.output(dataset:'dads');
  end;
  parent=0;
  if mom_ids.check(key:household_id,key:person_no_household)=0 
  or dad_ids.check(key:household_id,key:person_no_household)=0
  then parent=1;
run;

I leave it to others to show the SQL way.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sas_bo
Calcite | Level 5
This worked. Thank you so much!