I have a Census data file of persons who occupy various houses. Sample data shown below.
In every house, there is always a person numbered "1" who is the primary respondent to the Census survey (known as the head of household). Sometimes, this person lives alone. Or with relatives, or nonrelatives, or a combination of both. I want to create a list of the HouseNumbers that correspond to houses in which the head of household (person 1) is living with at least 2 of his or her relatives. So using the above example, the output would be:
House #128 would not be included, because person 1 ("self", also known as the respondent) is only related to one other person in house #128. Also, house #136 and house #149 would not be included because the respondent lives alone (no relatives or nonrelatives).
In my output table, I only want two variables: ID and House.
If you can't assume that the primary respondent in a house has the smallest ID then you need:
proc sql;
create table SelfWithTwoRelatives as
select b.id, b.house
from (
select house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2) as a inner join
myData as b on a.house=b.house
where b.RelationshipToRespondent="Self";
quit;
(Couldn't test against your data which can't be imported into SAS)
PG
If you can't assume that the primary respondent in a house has the smallest ID then you need:
proc sql;
create table SelfWithTwoRelatives as
select b.id, b.house
from (
select house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2) as a inner join
myData as b on a.house=b.house
where b.RelationshipToRespondent="Self";
quit;
(Couldn't test against your data which can't be imported into SAS)
PG
proc sql;
create table want as
select min(id) as id,a.house from have a
left join (select house,count(*) as freq from have
where relationshiptorespondent='Relative' group by house) b
on a.house=b.house
group by a.house
having freq>=2;
quit;
proc print data=want;
run;
If you can assume that MIN(id) is Self for every house then you don't need a join :
proc sql;
create table SelfWithTwoRelatives as
select min(id) as id, house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2;
quit;
PG
PG,
HAVING also can filter the obs ,not just the group .
proc sql;
create table SelfWithTwoRelatives as
select *
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2 and PersonNumber=1;
quit;
Xia Keshan
Great! This query can even be done without a join and without remerging (implicit join) :
proc sql;
create table SelfWithTwoRelatives as
select
max(case when RelationshipToRespondent="Self" then id else . end) as id,
house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2;
quit;
PG
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.