I have a Census data file of persons who occupy various houses. Sample data shown below.
In every house, there is always a person numbered "1" who is the primary respondent to the Census survey (known as the head of household). Sometimes, this person lives alone. Or with relatives, or nonrelatives, or a combination of both. I want to create a list of the HouseNumbers that correspond to houses in which the head of household (person 1) is living with at least 2 of his or her relatives. So using the above example, the output would be:
House #128 would not be included, because person 1 ("self", also known as the respondent) is only related to one other person in house #128. Also, house #136 and house #149 would not be included because the respondent lives alone (no relatives or nonrelatives).
In my output table, I only want two variables: ID and House.
If you can't assume that the primary respondent in a house has the smallest ID then you need:
proc sql;
create table SelfWithTwoRelatives as
select b.id, b.house
from (
select house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2) as a inner join
myData as b on a.house=b.house
where b.RelationshipToRespondent="Self";
quit;
(Couldn't test against your data which can't be imported into SAS)
PG
If you can't assume that the primary respondent in a house has the smallest ID then you need:
proc sql;
create table SelfWithTwoRelatives as
select b.id, b.house
from (
select house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2) as a inner join
myData as b on a.house=b.house
where b.RelationshipToRespondent="Self";
quit;
(Couldn't test against your data which can't be imported into SAS)
PG
proc sql;
create table want as
select min(id) as id,a.house from have a
left join (select house,count(*) as freq from have
where relationshiptorespondent='Relative' group by house) b
on a.house=b.house
group by a.house
having freq>=2;
quit;
proc print data=want;
run;
If you can assume that MIN(id) is Self for every house then you don't need a join :
proc sql;
create table SelfWithTwoRelatives as
select min(id) as id, house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2;
quit;
PG
PG,
HAVING also can filter the obs ,not just the group .
proc sql;
create table SelfWithTwoRelatives as
select *
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2 and PersonNumber=1;
quit;
Xia Keshan
Great! This query can even be done without a join and without remerging (implicit join) :
proc sql;
create table SelfWithTwoRelatives as
select
max(case when RelationshipToRespondent="Self" then id else . end) as id,
house
from myData
group by house
having sum(RelationshipToRespondent = "Relative") >= 2;
quit;
PG
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.