I am having trouble coding this problem. Thanks in advance to anyone who can help!
Let's say we have individual-level data. The dataset has families that live in neighborhoods so the dataset would look something like this:
Have:
ID | Fam_ID | Neigh_ID | Income |
1 | A | CC | 100,000 |
2 | A | CC | 200,000 |
3 | B | CC | 60,000 |
4 | B | CC | 200,000 |
5 | B | CC | 120,000 |
6 | C | DD | 35,000 |
7 | C | DD | 40,000 |
I am trying to find the median income of a neighborhood for each family that excludes the income of those in said family.
For example, family A would have a med_neighborhood_income of 120,000 since the only other family in neighborhood CC is family B and the median income in family B is 120,000. The med_neighborhood_income value for family B would be ((100,000+200,000)/2) or 150,000.
Hope this is clear.
Hello @SwordOrator99,
Try this:
data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;
proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;
Hello @SwordOrator99,
Try this:
data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;
proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;
Thank you, this works!
You apparently want what I call neighborhood median individual income based on the compliment of each family.
Assuming you want one observation per family, and the data are grouped by neighborhood/family, then I suspect the most efficient code would be:
data want (keep=neigh_id fam_id med_inc);
set have;
by neigh_id fam_id ;
array _fam_ids {30} $1 _temporary_ ;
array _famsize {30} _temporary_ ;
array _inc{30,6} _temporary_ ;
array _tmp{6} _temporary_ ;
if first.neigh_id then call missing(of _fam_ids{*},of _famsize{*},of _inc{*},_fam);
if first.fam_id then do;
_fam+1;
_mbr=0;
end;
_mbr+1;
_inc{_fam,_mbr}=income;
if last.fam_id then do;
_fam_ids{_fam}=fam_id;
_famsize{_fam}=coalesce(dif(_n_),_n_);
end;
if last.neigh_id;
_nfams=_fam;
do _fam=1 to _nfams;
do _mbr=1 to _famsize{_fam};
_tmp{_mbr}=_inc{_fam,_mbr};
_inc{_fam,_mbr}=.;
end;
fam_id=_fam_ids{_fam};
med_inc=median(of _inc{*});
output;
do _mbr=1 to _famsize{_fam};
_inc{_fam,_mbr}=_tmp{_mbr};
_tmp{_mbr}=.;
end;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.