I am having trouble coding this problem. Thanks in advance to anyone who can help!
Let's say we have individual-level data. The dataset has families that live in neighborhoods so the dataset would look something like this:
Have:
ID | Fam_ID | Neigh_ID | Income |
1 | A | CC | 100,000 |
2 | A | CC | 200,000 |
3 | B | CC | 60,000 |
4 | B | CC | 200,000 |
5 | B | CC | 120,000 |
6 | C | DD | 35,000 |
7 | C | DD | 40,000 |
I am trying to find the median income of a neighborhood for each family that excludes the income of those in said family.
For example, family A would have a med_neighborhood_income of 120,000 since the only other family in neighborhood CC is family B and the median income in family B is 120,000. The med_neighborhood_income value for family B would be ((100,000+200,000)/2) or 150,000.
Hope this is clear.
Hello @SwordOrator99,
Try this:
data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;
proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;
Hello @SwordOrator99,
Try this:
data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;
proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;
Thank you, this works!
You apparently want what I call neighborhood median individual income based on the compliment of each family.
Assuming you want one observation per family, and the data are grouped by neighborhood/family, then I suspect the most efficient code would be:
data want (keep=neigh_id fam_id med_inc);
set have;
by neigh_id fam_id ;
array _fam_ids {30} $1 _temporary_ ;
array _famsize {30} _temporary_ ;
array _inc{30,6} _temporary_ ;
array _tmp{6} _temporary_ ;
if first.neigh_id then call missing(of _fam_ids{*},of _famsize{*},of _inc{*},_fam);
if first.fam_id then do;
_fam+1;
_mbr=0;
end;
_mbr+1;
_inc{_fam,_mbr}=income;
if last.fam_id then do;
_fam_ids{_fam}=fam_id;
_famsize{_fam}=coalesce(dif(_n_),_n_);
end;
if last.neigh_id;
_nfams=_fam;
do _fam=1 to _nfams;
do _mbr=1 to _famsize{_fam};
_tmp{_mbr}=_inc{_fam,_mbr};
_inc{_fam,_mbr}=.;
end;
fam_id=_fam_ids{_fam};
med_inc=median(of _inc{*});
output;
do _mbr=1 to _famsize{_fam};
_inc{_fam,_mbr}=_tmp{_mbr};
_tmp{_mbr}=.;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.