BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SwordOrator99
Calcite | Level 5

I am having trouble coding this problem. Thanks in advance to anyone who can help!

Let's say we have individual-level data. The dataset has families that live in neighborhoods so the dataset would look something like this:

Have:

IDFam_IDNeigh_IDIncome
1ACC100,000
2ACC200,000
3BCC60,000
4BCC200,000
5BCC120,000
6CDD35,000
7CDD40,000

 

I am trying to find the median income of a neighborhood for each family that excludes the income of those in said family. 

For example, family A would have a med_neighborhood_income of 120,000 since the only other family in neighborhood CC is family B and the median income in family B is 120,000. The med_neighborhood_income value for family B would be ((100,000+200,000)/2) or 150,000. 

Hope this is clear. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @SwordOrator99,

 

Try this:

data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;

proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @SwordOrator99,

 

Try this:

data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;

proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;
mkeintz
PROC Star

You apparently want what I call neighborhood median individual income based on the compliment of each family. 

 

Assuming you want one observation per family, and the data are grouped by neighborhood/family, then I suspect the most efficient code would be:

 

data want (keep=neigh_id fam_id med_inc);
  set have;
  by neigh_id fam_id ;

  array _fam_ids {30} $1 _temporary_ ;
  array _famsize {30}    _temporary_ ;
  array _inc{30,6}       _temporary_ ;
  array _tmp{6}          _temporary_ ;

  if first.neigh_id then call missing(of _fam_ids{*},of _famsize{*},of _inc{*},_fam);

  if first.fam_id then do;
    _fam+1;
    _mbr=0;
  end;

  _mbr+1;
  _inc{_fam,_mbr}=income;

  if last.fam_id then do;
    _fam_ids{_fam}=fam_id;
    _famsize{_fam}=coalesce(dif(_n_),_n_);
  end;

  if last.neigh_id;

  _nfams=_fam;

  do _fam=1 to _nfams;
    do _mbr=1 to _famsize{_fam};
      _tmp{_mbr}=_inc{_fam,_mbr};
      _inc{_fam,_mbr}=.;
    end;

    fam_id=_fam_ids{_fam};
    med_inc=median(of _inc{*});
    output;

    do _mbr=1 to _famsize{_fam};
      _inc{_fam,_mbr}=_tmp{_mbr};
      _tmp{_mbr}=.;
    end;
  end;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1301 views
  • 2 likes
  • 3 in conversation