Solved: Finding median of a group excluding certain ids within that group

SwordOrator99 · Posted 01-15-2024 12:27 AM

I am having trouble coding this problem. Thanks in advance to anyone who can help!

Let's say we have individual-level data. The dataset has families that live in neighborhoods so the dataset would look something like this:

Have:

ID	Fam_ID	Neigh_ID	Income
1	A	CC	100,000
2	A	CC	200,000
3	B	CC	60,000
4	B	CC	200,000
5	B	CC	120,000
6	C	DD	35,000
7	C	DD	40,000

I am trying to find the median income of a neighborhood for each family that excludes the income of those in said family.

For example, family A would have a med_neighborhood_income of 120,000 since the only other family in neighborhood CC is family B and the median income in family B is 120,000. The med_neighborhood_income value for family B would be ((100,000+200,000)/2) or 150,000.

Hope this is clear.

FreelanceReinh · Posted 01-15-2024 05:00 AM

Hello @SwordOrator99,

Try this:

data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;

proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;

View solution in original post

FreelanceReinh · Posted 01-15-2024 05:00 AM

Hello @SwordOrator99,

Try this:

data have;
input ID Fam_ID $ Neigh_ID $ Income :comma.;
cards;
1 A CC 100,000
2 A CC 200,000
3 B CC 60,000
4 B CC 200,000
5 B CC 120,000
6 C DD 35,000
7 C DD 40,000
;

proc sql;
create table want as
select a.fam_id, a.neigh_id, median(b.income) as med_neigh_income
from (select distinct fam_id, neigh_id from have) a left join have b
on a.neigh_id=b.neigh_id & a.fam_id ne b.fam_id
group by a.fam_id, a.neigh_id;
quit;

SwordOrator99 · Posted 01-15-2024 07:54 PM

Thank you, this works!

mkeintz · Posted 01-15-2024 01:02 PM

You apparently want what I call neighborhood median individual income based on the compliment of each family.

Assuming you want one observation per family, and the data are grouped by neighborhood/family, then I suspect the most efficient code would be:

data want (keep=neigh_id fam_id med_inc);
  set have;
  by neigh_id fam_id ;

  array _fam_ids {30} $1 _temporary_ ;
  array _famsize {30}    _temporary_ ;
  array _inc{30,6}       _temporary_ ;
  array _tmp{6}          _temporary_ ;

  if first.neigh_id then call missing(of _fam_ids{*},of _famsize{*},of _inc{*},_fam);

  if first.fam_id then do;
    _fam+1;
    _mbr=0;
  end;

  _mbr+1;
  _inc{_fam,_mbr}=income;

  if last.fam_id then do;
    _fam_ids{_fam}=fam_id;
    _famsize{_fam}=coalesce(dif(_n_),_n_);
  end;

  if last.neigh_id;

  _nfams=_fam;

  do _fam=1 to _nfams;
    do _mbr=1 to _famsize{_fam};
      _tmp{_mbr}=_inc{_fam,_mbr};
      _inc{_fam,_mbr}=.;
    end;

    fam_id=_fam_ids{_fam};
    med_inc=median(of _inc{*});
    output;

    do _mbr=1 to _famsize{_fam};
      _inc{_fam,_mbr}=_tmp{_mbr};
      _tmp{_mbr}=.;
    end;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Re: Finding median of a group excluding certain ids within that group

Registration is open

SAS Training: Just a Click Away