- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello! I have a data set with individuals' weights over time and state of residence.
Name Year Weight State
Ann 2017 150 MD
Ann 2016 160 MD
Bob 2017 200 MD
Matt 2016 175 TX
etc.
What I want to do is create a new variable that tells me the percentage of people (not including the current person) that have a higher weight than the current individual in their state for a given year.
If it was EXCEL I would do =COUNTIFS(B:B,B2,D: D,D2,C:C,">"&C2)/(COUNTIFS(B:B,B2,😧 D,D2,)-1).
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another way, with percentages (I missed this part of the question)
proc sql;
select unique h1.*
,sum (h1.WEIGHT < h2.WEIGHT) as NB_HEAVIER
,sum (h1.WEIGHT < h2.WEIGHT)/ count(*) as PCT_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
Name | Year | Weight | State | NB_HEAVIER | PCT_HEAVIER |
---|---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 | 0.75 |
Ann | 2018 | 250 | MD | 0 | 0 |
Bob | 2017 | 200 | MD | 1 | 0.25 |
Bob | 2018 | 200 | MD | 2 | 0.5 |
Chris | 2017 | 210 | MD | 0 | 0 |
Chris | 2018 | 210 | MD | 1 | 0.25 |
Matt | 2017 | 160 | MD | 2 | 0.5 |
Matt | 2018 | 160 | MD | 3 | 0.75 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since the within-state-percentile of each individual's weight is the percent that are lighter than the individual, then you want 1 minus that percentile, right?
If so, just use proc rank, with the DESCENDING and PERCENT options, and a BY statement (assuming your data are sorted by state).
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
input Name$ Year Weight State $;
cards;
Ann 2017 150 MD
Matt 2017 160 MD
Bob 2017 200 MD
Chris 2017 210 MD
Ann 2018 250 MD
Matt 2018 160 MD
Bob 2018 200 MD
Chris 2018 210 MD
run;
proc sql;
select unique h1.*
,count (h2.NAME) as NB_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
and h1.WEIGHT < h2.WEIGHT
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
SAS Output
Name | Year | Weight | State | NB_HEAVIER |
---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 |
Ann | 2018 | 250 | MD | 0 |
Bob | 2017 | 200 | MD | 1 |
Bob | 2018 | 200 | MD | 2 |
Chris | 2017 | 210 | MD | 0 |
Chris | 2018 | 210 | MD | 1 |
Matt | 2017 | 160 | MD | 2 |
Matt | 2018 | 160 | MD | 3 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another way, with percentages (I missed this part of the question)
proc sql;
select unique h1.*
,sum (h1.WEIGHT < h2.WEIGHT) as NB_HEAVIER
,sum (h1.WEIGHT < h2.WEIGHT)/ count(*) as PCT_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
Name | Year | Weight | State | NB_HEAVIER | PCT_HEAVIER |
---|---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 | 0.75 |
Ann | 2018 | 250 | MD | 0 | 0 |
Bob | 2017 | 200 | MD | 1 | 0.25 |
Bob | 2018 | 200 | MD | 2 | 0.5 |
Chris | 2017 | 210 | MD | 0 | 0 |
Chris | 2018 | 210 | MD | 1 | 0.25 |
Matt | 2017 | 160 | MD | 2 | 0.5 |
Matt | 2018 | 160 | MD | 3 | 0.75 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This solution works great. Thank you.