Hello! I have a data set with individuals' weights over time and state of residence.
Name Year Weight State
Ann 2017 150 MD
Ann 2016 160 MD
Bob 2017 200 MD
Matt 2016 175 TX
etc.
What I want to do is create a new variable that tells me the percentage of people (not including the current person) that have a higher weight than the current individual in their state for a given year.
If it was EXCEL I would do =COUNTIFS(B:B,B2,D: D,D2,C:C,">"&C2)/(COUNTIFS(B:B,B2,😧 D,D2,)-1).
Another way, with percentages (I missed this part of the question)
proc sql;
select unique h1.*
,sum (h1.WEIGHT < h2.WEIGHT) as NB_HEAVIER
,sum (h1.WEIGHT < h2.WEIGHT)/ count(*) as PCT_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
Name | Year | Weight | State | NB_HEAVIER | PCT_HEAVIER |
---|---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 | 0.75 |
Ann | 2018 | 250 | MD | 0 | 0 |
Bob | 2017 | 200 | MD | 1 | 0.25 |
Bob | 2018 | 200 | MD | 2 | 0.5 |
Chris | 2017 | 210 | MD | 0 | 0 |
Chris | 2018 | 210 | MD | 1 | 0.25 |
Matt | 2017 | 160 | MD | 2 | 0.5 |
Matt | 2018 | 160 | MD | 3 | 0.75 |
Since the within-state-percentile of each individual's weight is the percent that are lighter than the individual, then you want 1 minus that percentile, right?
If so, just use proc rank, with the DESCENDING and PERCENT options, and a BY statement (assuming your data are sorted by state).
Like this?
data HAVE;
input Name$ Year Weight State $;
cards;
Ann 2017 150 MD
Matt 2017 160 MD
Bob 2017 200 MD
Chris 2017 210 MD
Ann 2018 250 MD
Matt 2018 160 MD
Bob 2018 200 MD
Chris 2018 210 MD
run;
proc sql;
select unique h1.*
,count (h2.NAME) as NB_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
and h1.WEIGHT < h2.WEIGHT
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
SAS Output
Name | Year | Weight | State | NB_HEAVIER |
---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 |
Ann | 2018 | 250 | MD | 0 |
Bob | 2017 | 200 | MD | 1 |
Bob | 2018 | 200 | MD | 2 |
Chris | 2017 | 210 | MD | 0 |
Chris | 2018 | 210 | MD | 1 |
Matt | 2017 | 160 | MD | 2 |
Matt | 2018 | 160 | MD | 3 |
Another way, with percentages (I missed this part of the question)
proc sql;
select unique h1.*
,sum (h1.WEIGHT < h2.WEIGHT) as NB_HEAVIER
,sum (h1.WEIGHT < h2.WEIGHT)/ count(*) as PCT_HEAVIER
from HAVE h1
left join
HAVE h2
on h1.STATE = h2.STATE
and h1.YEAR = h2.YEAR
group by h1.NAME, h1.YEAR, h1.STATE;
quit;
Name | Year | Weight | State | NB_HEAVIER | PCT_HEAVIER |
---|---|---|---|---|---|
Ann | 2017 | 150 | MD | 3 | 0.75 |
Ann | 2018 | 250 | MD | 0 | 0 |
Bob | 2017 | 200 | MD | 1 | 0.25 |
Bob | 2018 | 200 | MD | 2 | 0.5 |
Chris | 2017 | 210 | MD | 0 | 0 |
Chris | 2018 | 210 | MD | 1 | 0.25 |
Matt | 2017 | 160 | MD | 2 | 0.5 |
Matt | 2018 | 160 | MD | 3 | 0.75 |
This solution works great. Thank you.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.