Hello,
I would like to calculate the percentage of men and women with a salary below each decile (<p10, <p20, ...). I have a binary gender variable and a salary variable (ranging from 1 to 20'000).
I tried to separate my database into 10 equal parts with proc rank.
proc rank data=database groups=10 descending out=ranked;
var wages;
ranks decile;
run;
Then I sorted it by gender and did a proc freq for the first decile (the idea is to repeat this for each decile).
proc sort data=ranked
out=ranked_sort;
by sex;
run;
proc freq data= ranked_sort ;
where decile = 9 ; * p10 ;
table wages*sex ;
run;
I get a total percentage for men and for women, but I also get a separate row for each wage value (1-3000) in the decile. I would like to have only the total for the decile.
Does anyone have an idea how to do this?
Thanks in advance, regards,
Jo
If there are no tied values, then, by definition, 10% of employees will have values below the first decile, 20% below the second decile, and so forth. So I guess you are trying to see whether males and females differ in their proportions? If so, another option is to look at the empirical distribution curves separately for males and females. If the two curves differ, that tells you the distribution of salaries differs between genders (the example uses Cholesterol from sashelp.heart as a stand-in for salary):
data Have;
set sashelp.heart;
keep Sex Cholesterol;
run;
proc univariate data=Have;
class Sex;
var Cholesterol;
cdfplot Cholesterol;
run;
But if you want to use the PROC RANK and PROC FREQ idea, see if this helps:
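/* Have, as built above, keeps only Sex and Cholesterol: */
/* substitute your own data set and salary variable for Have and Wages. */
proc rank data=Have groups=10 descending out=ranked;
var Wages;
ranks decile;
run;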
proc freq data= ranked;
table sex*decile / list out=ListOut;
run;
proc means data=ListOut Sum;
class Sex;
var Count;
run;
/*
Female Sum=2873
Male Sum=2336
*/
data Want;
set ListOut;
if Sex='Female' then Prop = Count / 2873;
else Prop = Count / 2336;
run;
proc print data=Want;
run;
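If the goal is the share of each sex at or below each decile group, the hard-coded totals (2873 and 2336) can probably be avoided by letting PROC FREQ write the row percentages itself. What follows is only a sketch under some assumptions: it uses Cholesterol from sashelp.heart as a stand-in for salary, it drops DESCENDING so that decile 0 is the lowest group, and the names CumWant and CumPct are made up for illustration. The OUTPCT option adds PCT_ROW (the percent within each Sex) to the OUT= data set, and the DATA step accumulates it within each Sex.

```sas
/* Rank into 10 groups; without DESCENDING, decile 0 holds the lowest values */
proc rank data=sashelp.heart groups=10 out=ranked;
var Cholesterol;
ranks decile;
run;

/* Cross-tabulate and keep row percentages (PCT_ROW = percent within Sex) */
proc freq data=ranked noprint;
where not missing(decile); /* drop rows whose analysis variable was missing */
tables Sex*decile / out=ListOut outpct;
run;

/* Running total of PCT_ROW within each Sex: percent at or below each decile group */
/* CumWant and CumPct are hypothetical names chosen for this sketch */
data CumWant;
set ListOut;
by Sex;
if first.Sex then CumPct = 0;
CumPct + PCT_ROW;
run;

proc print data=CumWant;
var Sex decile Count PCT_ROW CumPct;
run;
```

For each Sex, CumPct should reach 100 at decile 9. Comparing the two CumPct columns at a given decile shows whether one sex is over-represented in the lower part of the overall distribution.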
I would like to calculate the percentage of men and women with a salary below each decile (<p10, <p20...). So I have a binary variable gender, a variable for salaries (from 1 to 20'000).
Isn't the percent less than p10 equal to 10 percent? Isn't the percent less than p20 equal to 20 percent?