proc univariate data=myData noprint;
  by year sex;
  weight myWeight;
  var wage;
  output out=myNewData PCTLPTS=1 5 10 25 75 90 95 99 PCTLPRE=PER_;
run;

data Total;
  merge myData myNewData;
  by year sex;
  array pct(*) per:;
  do i=1 to dim(pct);
    if pct(i-1) <= wage <= pct(i) then index = i-1;
  end;
  drop per;
run;

proc freq data=total;
  tables sex*year*index*(var1 var2);
  weight myweight;
run;
But I get 7 index values instead of 8.
I don't understand the question.
Do you want to calculate the percent in percentile 1 and the percent in percentile 5 and the percent in percentile 25 and so on?
OR
Do you want to calculate the percent between percentile 1 and percentile 5, and the percent between percentile 5 and percentile 25 and so on?
Do you have a lot of ties in your data in variable wage? Could the use of the weight statement produce lots of ties, resulting in an uneven distribution of observations across the percentiles?
Thank you for the response PageMiller.
I need to calculate the proportion of observations for each P1, P5, P10, ..., P99 by sex, year, Var1 and Var2.
Does it answer your question?
Your code as shown is calculating the number of observations (not weighted correctly) between the percentiles, i.e. less than P1, between P1 and P5, etc.
What do you want to accomplish? Also, wouldn't that do loop error out, since at i=1 it references pct(i-1), i.e. pct(0), which is undefined?
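For what it's worth, here is a minimal sketch of one way to assign the bins without the out-of-range subscript. It is not tested against your data. It assumes the percentile variables are named PER_1 through PER_99 (what PCTLPRE=PER_ should produce for those PCTLPTS) and that myData is already sorted by year and sex. Note that 8 cut points define 9 intervals, so index runs from 0 (at or below P1) to 8 (above P99).

```sas
data Total;
  merge myData myNewData;
  by year sex;
  /* List the cut points explicitly so the array order is guaranteed */
  array pct(*) PER_1 PER_5 PER_10 PER_25 PER_75 PER_90 PER_95 PER_99;
  /* index = 0: wage <= P1; index = k: P(k) < wage <= P(k+1); index = 8: wage > P99 */
  index = 0;
  do i = 1 to dim(pct);
    if wage > pct(i) then index = i;
  end;
  /* Missing values compare low in SAS, so set their bin to missing explicitly */
  if missing(wage) then index = .;
  drop i PER_:;
run;
```

The loop never touches pct(0), and every nonmissing wage gets exactly one index value.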
Thank you for the reply Reeza.
I have a dataset, mydata, that contains sex, year, wage, weight, var1 and var2. I need to calculate weighted P1, P5, P10, ..., P99 for wage. Then I would like to estimate the proportion of observations that fall into each percentile, disaggregated by sex, year, var1 and var2, like this:
       sex   year   var1   var2
P1     1     2007   X%     Y%
P5
P10
.
.
P99
X% or Y%: proportion in P1
Thank you,
That's kind of a weird request, because 1% fall at or below the 1st percentile, 5% fall at or below P5 (or 4% between P1 and P5)...
That's the definition of percentiles.
That's right, Reeza. P5 here, for example, is the wage value at or below which 5% of observations fall. Now, from this 5%, I need to know what proportion are, for example, women who were paid those wages in year=2000, have a bachelor's degree (var1) and are married (var2). For this, I need to know the number of observations for each P1, P5, P10, ..., P99, and then cross-tabulate with the variables. Am I right?
Thank you,
@altadata1 wrote:
Thank you for the response PageMiller.
I need to calculate the proportion of observations for each P1, P5, P10, ..., P99 by sex, year, Var1 and Var2.
Does it answer your question?
It seems to answer the question, but it leaves me thinking that this is a relatively meaningless thing to do. I am mystified by the request. Can you tell me why you want the proportion of observations at P1 and proportion at P5 but not at P4? What benefit is there to knowing how many values are at P5?
Do you want to know the proportion at exactly percentile 5? What about the proportion at P4.9 and P5.1, are those considered to be P5?
Thank you PageMiller. Here is what I need to do:
P5 here, for example, is the wage value at or below which 5% of observations fall. Now, from this 5%, I need to know what proportion are, for example, women who were paid those wages in year=2000, have a bachelor's degree (var1) and are married (var2). For this, I need to know the number of observations for each P1, P5, P10, ..., P99, and then cross-tabulate with the variables. Am I right?
Thank you,
Another explanation that is inconsistent with earlier explanations. Now you seem to be saying something different. Now it seems you are saying you want the proportion LESS THAN P5 (if I am understanding you properly), and it's still not clear what you want for P10, P25, ...
Are the percentiles computed separately for males and females? Or are they computed across the entire population and then this percentile is applied to the males and applied to the females?
@PageMiller. Thank you for your time and help, but I don't think I've ever been inconsistent. Please refer to my first post.
I mentioned from the beginning that I have P1, P5, P10, P25, P75, P90, P95 and P99, and I would like to calculate the number of observations within each percentile.
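If the "at or below" reading given earlier in the thread is the one wanted (P5 is the wage value at or below which 5% of observations fall), one possible sketch is to build a 0/1 flag per cut point and cross-tabulate within a flag. This is only an illustration under assumptions: the flag names le_P1 through le_P99 and the dataset name Flags are made up here, the PER_ variables are the ones created by the PROC UNIVARIATE step in the first post, and the percentiles are still computed within each year and sex group, as in that code.

```sas
data Flags;
  merge myData myNewData;
  by year sex;
  array pct(*)   PER_1 PER_5 PER_10 PER_25 PER_75 PER_90 PER_95 PER_99;
  /* Hypothetical flag names: 1 if wage is at or below the cut point, else 0 */
  array below(*) le_P1 le_P5 le_P10 le_P25 le_P75 le_P90 le_P95 le_P99;
  do i = 1 to dim(pct);
    if missing(wage) then below(i) = .;
    else below(i) = (wage <= pct(i));
  end;
  drop i PER_:;
run;

/* Weighted composition of the observations at or below P5 */
proc freq data=Flags;
  where le_P5 = 1;
  tables sex*year*(var1 var2);
  weight myWeight;
run;
```

Changing the WHERE condition to another flag gives the same tables for P1, P10 and so on. Whether the cut points should instead be pooled across sex and year is the open question raised above; that would mean dropping the BY statement in PROC UNIVARIATE and attaching the single row of percentiles to every observation.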