Hello:

I have a relatively large data set (around 100,000 rows). I used the following code to calculate the 5th and 95th percentiles:

proc means data=fulldata3;
VAR DV;
CLASS GROUP TREA CMT STIM;
output out=percentiles2 mean=mean P5=P5 P95=P95;
run;


The data have 30 GROUPs; each GROUP has 4 TREAs, each TREA has 4 CMTs, and every CMT has the same STIM (scheduled time). But the above code cannot calculate the 2.5th and 97.5th percentiles, since proc means does not have P2.5 and P97.5.

I was trying to use
proc univariate data=fulldata3;
VAR DV;
CLASS GROUP TREA CMT STIM;
output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;

The error message is "ERROR: Cannot specify more than two CLASS variables"

Then, I tried to use
proc univariate data=fulldata3;
VAR DV;
By GROUP TREA CMT STIM;
output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;

It kept popping up "windows is full and must be cleared select", and after I made a selection it kept doing that. I had to stay there and click "C to clear windows without saving". After a while I got the output, but the column title was like the following:
"the 2.5000 percentile, DV". This name is not recognized as a variable, and I need to process it further.

I hope to get some help about this issue.

Thank you
9 REPLIES 9
deleted_user
Not applicable
Hello:

I think the noprint option ought to stop the "windows is full and must be cleared select" prompt. So the remaining question is how to name the variable in the output file from proc univariate so that it can be processed later. The column name right now is like "the 2.5000 percentile, DV".
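A minimal sketch of the NOPRINT idea, reusing the data set and variable names from the original question. The PROC SORT step and the data set name `sorted` are assumptions added here, since BY-group processing expects the input to be ordered by the BY variables:

```sas
/* Sort first: BY processing needs the data ordered by the BY variables. */
proc sort data=fulldata3 out=sorted;   /* 'sorted' is a hypothetical name */
  by GROUP TREA CMT STIM;
run;

/* NOPRINT suppresses the printed tables, so only the output data set is built. */
proc univariate data=sorted noprint;
  by GROUP TREA CMT STIM;
  var DV;
  output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;
```

With no printed output, the output window should not fill up, and percentiles1 holds one row per BY group.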
SteveDenham
Jade | Level 19
I could be missing something. Do you want the percentiles across all data, or the percentiles within each sub-sub-subclass? I assume the latter.

I ran this code (with the NOPRINT option) against a dataset with several clinical chemistry endpoints and groups. The output dataset contained the following (output from PROC CONTENTS):

Alphabetic List of Variables and Attributes

#  Variable  Type  Len  Format  Informat  Label
4  GRP_NO    Num   8
2  MEAS_CD   Num   8    11.     11.       MEAS_CD
6  P2_5      Num   8                      the 2.5000 percentile, value
7  P97_5     Num   8                      the 97.5000 percentile, value
1  insum     Num   8
5  mean      Num   8                      the mean, value
3  measureM  Char  91

(Sorry about the wrapping, and this may not look so good in a proportional font)

Anyway, the variables you need would all be there, with proper values for each of your by variables.

Steve Denham



deleted_user
Not applicable
hi SteveDenham :

Thanks for your reply. The issue is with the label for the percentile, e.g. "the 2.5000 percentile, value". I need to use this variable later, but since there is a comma in the label (and maybe other things I am not aware of), I cannot access the 2.5th percentile in the output file from proc univariate, as in the following example.

Proc means data=output (this is the output from the proc univariate);
var the 2.5000 percentile, value;
run;

This code did not work: "the 2.5000 percentile, value" is not recognized.
data_null__
Jade | Level 19
You are confusing NAMES and LABELS.

[pre]
proc sort data=sashelp.class out=class;
by sex;
proc univariate data=class;
by sex;
var height;
output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;

Obs Sex mean P2_5 P97_5

1 F 60.5889 51.3 66.5
2 M 63.9100 57.3 72.0
[/pre]
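One way to see the difference between names and labels is sketched below. It assumes the percentiles1 data set created by the program above already exists:

```sas
/* Lists each variable's NAME next to its LABEL. */
proc contents data=percentiles1;
run;

/* By default PROC PRINT uses variable names as column headers (P2_5, P97_5). */
proc print data=percentiles1;
run;

/* With the LABEL option the column headers show the labels instead. */
proc print data=percentiles1 label;
run;

/* Later steps refer to a variable by its name, never by its label. */
proc means data=percentiles1;
  var P2_5 P97_5;
run;
```

The names come from the PCTLPRE= prefix plus the percentile value, with the decimal point written as an underscore, which is why 2.5 becomes P2_5.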
deleted_user
Not applicable
Hi data _null_;

I think there is a misunderstanding between us.

What I intended to do is the following, based on your code:

proc sort data=sashelp.class out=class;
by sex;
proc univariate data=class;
by sex;
var height;
output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;


proc mean data=percentiles1;
var P2_5;
run;

/*the last piece of code will not work because there is no P2_5 in percentiles1 */
data_null__
Jade | Level 19
You are going to have to show your work. When I run my program there IS a variable P2_5.
lvm
Rhodochrosite | Level 12
The code by data _null_ should work. I just wrote a demonstration program (with a few class factors), and you definitely get a variable called P2_5. Everything works fine.

data a;
do A = 1 to 5;
do B = 1 to 3;
do C = 1 to 7;
do rep = 1 to 200;
y = rannor(1);
output;
end;
end;
end;
end;
run;

proc print data=a(obs=300);
run;
proc univariate data=a;
by A B C;
var y;
output out=percentiles1 mean=mean pctlpts=2.5 97.5 pctlpre=P;
run;
proc print data=percentiles1;
run;
proc means data=percentiles1;
var P2_5;
run;
deleted_user
Not applicable
Hello:

Thanks for your help, guys. P2_5 works. I did not know that P2_5 is the name.
I am "confusing NAMES and LABELS".

Thanks again
Ksharp
Super User
Hi.
Another workaround is to use proc rank with groups=1000; then the value with rank 25 is
approximately the 2.5th percentile.



Ksharp
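A rough sketch of the proc rank idea, using the data set and variable names from the original question. The names `sorted`, `ranked`, `DV_rank`, `approx_pct` and `approx_P2_5` are hypothetical, and picking the smallest value at or above group 25 is one assumed way to read off the cut point. GROUPS=1000 numbers the groups from 0 to 999, so the result is only an approximation and is coarse when a BY group has far fewer than 1000 observations:

```sas
proc sort data=fulldata3 out=sorted;
  by GROUP TREA CMT STIM;
run;

/* Assign each DV value to one of 1000 rank groups (0 to 999) within each BY group. */
proc rank data=sorted groups=1000 out=ranked;
  by GROUP TREA CMT STIM;
  var DV;
  ranks DV_rank;
run;

/* Approximate the 2.5th percentile as the smallest DV in rank group 25 or above. */
proc means data=ranked noprint;
  where DV_rank >= 25;
  by GROUP TREA CMT STIM;
  var DV;
  output out=approx_pct min=approx_P2_5;
run;
```

For exact percentiles, the proc univariate approach with pctlpts= shown earlier in the thread is the more direct route.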
