Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Re: Quantiles and Sum of Variables...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 11-02-2011 10:22 PM
(3465 views)

Hi everyone!

I am fairly new to SAS. Maybe you could help me with two problems:

1. I am trying to export the quantiles for several consecutive variables with the PROC UNIVARIATE procedure. However, even though I am mentioning all variables in the VAR statement (VAR x1-x200) the output just shows the results for one of the variables. Is there a way to export the quantiles of all variables at once?

2. There are several variables (including the observations) which I would like to sum up. I'm well aware of the SUM function. What I'm looking for is a way in creating n new variables, each a sum of the already existing variables.

In detail: x1= a1+....+a100, x2=a101+...+a200,. ..... a50=a4900+...+a5000

Since I would rather not type in all SUM commands by hand I would be glad if one of you had an idea. I've already thought about arrays abut couldn't think of an appropriate way.

Looking forward to your help! THANK YOU!

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

That seems odd--I ran that code on existing data and got all the quantiles for each variable, named as per x1_p30 (30th percentile of x1), x1_p60 (60th percentile of x1), etc.

Unfortunately, I agree that the decimal place thing doesn't work for PROC MEANS, and for PROC UNIVARIATE, it gets complicated with the pctlpts and pctlpre options. However, I got the following to work quite well, and the variables px12_5, px197_5, px22_5, px297_5, px32_5, px397_5, px42_5, and px497_5 are in the output dataset with labels that look like "the 2.5000 percentile, x1", etc.:

proc univariate data=tryit;

var x1-x4;

output out=checkit2

pctlpts= 2.5 97.5

pctlpre= px1-px4;

run;

If you are only getting a single value, I have no idea what might be going on, unless you have BY, CLASS or WHERE statements that essentially partition the data before processing. Can you show your code?

Steve Denham

** **

20 REPLIES 20

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Not sure if I correctly understand your requests. If I do, for Q1, use ods. Take a look at:

For Q2, couldn't you just use a variable list? E.g.,

data have;

input a b c;

sum=sum(of a--c);

cards;

12 0 36

0 0 20

30 29 0

1 30 4

55 0 0

0 12 0

;

If the variables are as shown in your example, you could also use a numbered variable list an not have to worry about the actual order of the variables in your dataset. e.g.

sum=sum(of a1-a5000);

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you art297 for your quick reply!

Unfortunately my problems still remain. Please excuse my vague description.

To Question 1: I'd like to create a new dataset and not a table with the quantiles of the variables. I've used the following code:

*proc univariated data=x;*

*Var x1 x2 x3*

*output out= quantiles pctlpts=90.0, 60.0, 30.0, pctlpre=q;*

*run;*

However as mentioned in my original post, the dataset just gives me the result for one variable. Is there a way to export the quantiles of all variables at once?

To question 2: I am well aware of the sum(of....) command. In fact it's what I'm using at the moment. Though, I'm having lots of variables and I'd like sum up 100 of them in every new variable, i. e. not all in one variable - cause in this case the sum(of x1-x5000) would be the way to go.

In detail:

x1=a1+....+a100

x2=a101+....+a200

....

x50=a4900+....+a5000

At the moment the only way I can think of is the sum(of ...) command for every new variable - i.e. x1=sum(of a1 to a100). But that would mean a lot of typing for creating 50 new variables.

I hope you get a better picture of my difficulties. Maybe you could help me out Thank you for your effort!!!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What about shifting to PROC MEANS from UNIVARIATE?

This should give an output dataset named work.want with variables x1_P30, x1_P60, x1_P90, x2_P30, etc.

proc means noprint data=have;

var x1-x200;

output out=want p30= p60= p90=/autoname;

run;

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

And, in addition to Steve's advice, couldn't you do the sums with something like:

data have (drop=a: y:);

array a(5000);

array x(50);

array y(100);

retain a (5000*1);

do i=1 to 4901 by 100;

do j=0 to 99;

y(j+1)=a(i+j);

end;

k+1;

x(k)=sum(of y:);

end;

run;

However, I'm not sure what you mean by you want a dataset rather than a table. The words may be getting in the way, thus I think you'll have to explain what you mean by a "dataset".

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Steve and art297 for your great support!

To Steve's advice:

I've tried the PROC MEANS procedure but it won't let me output more than one quantile (i.e. p90 or p60) at once. In addition to that it does not seem to work for quantiles with decimal places (i.e. 99.9%).

To Art297's advice:

First of allI want to thank you for your patience with me. Unfortunately the code you posted just seems to work for variables with one observation. When I use it on more than one observation I get the error: "Array subscript out ofrange at line....". I've tried to find the cause for the error but do not have a clue how to change your code. To be frank I still do not get how your code exactly works. Why does summing up all y (y:) lead to x1-x50? Shouldn’t be the sum of all y be the same for all x? Could you tell me what the RETAIN a(5000*1) function does in the given example?

P.S.: With dataset I meant a new data file. With table I meant the print out in the results section.

Once again thank you for your awesome support!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

That seems odd--I ran that code on existing data and got all the quantiles for each variable, named as per x1_p30 (30th percentile of x1), x1_p60 (60th percentile of x1), etc.

Unfortunately, I agree that the decimal place thing doesn't work for PROC MEANS, and for PROC UNIVARIATE, it gets complicated with the pctlpts and pctlpre options. However, I got the following to work quite well, and the variables px12_5, px197_5, px22_5, px297_5, px32_5, px397_5, px42_5, and px497_5 are in the output dataset with labels that look like "the 2.5000 percentile, x1", etc.:

proc univariate data=tryit;

var x1-x4;

output out=checkit2

pctlpts= 2.5 97.5

pctlpre= px1-px4;

run;

If you are only getting a single value, I have no idea what might be going on, unless you have BY, CLASS or WHERE statements that essentially partition the data before processing. Can you show your code?

Steve Denham

** **

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you so much Steve!!!

As it turns out, I forgot to state all variable names in the pctlpre option - I just put in letter q. So much for that....

Now there is only question 2 still to be solved....;)

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Steve has the right idea.

Since "Mark is fairly new to SAS," I don't want to complicate the discussion. But for the "old timers" I will mention that if you have SAS/IML this problem is pretty simple:

proc iml;

use sashelp.class;

read all var _NUM_ into x[colname=varnames];

probs = {0.025 0.05 0.95 0.975};

call qntl(q, x, probs); /* compute quantiles of each column */

labels = "P" + strip(putn(probs, "5.3"));

labels = tranwrd(labels, ".", "_");

print q[rowname=labels colname=varnames];

/* then output matrix (or it's transpose) */

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Mark, Here is the explanation you wanted in the form of comments in the code. I'll break the code into two parts: one creating a test file to approximate your data, and another to do the calculation.

/*create a test file to mimic your real data. The test*/

/*file, have, will have 1 record and 5000 variables*/

data have;

array a(5000);

/* assign a 1 to all 5000 variables*/

retain a (5000*1);

run;

data want (drop=a: y:);

set have;

/* assign all 5000 variables to the array a*/

array a(5000) a1-a5000;

/* initialize k to 0 at the beginning of reading each new record*/

k=0;

/*initialize 2 additional arrays to represent x1 to*/

/*x50 .. the variables that will contain the sums and*/

/*y1 to y100 .. the variables that will temporarily*/

/*contain the values from each set of 100 variables*/

array x(50);

array y(100);

/*loop through the variables 100 at a time*/

do i=1 to 4901 by 100;

/*fill the array y with each set of 100 variables*/

do j=0 to 99;

y(j+1)=a(i+j);

end;

/*calculate the desired sum*/

k+1;

x(k)=sum(of y:);

/*move onto the next set of 100 variables*/

end;

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you all!

@Rick: Thank you for the alternative method!

@Art297: Wow.... Thank you so much! The comments really help a lot... But I still think that this code is not working in case of more than one record for each variable (see my last reply):(

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Art297

Aggregated data wouldn't help. To give you an idea:

Let's assume my data consists of 5000 variables (a1-a5000) each with 100 records.

My goal is to sum up those variables as follows (horizontal addition):

x1= a1+...+a100 .....x50=a4900+...+a5000

Each new variable (x1 to x50) should have 100 records.

Maybe this helps to clear things up! THANK YOU!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

How about:

options symbolgen mprint mlogic; %macro a; %let start=1; %let end=100; data want; set have; %do i=1 %to 50; x&i=sum(of a&start - a&end); %let start=%eval(&end+1); %let end=%eval(&start+99); %end; run; %mend a; %a

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

KSharp,

That is what my originally suggested code did but the OP said it wasn't what was needed. I'm confused about what the OP is seeking.

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.