BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MarkGIP
Calcite | Level 5

Hi everyone!

I am fairly new to SAS. Maybe you could help me with two problems:

1. I am trying to export the quantiles for several consecutive variables with the PROC UNIVARIATE procedure. However, even though I am mentioning all variables in the VAR statement (VAR x1-x200) the output just shows the results for one of the variables. Is there a way to export the quantiles of all variables at once?

2. There are several variables (including the observations) which I would like to sum up. I'm well aware of the SUM function. What I'm looking for is a way in creating n new variables, each a sum of the already existing variables.

In detail: x1= a1+....+a100, x2=a101+...+a200,. .....  a50=a4900+...+a5000

Since I would rather not type in all SUM commands by hand I would be glad if one of you had an idea. I've already thought about arrays abut couldn't think of an appropriate way.

Looking forward to your help! THANK YOU!

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

That seems odd--I ran that code on existing data and got all the quantiles for each variable, named as per x1_p30 (30th percentile of x1), x1_p60 (60th percentile of x1), etc.

Unfortunately, I agree that the decimal place thing doesn't work for PROC MEANS, and for PROC UNIVARIATE, it gets complicated with the pctlpts and pctlpre options.  However, I got the following to work quite well, and the variables px12_5, px197_5, px22_5, px297_5, px32_5, px397_5, px42_5, and px497_5 are in the output dataset with labels that look like "the   2.5000 percentile, x1", etc.:

proc univariate data=tryit;

var x1-x4;

output out=checkit2

     pctlpts= 2.5 97.5

     pctlpre= px1-px4;

run;

If you are only getting a single value, I have no idea what might be going on, unless you have BY, CLASS or WHERE statements that essentially partition the data before processing.  Can you show your code?

Steve Denham

View solution in original post

20 REPLIES 20
art297
Opal | Level 21

Not sure if I correctly understand your requests.  If I do, for Q1, use ods.  Take a look at:

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univaria...

For Q2, couldn't you just use a variable list?  E.g.,

data have;

  input a b c;

  sum=sum(of a--c);

  cards;

12 0  36

0  0  20

30 29  0

1  30  4

55  0  0

0  12  0

;

If the variables are as shown in your example, you could also use a numbered variable list an not have to worry about the actual order of the variables in your dataset.  e.g.

sum=sum(of a1-a5000);

MarkGIP
Calcite | Level 5

Thank you art297 for your quick reply!

Unfortunately my problems still remain. Please excuse my vague description.

To Question 1: I'd like to create a new dataset and not a table with the quantiles of the variables. I've used the following code:

proc univariated data=x;

Var x1 x2 x3

output out= quantiles pctlpts=90.0, 60.0, 30.0, pctlpre=q;

run;

However as mentioned in my original post, the dataset just gives me the result for one variable. Is there a way to export the quantiles of all variables at once?

To question 2: I am well aware of the sum(of....) command. In fact it's what I'm using at the moment. Though, I'm having lots of variables and I'd like sum up 100 of them in every new variable, i. e. not all in one variable - cause in this case the sum(of x1-x5000) would be the way to go.

In detail:

x1=a1+....+a100

x2=a101+....+a200

....

x50=a4900+....+a5000

At the moment the only way I can think of is the sum(of ...) command for every new variable - i.e. x1=sum(of a1 to a100). But that would mean a lot of typing for creating 50 new variables.

I hope you get a better picture of my difficulties. Maybe you could help me outSmiley Happy Thank you for your effort!!!

SteveDenham
Jade | Level 19

What about shifting to PROC MEANS from UNIVARIATE?

This should give an output dataset named work.want with variables x1_P30, x1_P60, x1_P90, x2_P30, etc.

proc means noprint data=have;

var x1-x200;

output out=want p30= p60= p90=/autoname;

run;

Steve Denham

art297
Opal | Level 21

And, in addition to Steve's advice, couldn't you do the sums with something like:

data have (drop=a: y:);

  array a(5000);

  array x(50);

  array y(100);

  retain a (5000*1);

  do i=1 to 4901 by 100;

    do j=0 to 99;

      y(j+1)=a(i+j);

    end;

    k+1;

    x(k)=sum(of y:);

  end;

run;

However, I'm not sure what you mean by you want a dataset rather than a table.  The words may be getting in the way, thus I think you'll have to explain what you mean by a "dataset".

MarkGIP
Calcite | Level 5

Thank you Steve and art297 for your great support!

To Steve's advice:

I've tried the PROC MEANS procedure but it won't let me output more than one quantile (i.e. p90 or p60) at once. In addition to that it does not seem to work for quantiles with decimal places (i.e. 99.9%).

To Art297's advice:

First of allI want to thank you for your patience with me. Unfortunately the code you posted just seems to work for variables with one observation. When I use it on more than one observation I get the error: "Array subscript out ofrange at line....". I've tried to find the cause for the error but do not have a clue how to change your code. To be frank I still do not get how your code exactly works. Why does summing up all y (y:) lead to x1-x50? Shouldn’t be the sum of all y be the same for all x? Could you tell me what the RETAIN a(5000*1) function does in the given example?

P.S.: With dataset I meant a new data file. With table I meant the print out in the results section.

Once again thank you for your awesome support!

SteveDenham
Jade | Level 19

That seems odd--I ran that code on existing data and got all the quantiles for each variable, named as per x1_p30 (30th percentile of x1), x1_p60 (60th percentile of x1), etc.

Unfortunately, I agree that the decimal place thing doesn't work for PROC MEANS, and for PROC UNIVARIATE, it gets complicated with the pctlpts and pctlpre options.  However, I got the following to work quite well, and the variables px12_5, px197_5, px22_5, px297_5, px32_5, px397_5, px42_5, and px497_5 are in the output dataset with labels that look like "the   2.5000 percentile, x1", etc.:

proc univariate data=tryit;

var x1-x4;

output out=checkit2

     pctlpts= 2.5 97.5

     pctlpre= px1-px4;

run;

If you are only getting a single value, I have no idea what might be going on, unless you have BY, CLASS or WHERE statements that essentially partition the data before processing.  Can you show your code?

Steve Denham

MarkGIP
Calcite | Level 5

Thank you so much Steve!!!

As it turns out, I forgot to state all variable names in the pctlpre option - I just put in letter q. So much for that....

Now there is only question 2 still to be solved....;)

Rick_SAS
SAS Super FREQ

Steve has the right idea.

Since "Mark is fairly new to SAS," I don't want to complicate the discussion. But for the "old timers" I will mention that if you have SAS/IML this problem is pretty simple:

proc iml;
use sashelp.class;
read all var _NUM_ into x[colname=varnames];

probs = {0.025 0.05 0.95 0.975};
call qntl(q, x, probs); /* compute quantiles of each column */
labels = "P" + strip(putn(probs, "5.3"));
labels = tranwrd(labels, ".", "_");
print q[rowname=labels colname=varnames];

/* then output matrix (or it's transpose) */

art297
Opal | Level 21

Mark, Here is the explanation you wanted in the form of comments in the code.  I'll break the code into two parts: one creating a test file to approximate your data, and another to do the calculation.

/*create a test file to mimic your real data. The test*/

/*file, have, will have 1 record and 5000 variables*/

data have;

  array a(5000);

/*  assign a 1 to all 5000 variables*/

  retain a (5000*1);

run;

data want (drop=a: y:);

  set have;

/*  assign all 5000 variables to the array a*/

  array a(5000) a1-a5000;

/*  initialize k to 0 at the beginning of reading each new record*/

  k=0;

/*initialize 2 additional arrays to represent x1 to*/

/*x50 .. the variables that will contain the sums and*/

/*y1 to y100 .. the variables that will temporarily*/

/*contain the values from each set of 100 variables*/

  array x(50);

  array y(100);

/*loop through the variables 100 at a time*/

  do i=1 to 4901 by 100;

/*fill the array y with each set of 100 variables*/

    do j=0 to 99;

      y(j+1)=a(i+j);

    end;

/*calculate the desired sum*/

    k+1;

    x(k)=sum(of y:);

/*move onto the next set of 100 variables*/

  end;

run;

MarkGIP
Calcite | Level 5

Thank you all!

@Rick: Thank you for the alternative method!

@Art297: Wow.... Thank you so much! The comments really help a lot... But I still think that this code is not working in case of more than one record for each variable (see my last reply):(

art297
Opal | Level 21

You would have to explain your data with an example.  It sounds like you first want to sum or doing something with the individual records before doing your analysis .. like maybe first run proc summary to create a file with aggregated data.

MarkGIP
Calcite | Level 5

@Art297

Aggregated data wouldn't help. To give you an idea:

Let's assume my data consists of 5000 variables (a1-a5000) each with 100 records.

My goal is to sum up those variables as follows (horizontal addition):

x1= a1+...+a100 .....x50=a4900+...+a5000

Each new variable (x1 to x50) should have 100 records.

Maybe this helps to clear things up! THANK YOU!

Ksharp
Super User

How about:

options symbolgen mprint mlogic;
%macro a;
%let start=1;
%let end=100;
data want;
 set have;
 %do i=1 %to 50;
  x&i=sum(of a&start - a&end);
  %let start=%eval(&end+1);
  %let end=%eval(&start+99);
 %end;
run;
%mend a;

%a


Ksharp

art297
Opal | Level 21

KSharp,

That is what my originally suggested code did but the OP said it wasn't what was needed.  I'm confused about what the OP is seeking.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 3275 views
  • 7 likes
  • 5 in conversation