Hi there,
I am new to SAS and I am struggling with the following scenario.
I have a dataset of 20 valiables and each one has a value. I am trying to choose 3 of the 20 and compute the mean for each one of the rows. Basically there would be 1140 additional columns to my dataset and in this case order does not matter.
Example attached - my columns are from A-F and the combinations are ABC...BCE...
I am trying to do the following and put statement seems to work fine but i can't figure out how to add those additional columns
and compute the mean. Right now ,it just adds columns x1 through x6, k, n, ncomb, j.
data combination; set comp4; by year gvkey; array x[6] $15 ('A' 'B' 'C' 'D' 'E' 'F'); k=3; n=dim(x); ncomb=comb(n,k); do j=1 to ncomb+1; call allcomb(j, k, of x[*]); AggRank = mean(x1,x2,x3); /*put j 5. +3 x1-x3;*/ end; run;
Any help is appreciated.
Thank you.
You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning.
Key concepts:
VVALUEX -> retrieve the value of a variable when you know the name
CATT -> concatenate variable values with no trailing spaces
INPUT -> convert character value to a numeric value
When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted.
data combination;
set have;
array x[6] A B C D E F;
array nms(6) $ nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
k=3;
n=dim(x);
ncomb=comb(n, k);
do j=1 to ncomb+1;
call allcomb(j, k, of nms[*]);
comb = catt(of nms1-nms3);
v1 = input(vvaluex(nms(1)), best12.);
v2 = input(vvaluex(nms(2)), best12.);
v3 = input(vvaluex(nms(3)), best12.);
mean = mean(of v1-v3);
output;
/*put j 5. +3 x1-x3;*/
end;
run;
FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data.
You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning.
Key concepts:
VVALUEX -> retrieve the value of a variable when you know the name
CATT -> concatenate variable values with no trailing spaces
INPUT -> convert character value to a numeric value
When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted.
data combination;
set have;
array x[6] A B C D E F;
array nms(6) $ nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
k=3;
n=dim(x);
ncomb=comb(n, k);
do j=1 to ncomb+1;
call allcomb(j, k, of nms[*]);
comb = catt(of nms1-nms3);
v1 = input(vvaluex(nms(1)), best12.);
v2 = input(vvaluex(nms(2)), best12.);
v3 = input(vvaluex(nms(3)), best12.);
mean = mean(of v1-v3);
output;
/*put j 5. +3 x1-x3;*/
end;
run;
FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data.
Hi Reeza,
Thank you for your reply. It is basically giving me the data that I wanted. There is only one problem.
When the dataset is created based on the test data set below, it will create 80 different rows of mean values.
If I still do 6 choose 3, that means 20 different combinations for each of the rows of data. Since i have 4 rows then the total is 80.
So, the dataset created has the right data.
data testData; infile datalines delimiter=','; INPUT A B C D E F; DATALINES; 12,78,54,75,85,89 45,56,32,425,45,45 74,45,65,78,56,12 12,12,45,78,12,45 ;
When I try to transpose, I only want to have 20 different mean values as columns. That means i have to transpose by "comb" column or group the values somehow. But "comb" has values 'ABC' or 'BCA' or 'CAB' or 'ACB'. So, instead of getting 20 columns, i am getting 80. Is there a way to either sort the value of "comb" when it is getting created. Something like this:
comb = sort(catt(of nms1-nms3));
That way all combinations of A B C are "ABC" and not something else. Then i belive i can do the transpose by "comb".
Hopefully i didnt confuse you.
Thanks again for the help.
CALL SORTC I think?
call allcomb(j, k, of nms[*]); call sortc(of nms1-nms3); comb = catt(of nms1-nms3);
At this point I have no idea what you're looking for.
If you'd like further help, post the sample solution - fully - for your sample data.
Also, review the documentation for ALLCOMB. You'll notice they refer to LEXCOMB which will do it in order - though I have no idea why that matters at the moment. The TRANSPOSE should have reordered everything correctly.
Show what you want the data to look like for an example that includes your
"and 4 rows that reports the averages of those those 3 columns"
Hi Reeza,
why did you do in the code
do j=1 to ncomb+1;
instead of "do j=1 to ncomb;"
I ran the code on a small sample data:
data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;
and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?
@ilikesas wrote:
Hi Reeza,
why did you do in the code
do j=1 to ncomb+1;
instead of "do j=1 to ncomb;"
I ran the code on a small sample data:
data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;
and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?
I didn't check that logic to be honest, the OPs initial code used it and I left it in, but clearly that doesn't make sense. It should probably be ncomb.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.