BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
scrapex87
Fluorite | Level 6

Hi there, 

I am new to SAS and I am struggling with the following scenario. 

I have a dataset of 20 valiables and each one has a value. I am trying to choose 3 of the 20 and compute the mean for each one of the rows. Basically there would be 1140 additional columns to my dataset and in this case order does not matter.

 

Example attached - my columns are from A-F and the combinations are ABC...BCE...

 

I am trying to do the following and put statement seems to work fine but i can't figure out how to add those additional columns

and compute the mean. Right now ,it just adds columns x1 through x6, k, n, ncomb, j.  

data combination;
   set comp4;
   by year gvkey;

   array x[6] $15 ('A' 'B' 'C' 'D' 'E' 'F');
   k=3;
   n=dim(x);
   ncomb=comb(n,k);
   do j=1 to ncomb+1;
      call allcomb(j, k, of x[*]);
	  AggRank = mean(x1,x2,x3);
	  /*put j 5. +3 x1-x3;*/
   end;
run;

 

Any help is appreciated. 

Thank you.


exampl3.PNG
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning. 

 

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

 

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted. 

 

data combination;
    set have;
    
    array x[6] A B C D E F;
    array nms(6) $  nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
    

    
    k=3;
    n=dim(x);
    ncomb=comb(n, k);

    do j=1 to ncomb+1;
        call allcomb(j, k, of nms[*]);
        comb = catt(of nms1-nms3);
        
        v1 = input(vvaluex(nms(1)), best12.);
        v2 = input(vvaluex(nms(2)), best12.);
        v3 = input(vvaluex(nms(3)), best12.);
        
        mean = mean(of v1-v3);

        output;

        /*put j 5. +3 x1-x3;*/
    end;
run;

 

 

 

 

 

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data. 

 

 

View solution in original post

8 REPLIES 8
Reeza
Super User

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning. 

 

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

 

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted. 

 

data combination;
    set have;
    
    array x[6] A B C D E F;
    array nms(6) $  nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
    

    
    k=3;
    n=dim(x);
    ncomb=comb(n, k);

    do j=1 to ncomb+1;
        call allcomb(j, k, of nms[*]);
        comb = catt(of nms1-nms3);
        
        v1 = input(vvaluex(nms(1)), best12.);
        v2 = input(vvaluex(nms(2)), best12.);
        v3 = input(vvaluex(nms(3)), best12.);
        
        mean = mean(of v1-v3);

        output;

        /*put j 5. +3 x1-x3;*/
    end;
run;

 

 

 

 

 

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data. 

 

 

scrapex87
Fluorite | Level 6

Hi Reeza, 

 

Thank you for your reply. It is basically giving me the data that I wanted. There is only one problem. 

When the dataset is created based on the test data set below, it will create 80 different rows of mean values. 

If I still do 6 choose 3, that means 20 different combinations for each of the rows of data. Since i have 4 rows then the total is 80. 

So, the dataset created has the right data.

 

 

data testData;
infile datalines delimiter=','; 
INPUT A B C D E F;
DATALINES;
12,78,54,75,85,89
45,56,32,425,45,45
74,45,65,78,56,12
12,12,45,78,12,45
; 

 

When I try to transpose, I only want to have 20 different mean values as columns. That means i have to transpose by "comb" column or group the values somehow. But "comb" has values 'ABC' or 'BCA' or 'CAB' or 'ACB'. So, instead of getting 20 columns, i am getting 80. Is there a way to either sort the value of "comb" when it is getting created. Something like this:

 

comb = sort(catt(of nms1-nms3));

 That way all combinations of A B C are "ABC" and not something else. Then i belive i can do the transpose by "comb".

 

Hopefully i didnt confuse you. 

Thanks again for the help. 

Reeza
Super User

CALL SORTC I think? 

 

scrapex87
Fluorite | Level 6
Reza
 
thank you for your help. 
Sortc is not helping much since im sorting nms1-nms3, and its not working correctly.
call allcomb(j, k, of nms[*]);

call sortc(of nms1-nms3);

comb = catt(of nms1-nms3);
 
what i am hoping to do is the following
 
it would be great if we can have the do loop produce the means for 1 combination at a time
 
instead of looping throguh 20 combination 
 
so what i am hopoing for is a 7th column labeled ABC
 
with the means of the 4 rows reported
 
then i hope for the loop to start doing ABD
 
aond compute the means of the 4 rows in each respective row 
 
before moving to ABE
 
at present the code is running ABC, ABD, ABE and producing 80 lines of code and showing all interim mean computations
 
ideally the final output should be 20 columns (6 Choose 3) 
 
and 4 rows that reports the averages of those those 3 columns
 
our dataset is large and it would be relatively inefficient if we perhaps continued this way
Reeza
Super User

At this point I have no idea what you're looking for. 

If you'd like further help, post the sample solution - fully - for your sample data. 

 

Also, review the documentation for ALLCOMB. You'll notice they refer to LEXCOMB which will do it in order - though I have no idea why that matters at the moment. The TRANSPOSE should have reordered everything correctly. 

ballardw
Super User

Show what you want the data to look like for an example that includes your
"and 4 rows that reports the averages of those those 3 columns"

 

 

ilikesas
Barite | Level 11

Hi Reeza,

 

why did you do in the code 

do j=1 to ncomb+1;

instead of  "do j=1 to ncomb;" 

 

I ran the code on a small sample data:

data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;

 

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?

Reeza
Super User

@ilikesas wrote:

Hi Reeza,

 

why did you do in the code 

do j=1 to ncomb+1;

instead of  "do j=1 to ncomb;" 

 

I ran the code on a small sample data:

data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;

 

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?


I didn't check that logic to be honest, the OPs initial code used it and I left it in, but clearly that doesn't make sense. It should probably be ncomb. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1736 views
  • 2 likes
  • 4 in conversation