Combinations means of dataset

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Combinations means of dataset

Hi there, 

I am new to SAS and I am struggling with the following scenario. 

I have a dataset of 20 valiables and each one has a value. I am trying to choose 3 of the 20 and compute the mean for each one of the rows. Basically there would be 1140 additional columns to my dataset and in this case order does not matter.

 

Example attached - my columns are from A-F and the combinations are ABC...BCE...

 

I am trying to do the following and put statement seems to work fine but i can't figure out how to add those additional columns

and compute the mean. Right now ,it just adds columns x1 through x6, k, n, ncomb, j.  

data combination;
   set comp4;
   by year gvkey;

   array x[6] $15 ('A' 'B' 'C' 'D' 'E' 'F');
   k=3;
   n=dim(x);
   ncomb=comb(n,k);
   do j=1 to ncomb+1;
      call allcomb(j, k, of x[*]);
	  AggRank = mean(x1,x2,x3);
	  /*put j 5. +3 x1-x3;*/
   end;
run;

 

Any help is appreciated. 

Thank you.

Attachment

Accepted Solutions
Solution
‎04-09-2017 10:13 PM
Grand Advisor
Posts: 17,294

Re: Combinations means of dataset

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning. 

 

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

 

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted. 

 

data combination;
    set have;
    
    array x[6] A B C D E F;
    array nms(6) $  nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
    

    
    k=3;
    n=dim(x);
    ncomb=comb(n, k);

    do j=1 to ncomb+1;
        call allcomb(j, k, of nms[*]);
        comb = catt(of nms1-nms3);
        
        v1 = input(vvaluex(nms(1)), best12.);
        v2 = input(vvaluex(nms(2)), best12.);
        v3 = input(vvaluex(nms(3)), best12.);
        
        mean = mean(of v1-v3);

        output;

        /*put j 5. +3 x1-x3;*/
    end;
run;

 

 

 

 

 

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data. 

 

 

View solution in original post


All Replies
Solution
‎04-09-2017 10:13 PM
Grand Advisor
Posts: 17,294

Re: Combinations means of dataset

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning. 

 

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

 

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted. 

 

data combination;
    set have;
    
    array x[6] A B C D E F;
    array nms(6) $  nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
    

    
    k=3;
    n=dim(x);
    ncomb=comb(n, k);

    do j=1 to ncomb+1;
        call allcomb(j, k, of nms[*]);
        comb = catt(of nms1-nms3);
        
        v1 = input(vvaluex(nms(1)), best12.);
        v2 = input(vvaluex(nms(2)), best12.);
        v3 = input(vvaluex(nms(3)), best12.);
        
        mean = mean(of v1-v3);

        output;

        /*put j 5. +3 x1-x3;*/
    end;
run;

 

 

 

 

 

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data. 

 

 

Occasional Contributor
Posts: 7

Re: Combinations means of dataset

Hi Reeza, 

 

Thank you for your reply. It is basically giving me the data that I wanted. There is only one problem. 

When the dataset is created based on the test data set below, it will create 80 different rows of mean values. 

If I still do 6 choose 3, that means 20 different combinations for each of the rows of data. Since i have 4 rows then the total is 80. 

So, the dataset created has the right data.

 

 

data testData;
infile datalines delimiter=','; 
INPUT A B C D E F;
DATALINES;
12,78,54,75,85,89
45,56,32,425,45,45
74,45,65,78,56,12
12,12,45,78,12,45
; 

 

When I try to transpose, I only want to have 20 different mean values as columns. That means i have to transpose by "comb" column or group the values somehow. But "comb" has values 'ABC' or 'BCA' or 'CAB' or 'ACB'. So, instead of getting 20 columns, i am getting 80. Is there a way to either sort the value of "comb" when it is getting created. Something like this:

 

comb = sort(catt(of nms1-nms3));

 That way all combinations of A B C are "ABC" and not something else. Then i belive i can do the transpose by "comb".

 

Hopefully i didnt confuse you. 

Thanks again for the help. 

Grand Advisor
Posts: 17,294

Re: Combinations means of dataset

CALL SORTC I think? 

 

Occasional Contributor
Posts: 7

Re: Combinations means of dataset

Reza
 
thank you for your help. 
Sortc is not helping much since im sorting nms1-nms3, and its not working correctly.
call allcomb(j, k, of nms[*]);

call sortc(of nms1-nms3);

comb = catt(of nms1-nms3);
 
what i am hoping to do is the following
 
it would be great if we can have the do loop produce the means for 1 combination at a time
 
instead of looping throguh 20 combination 
 
so what i am hopoing for is a 7th column labeled ABC
 
with the means of the 4 rows reported
 
then i hope for the loop to start doing ABD
 
aond compute the means of the 4 rows in each respective row 
 
before moving to ABE
 
at present the code is running ABC, ABD, ABE and producing 80 lines of code and showing all interim mean computations
 
ideally the final output should be 20 columns (6 Choose 3) 
 
and 4 rows that reports the averages of those those 3 columns
 
our dataset is large and it would be relatively inefficient if we perhaps continued this way
Grand Advisor
Posts: 17,294

Re: Combinations means of dataset

At this point I have no idea what you're looking for. 

If you'd like further help, post the sample solution - fully - for your sample data. 

 

Also, review the documentation for ALLCOMB. You'll notice they refer to LEXCOMB which will do it in order - though I have no idea why that matters at the moment. The TRANSPOSE should have reordered everything correctly. 

Grand Advisor
Posts: 10,192

Re: Combinations means of dataset

Show what you want the data to look like for an example that includes your
"and 4 rows that reports the averages of those those 3 columns"

 

 

Super Contributor
Posts: 413

Re: Combinations means of dataset

Hi Reeza,

 

why did you do in the code 

do j=1 to ncomb+1;

instead of  "do j=1 to ncomb;" 

 

I ran the code on a small sample data:

data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;

 

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?

Grand Advisor
Posts: 17,294

Re: Combinations means of dataset


ilikesas wrote:

Hi Reeza,

 

why did you do in the code 

do j=1 to ncomb+1;

instead of  "do j=1 to ncomb;" 

 

I ran the code on a small sample data:

data have;
input A B C D E F;
datalines;
1 2 3 4 5 6
10 20 30 40 50 60
;
run;

 

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?


I didn't check that logic to be honest, the OPs initial code used it and I left it in, but clearly that doesn't make sense. It should probably be ncomb. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 177 views
  • 2 likes
  • 4 in conversation