turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Combinations means of dataset

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 02:38 PM

Hi there,

I am new to SAS and I am struggling with the following scenario.

I have a dataset of 20 valiables and each one has a value. I am trying to choose 3 of the 20 and compute the mean for each one of the rows. Basically there would be 1140 additional columns to my dataset and in this case order does not matter.

Example attached - my columns are from A-F and the combinations are ABC...BCE...

I am trying to do the following and put statement seems to work fine but i can't figure out how to add those additional columns

and compute the mean. Right now ,it just adds columns x1 through x6, k, n, ncomb, j.

data combination; set comp4; by year gvkey; array x[6] $15 ('A' 'B' 'C' 'D' 'E' 'F'); k=3; n=dim(x); ncomb=comb(n,k); do j=1 to ncomb+1; call allcomb(j, k, of x[*]); AggRank = mean(x1,x2,x3); /*put j 5. +3 x1-x3;*/ end; run;

Any help is appreciated.

Thank you.

Accepted Solutions

Solution

04-09-2017
10:13 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 03:45 PM

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning.

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted.

```
data combination;
set have;
array x[6] A B C D E F;
array nms(6) $ nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
k=3;
n=dim(x);
ncomb=comb(n, k);
do j=1 to ncomb+1;
call allcomb(j, k, of nms[*]);
comb = catt(of nms1-nms3);
v1 = input(vvaluex(nms(1)), best12.);
v2 = input(vvaluex(nms(2)), best12.);
v3 = input(vvaluex(nms(3)), best12.);
mean = mean(of v1-v3);
output;
/*put j 5. +3 x1-x3;*/
end;
run;
```

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data.

All Replies

Solution

04-09-2017
10:13 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 03:45 PM

You're close with your solution, but you need to convert the array names to variable values. Here's a long winded version of a solution, there's probably easier ways to simplify this code but each step helps to illustrate the issues while learning.

Key concepts:

VVALUEX -> retrieve the value of a variable when you know the name

CATT -> concatenate variable values with no trailing spaces

INPUT -> convert character value to a numeric value

When this step is done you can transpose (PROC TRANSPOSE) your solution to get the wide data set you wanted.

```
data combination;
set have;
array x[6] A B C D E F;
array nms(6) $ nms1-nms6 ('A' 'B' 'C' 'D' 'E' 'F');
k=3;
n=dim(x);
ncomb=comb(n, k);
do j=1 to ncomb+1;
call allcomb(j, k, of nms[*]);
comb = catt(of nms1-nms3);
v1 = input(vvaluex(nms(1)), best12.);
v2 = input(vvaluex(nms(2)), best12.);
v3 = input(vvaluex(nms(3)), best12.);
mean = mean(of v1-v3);
output;
/*put j 5. +3 x1-x3;*/
end;
run;
```

FYI - Please do not post data as an image. I will not type it out and you are more likely to get a response if people can work off sample data.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 05:35 PM

Hi Reeza,

Thank you for your reply. It is basically giving me the data that I wanted. There is only one problem.

When the dataset is created based on the test data set below, it will create 80 different rows of mean values.

If I still do 6 choose 3, that means 20 different combinations for each of the rows of data. Since i have 4 rows then the total is 80.

So, the dataset created has the right data.

data testData; infile datalines delimiter=','; INPUT A B C D E F; DATALINES; 12,78,54,75,85,89 45,56,32,425,45,45 74,45,65,78,56,12 12,12,45,78,12,45 ;

When I try to transpose, I only want to have 20 different mean values as columns. That means i have to transpose by "comb" column or group the values somehow. But "comb" has values 'ABC' or 'BCA' or 'CAB' or 'ACB'. So, instead of getting 20 columns, i am getting 80. Is there a way to either sort the value of "comb" when it is getting created. Something like this:

`comb = sort(catt(of nms1-nms3));`

That way all combinations of A B C are "ABC" and not something else. Then i belive i can do the transpose by "comb".

Hopefully i didnt confuse you.

Thanks again for the help.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 05:46 PM

CALL SORTC I think?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 06:55 PM

Reza

Sortc is not helping much since im sorting nms1-nms3, and its not working correctly.

call allcomb(j, k, of nms[*]); call sortc(of nms1-nms3); comb = catt(of nms1-nms3);

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-09-2017 08:04 PM

At this point I have no idea what you're looking for.

If you'd like further help, post the sample solution - fully - for your sample data.

Also, review the documentation for ALLCOMB. You'll notice they refer to LEXCOMB which will do it in order - though I have no idea why that matters at the moment. The TRANSPOSE should have reordered everything correctly.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-10-2017 04:26 PM

Show what you want the data to look like for an example that includes your

"and 4 rows that reports the averages of those those 3 columns"

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-10-2017 06:43 PM

Hi Reeza,

why did you do in the code

`do j=1 to ncomb+1;`

instead of "do j=1 to ncomb;"

I ran the code on a small sample data:

data have;

input A B C D E F;

datalines;

1 2 3 4 5 6

10 20 30 40 50 60

;

run;

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-10-2017 06:48 PM

ilikesas wrote:

Hi Reeza,

why did you do in the code

`do j=1 to ncomb+1;`

instead of "do j=1 to ncomb;"

I ran the code on a small sample data:

data have;

input A B C D E F;

datalines;

1 2 3 4 5 6

10 20 30 40 50 60

;

run;

and noticed that the 21st combination for each original observation is the exact copy of the 20th combination (and in this case ncomb = 20)?

I didn't check that logic to be honest, the OPs initial code used it and I left it in, but clearly that doesn't make sense. It should probably be ncomb.