Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 05-26-2016 03:07 PM
(2852 views)

Hello,

I am a new SAS user, and I apologize if this question has been asked before but I couldn't find any information on this in the forums. What I am trying to do is import an Excel Sheet that will have a table with customer information on them. I would then like to get all the possible combinations of customers that are possible within the parameters. For example, if I wanted to get all possible combinations of cumstomers that, when added together, don't surpass the $40,000 loss mark, how would I get the combinations of the following:

C1 $20,000 MA Male

C2 $5,000 NH Female

C3 $23,000 ME Male

C4 $12000 RI Male

C5 $1,500 MA Female

Is it possible to get all the combinations of customers where row 2 <= $40,000? Also, if in the future I would want to use other parameters like years of age, how would I be able to include that in my model?

Thank you,

Nimer

10 REPLIES 10

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Before going into actually generating combinations it may be time to ask HOW the result will be used?

Second is how many (assumed ) unique customers are there ? Note that very moderate numbers of customers lead t

Can you provide a little more input data and what you would expect for the output? This may be in perview of SAS/IML.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

We are trying to group these customers in "batches", so I am just trying to see how many combinations there are, and then optimize the combinations by using different parameters for the data (age, state, etc). I guess it is comparable to an optimal lineup in Draftkings. Once you have all the possible combinations of players that dont exceed the salary given, then you would rank the other fields on importance level to attempt to find your "optimal lineup" (sorry for the unnecessary Draftkings reference haha).

So at the end of the day, my output would a list of all possible combination of customer names given that column B does not exceed a total number. Then hopefully after that I can look at the other fields and find the most ideal method of batching.

Hope this provides a bit more insight on what I am trying to accomplish.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

How many customers are you concerned with? 10, 100, 1000 10000 or more? If you are considering combinations of 5 customers from 100 you start with 75,287,520 combinations to calculate totals for. If you have 1000 customers then you are looking at 8.25E12 combinations. And since you would be looking a combinations of 2, 3, 4, and possibly more than 5 then you would be looking at multiple passes of similar magnitude through your data. Might take awhile. There maybe some help in the SAS/OR module but I don't have access to that.

It sounds like you may be looking to do some sort of cluster assignment, i.e. group by similar characteristics. Proc Fastclus might give you a better start if that is what you are doing.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The issue is really how many records to you want to process? Excel would fail pretty early as it will not support the number of rows or columns you were attempting to deal with as mentioned above.

SAS can do it but your basic approach is going to generate files that take up Gbytes of space and probably require passes through the data that will be likley be measured in closer to hours than in minutes.

You say that you are "trying to group these customers in batches". Do you mean that the customers are similar? Then looking at clustering procedures might be more effecient.

Or just looking at the value and partitioning on a single value. I am not sure why you would want to create a group based on a sum of a variable. Supposed I have one customer with a value of $100000 and 20 customers with a value of $5000. I am not sure what I would gain by treating the two as "groups". Something like this:

Proc Rank data=have out=want groups=10;

var value;

ranks valuerank;

run;

will assign a group value of 0 to 9 in the valuerank variable and each group will contain about 10% of the records.

If you want to create an arbitrary number of groups based on the range of values of a single variable look at Proc Rank with the groups option.

Here's an idea: Take a subset of 10 records and show us what you think you want for those 10 records and some idea of what you will do next

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I will give you an example based on what I am trying to model the my project as, using fantasy football as my example because I think I would be able to explain it better (Sorry if this is not what you were expecting as an example).

So in fantasy football you have $50,000 salary cap every week, you have to choose 7 players and each player is assigned a salary based on how he is projected to perform during the week. So the first step would be figure out the combination of players for each position that would reach to that $50,000 salary cap or under. Obviously I only want to choose 1 or 2 optimal lineups, but I my output is giving me thousands of rows as an output, so there is where I can start filtering my other rows as input. I could decide I want to filter by the highest total aggreggate fantasy points from last week, or filter out by the opponent they are playing next week. So that would reduce my total number of combinations from thousands to maybe hundreds. And then just keep applying filters on the data to try to reach my "optimal lineup" i guess. So I guess the first step would be having a formula that would create all those possible lineups that are <= $50,000 on salary.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I suggest at this point either investing in SAS/OR, which I believe has some tools for this sort of optimization or make sure you have lots of Gbyte of data storage and prepare to wait hours at each step of the process.

Here's an example of what I think you are attempting that only uses 50 "customers" or whatever with an associated value.

The first two bits generate a dummy list of words and values to think of the player or customer (words) and values. The last data step builds the combinations taken 5 at a time. You could add a loop to do 2, 3, 4 and 5 at a time as well.

See if the amount of time it takes to do 50 choose 5 is acceptable. If not 150 choose 5 is likely to be orders of magnitude longer.

```
/* some dummy data*/
data junk;
length xword $ 25;
do x= 1 to 50;
value = 2*x;
xword=put(x,words25.);
output;
end;
run;
proc sql noprint;
select quote(xword),value into : wordlist separated by ',',
: valuelist separated by ','
from junk;
quit;
data work.working;
array x (50) $ 25 _temporary_ (&wordlist); /* list of all unique elements to build combinations from*/
array v (50) _temporary_ (&valuelist); /* your associated value variable matching above*/
array word (5) $ 25 word1-word5; /* just to examine 5 words at a time*/
array value (5) value1-value5;
array index (5) index1-index5;
/*initialize the index array*/
do i=1 to 5;
index[i]=i;
end;
n=dim(x); /* number of elements you'll be making combinations from*/
k=dim(word); /* maximum value could for combination size, limited to 5 for example*/
ncomb=comb(n,k); /* for this example around 2 million output lines*/
/* do j=1 to ncomb;*/ /*uncomment this bit of code to run the full group
and use instead of the other DO J= line*/
do j=1 to 20; /*play around with values of J to investigate time/output size*/
call allcombi(n,k, of index[*]); /* actualls selects the combinations of indices looking into the X array*/
do h=1 to k;
/* keep the selections*/
word[h] = x[index[h]];
value[h] = v[index[h]];
end;
valueSum = sum(of value[*]);
output;
end;
drop i j h k index: n;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You can use the SAS Add-In for Microsoft Office to use Excel as an interface:

http://support.sas.com/documentation/onlinedoc/addin/

There is also a SAS Support Community:

https://communities.sas.com/t5/Integration-with-Microsoft/bd-p/office_analytics

And a YouTube video:

https://www.youtube.com/watch?v=UvW3YB3sYOc

If you use SAS/OR, the constraints enforce the restrictions directly, without the need for an IF statement.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.