BookmarkSubscribeRSS Feed
nabudayeh
Calcite | Level 5

Hello,

 

I am a new SAS user, and I apologize if this question has been asked before but I couldn't find any information on this in the forums. What I am trying to do is import an Excel Sheet that will have a table with customer information on them. I would then like to get all the possible combinations of customers that are possible within the parameters. For example, if I wanted to get all possible combinations of cumstomers that, when added together, don't surpass the $40,000 loss mark, how would I get the combinations of the following:

 

C1     $20,000       MA        Male    

C2     $5,000         NH      Female    

C3     $23,000       ME       Male    

C4     $12000         RI      Male    

C5     $1,500         MA        Female    

 

Is it possible to get all the combinations of customers where row 2 <= $40,000? Also, if in the future I would want to use other parameters like years of age, how would I be able to include that in my model?

 

Thank you,

 

Nimer

 

10 REPLIES 10
ballardw
Super User

Before going into actually generating combinations it may be time to ask HOW the result will be used?

 

Second is how many (assumed ) unique customers are there ? Note that very moderate numbers of customers lead t

 

Can you provide a little more input data and what you would expect for the output? This may be in perview of SAS/IML.

nabudayeh
Calcite | Level 5

We are trying to group these customers in "batches", so I am just trying to see how many combinations there are, and then optimize the combinations by using different parameters for the data (age, state, etc). I guess it is comparable to an optimal lineup in Draftkings. Once you have all the possible combinations of players that dont exceed the salary given, then you would rank the other fields on importance level to attempt to find your "optimal lineup" (sorry for the unnecessary Draftkings reference haha).

 

So at the end of the day, my output would a list of all possible combination of customer names given that column B does not exceed a total number. Then hopefully after that I can look at the other fields and find the most ideal method of batching.

 

Hope this provides a bit more insight on what I am trying to accomplish.

ballardw
Super User

How many customers are you concerned with? 10, 100, 1000 10000 or more? If you are considering combinations of 5 customers from 100 you start with 75,287,520 combinations to calculate totals for. If you have 1000 customers then you are looking at 8.25E12 combinations. And since you would be looking a combinations of 2, 3, 4, and possibly more than 5 then you would be looking at multiple passes of similar magnitude through your data. Might take awhile. There maybe some help in the SAS/OR module but I don't have access to that.

 

It sounds like you may be looking to do some sort of cluster assignment, i.e. group by similar characteristics. Proc Fastclus might give you a better start if that is what you are doing.

nabudayeh
Calcite | Level 5

Id say there will probably be around 150 customers. I was trying to do it in Excel with a sample of the data but unfortunately it was too much data to crunch. Ideally all I would need is to calculate all the possibel combinations on column A and then I would include some If Statements in my analysis to find the optimal solution. I was told I would be able to do this in SAS, so sorry if I made it seem like there were 1000s of customers.

ballardw
Super User

The issue is really how many records to you want to process? Excel would fail pretty early as it will not support the number of rows or columns you were attempting to deal with as mentioned above.

 

SAS can do it but your basic approach is going to generate files that take up Gbytes of space and probably require passes through the data that will be likley be measured in closer to hours than in minutes.

 

You say that you are "trying to group these customers in batches". Do you mean that the customers are similar? Then looking at clustering procedures might be more effecient.

Or just looking at the value and partitioning on a single value. I am not sure why you would want to create a group based on a sum of a variable. Supposed I have one customer with a value of $100000 and 20 customers with a value of $5000. I am not sure what I would gain by treating the two as "groups". Something like this:

Proc Rank data=have out=want groups=10;

   var value;

   ranks valuerank;

run;

will assign a group value of 0 to 9 in the valuerank variable and each group will contain about 10% of the records.

 

 

If you want to create an arbitrary number of groups based on the range of values of a single variable look at Proc Rank with the groups option.

 

Here's an idea: Take a subset of 10 records and show us what you think you want for those 10 records and some idea of what you will do next

nabudayeh
Calcite | Level 5

I will give you an example based on what I am trying to model the my project as, using fantasy football as my example because I think I would be able to explain it better (Sorry if this is not what you were expecting as an example).

 

So in fantasy football you have $50,000 salary cap every week, you have to choose 7 players and each player is assigned a salary based on how he is projected to perform during the week. So the first step would be figure out the combination of players for each position that would reach to that $50,000 salary cap or under. Obviously I only want to choose 1 or 2 optimal lineups, but I my output is giving me thousands of rows as an output, so there is where I can start filtering my other rows as input. I could decide I want to filter by the highest total aggreggate fantasy points from last week, or filter out by the opponent they are playing next week. So that would reduce my total number of combinations from thousands to maybe hundreds. And then just keep applying filters on the data to try to reach my "optimal lineup" i guess. So I guess the first step would be having a formula that would create all those possible lineups that are <= $50,000 on salary.

ballardw
Super User

I suggest at this point either investing in SAS/OR, which I believe has some tools for this sort of optimization or make sure you have lots of Gbyte of data storage and prepare to wait hours at each step of the process.

Here's an example of what I think you are attempting that only uses 50 "customers" or whatever with an associated value.

The first two bits generate a dummy list of words and values to think of the player or customer (words) and values. The last data step builds the combinations taken 5 at a time. You could add a loop to do 2, 3, 4 and 5 at a time as well.

See if the amount of time it takes to do 50 choose 5 is acceptable. If not 150 choose 5 is likely to be orders of magnitude longer.

 

/* some dummy data*/
data junk;
   length xword $ 25;
   do x= 1 to 50;
      value = 2*x;
      xword=put(x,words25.);

      output;
   end;
run;

proc sql noprint;
   select quote(xword),value into : wordlist separated by ',',
            : valuelist separated by ','
   from junk;
quit;


data work.working;
   array x (50) $ 25 _temporary_ (&wordlist);  /* list of all unique elements to build combinations from*/
   array v (50)      _temporary_ (&valuelist); /* your associated value variable matching above*/
   array word (5) $ 25 word1-word5; /* just to examine 5 words at a time*/
   array value (5)   value1-value5;
   array index (5)    index1-index5;
   /*initialize the index array*/
   do i=1 to 5;
      index[i]=i;
   end;
   n=dim(x); /* number of elements you'll be making combinations from*/
   k=dim(word); /* maximum value could for combination size, limited to 5 for example*/
   ncomb=comb(n,k); /* for this example around 2 million output lines*/
/*   do j=1 to ncomb;*/ /*uncomment this bit of code to run the full group
                        and use instead of the other DO J= line*/
   do j=1 to 20;        /*play around with values of J to investigate time/output size*/
      call allcombi(n,k, of index[*]); /* actualls selects the combinations of indices looking into the X array*/
      do h=1 to k;
      /* keep the selections*/
         word[h] = x[index[h]];
         value[h] = v[index[h]];
      end;
      valueSum = sum(of value[*]);
      output;
   end;
   drop i j h k index: n;
run;
nabudayeh
Calcite | Level 5

This is definitely close to what I am looking for, I am fairly new to SAS but it seems like I would have to put list all the unique variables in the code itself to be able to do this right, as well as the "associated value variable matching above". Is there any way I can just link this to an Excel Sheet, that way evry week or month that we update our data I can just run the query again and the updated list will populate? Also, is there any way I can add an if statement before the combinations start generating? I feel if I were to use an ifstatement before the combinations begin, I would be able to cut down the number of outputs since a lot of the combinations it will be generating will not satisfy the $ restriction.

RobPratt
SAS Super FREQ

You can use the SAS Add-In for Microsoft Office to use Excel as an interface:

http://support.sas.com/documentation/onlinedoc/addin/

 

There is also a SAS Support Community:

https://communities.sas.com/t5/Integration-with-Microsoft/bd-p/office_analytics

 

And a YouTube video:

https://www.youtube.com/watch?v=UvW3YB3sYOc

 

If you use SAS/OR, the constraints enforce the restrictions directly, without the need for an IF statement.

RobPratt
SAS Super FREQ

I recommend using SAS/OR to solve an optimization problem, implicitly searching the feasible region to find the best solution rather than explicitly enumerating all feasible solutions.  See in particular the PROC OPTMODEL code in these two threads on fantasy football and basketball:

 

https://communities.sas.com/t5/Mathematical-Optimization/Fantasy-Football-Optimal-Model/m-p/171189#M...

https://communities.sas.com/t5/Mathematical-Optimization/Building-a-Fantasy-Basketball-Optimization-...

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2233 views
  • 0 likes
  • 3 in conversation