BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
akberali67
Calcite | Level 5

Hi,

I am trying to build something that combines all values of all variables using SQL within SAS. For Example:

A       B       C      D

A1     B1     C1     D1

A2     B2     C2

A3               C3

A4               C4

                    C5

A1

B1

C1

D1

A1 B1

A1 B2

A1 B1 C1 D1 and so on..

Basically making all possible filter combinations on variables. I have been a little bit succesful but the problem is with storage because the combinations can be so many.

I wanted your help in trying to make a SAS procedure to make all these filter combinations and also check performance of each combination on a criteria.

But more importantly I am trying to understand how to make the combinations with out killing my storage which is very limited. This is my code:

DECLARE @i INT
SET @i = 0
WHILE (@i < 15)
BEGIN
SET @i = @i + 1
EXEC ('CREATE TABLE VAR' + @i + '(VAR' + @i + ' VARCHAR(50))')
END

INSERT INTO VAR1 (VAR1) SELECT DISTINCT VAR1 FROM [Training Data]


DECLARE @i INT
SET @i = 0
WHILE (@i < 15)
BEGIN
SET @i = @i + 1
EXEC('INSERT INTO VAR' + @i + ' (VAR' + @i + ')' + ' SELECT DISTINCT VAR' + @i + ' FROM [Training Data]')
END


CREATE TABLE ALL_COMB (VAR1 VARCHAR(50), VAR2 VARCHAR(50), VAR3 VARCHAR(50), VAR4 VARCHAR(50), VAR5 VARCHAR(50), VAR6 VARCHAR(50), VAR7 VARCHAR(50), VAR8 VARCHAR(50), VAR9 VARCHAR(50), VAR10 VARCHAR(50), VAR11 VARCHAR(50), VAR12 VARCHAR(50), VAR13 VARCHAR(50), VAR14 VARCHAR(50), VAR15 VARCHAR(50))
INSERT INTO ALL_COMB (VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15) SELECT * FROM VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15
ORDER BY VAR1, VAR2, VAR3, VAR4, VAR5, VAR6, VAR7, VAR8, VAR9, VAR10, VAR11, VAR12, VAR13, VAR14, VAR15;

TRUNCATE TABLE VAR1

select sum(Outcome)/count(Outcome) as Perc, count(Outcome) as Lines from [Training Data] where var1 = 1

SELECT * FROM ALL_COMB

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I'm still a bit confused.  Given your latest example, you would only have 31 combinations.  You have 5 values, therefore:

5!/(1!*(5-1)! + 5!/(2!*(5-2)! + 5!/(3!*(5-3)! + 5!/(42!*(5-4)! +1

There is code provided in a SAS TS Doc that you can use: TS-DOC: TS-498 - Generating Combinations and Permutations

Here is an example for obtaining the combinations based on your example.  The code is also on that page for getting all permutations.

data test;

   infile cards missover;

   input (A B)(:$2.);

   cards;

A1     B1

A2     B2

A3     .

;;;;

   run;

proc transpose data=test

  out=need (keep=col1);

  by a b;

  var a b;

run;

proc sql noprint;

  select col1

    into :vars separated by ","

      from need

  ;

quit;

    %macro combo(r)/parmbuff;

      %let i=2;

      %let things=;

      %do %while (%Qscan(&syspbuff,&i,%STR(,%))) ne );

        %let p&i="%Qscan(&syspbuff,&i,%STR(,%)))";

        %if &i=2 %then %let things=&&p&i;

        %else %let things=&things,&&p&i;

        %let i=%eval(&i+1);

      %end;

      %let n=%eval(&i-2);

       data combo;

            keep v1-v&r;

            array word $8  w1-w&n (&things);

            array rr (*) r1-r&r;

            array v $8  v1-v&r;

           %do i=1 %to &r;                    /* create the DO LOOPs */

             %if &i=1 %then %do;

               do r&i=1 to &n-(&r-&i);

               %end;

             %else %do;

               do r&i=r%eval(&i-1)+1 to &n-(&r-&i);

               %end;

             %end;

               do k=1 to &r;              /* select subscripted items */

               v(k)=word (rr(k));               /* for a SAS data set */

               put v(k)      '  ' @;                       /* for log */

               end;

               put;                                  /* writes to log */

               output;                    /* writes to a SAS data set */

           %do i=1 %to &r;

             end;                     /* create ENDs for the DO LOOPs */

             %end;

            put;

            run;

       %mend combo;

%combo(5,&vars.)

data want;

  set combo;

run;

%combo(4,&vars.)

proc append base=want data=combo;

run;

%combo(3,&vars.)

proc append base=want data=combo;

run;

%combo(2,&vars.)

proc append base=want data=combo;

run;

%combo(1,&vars.)

proc append base=want data=combo;

run;

Note: the link I provided to the combinations and permutations macros apparently is no longer on support.sas.com. However, you can still find them at: http://www.urz.uni-heidelberg.de/statistik/sas/doc/ts498-combperm.txt

View solution in original post

21 REPLIES 21
hdodson_pacificmetrics_com
Calcite | Level 5

akberali67,

     You can thank data _null_; by marking their post as the answer. That's always appreciated!

Regards,

Huey

akberali67
Calcite | Level 5

:smileyconfused: @Huey - You need a browser update.

akberali67
Calcite | Level 5

Actually, I figured what the code is doing and it is wrong. It simply retains the blank cells with the observations above which are non-blank but more importantly only makes combinations of same rows but does not combine multiple cells.

Any other solution would be helpful.

I am trying to build something that combines all values of all variables for filter combinations I can apply. For Example:

A       B       C      D

A1     B1     C1     D1

A2     B2     C2

A3               C3

A4               C4

                    C5

A1

B1

C1

D1

A1 B1

A1 B2

........

A1 B1 C1 D1 and so on

Basically all permutations and combinations possible to make filters. I know the combinations will be in Zillions and am trying to find a way to limit them also. Thanks in advance

Sir, could you please help, if you have the time. Smiley Happy

art297
Opal | Level 21

: I'd help but I'm not sure what exactly you are looking for.  As for Huey's comment, you only marked data_null's suggestion as "helpful", not as being "correct".

If you are only concerned about the ordering of the records in the file resulting from data_null's suggestion, you can always sort it.  e.g.:

data comb;

  set comb;

  count=count(_type_,'1');

run;

proc sort data=comb;

  by count descending _type_ ;

run;

You said all combinations and permutations.  Do you mean that you would want to have "A1 B1" and "B1 A1" as two separate records in your results (i.e., where order IS relevant)?.

As for how you can use the results, you can always use a datastep to create an include file that specifies which combinations/permutations to use for any particular analysis or set of analyses.

akberali67
Calcite | Level 5

Hi,

I did mark it as correct, unmarked it after I figured it was wrong a few minutes ago.

High level of what I am trying to do is find all filter combinations possible to filter a dataset.

Like an IF condition that substitutes a distinct value of each variable.

So:

A       B       C      D

A1     B1     C1     D1

A2     B2     C2

A3              C3

A4              C4

                  C5

Trying to get 4! * 2! * 5! * 1! = 5,760 possible combinations but I am not trying to get the number, I am trying to get the unique combinations to use as filters, one by one. With the complexity of my dataset, it will run into zillions of combinations which I want to test one by one based on a condition before deciding to store it or delete it. I hope I have managed to explain what I am trying to do. Thanks again.

akberali67
Calcite | Level 5

: What I am trying to do is to filter a dataset on all possible combinations.

My code does create combinations in a separate dataset but I am unable to completely run it because of the size which will run up to 800 TB and I dont have that space.

Is there a way to slow SAS/SQL down and generate these combination one by one with a condition that checks performance of each combination against a criteria specified?

So, that I can only store a combination that peforms well against my criteria and delete the ones that dont.

Thanks in advance for the help!

hdodson_pacificmetrics_com
Calcite | Level 5

akberali67,

     Sorry about the correct answer thing. I'm not sure what went on there.

     In any case, how are you checking the performance of a given filter set? What are the criteria?

     I ask because it will have to be a very quick check to be able to run against 800TB worth of filter sets.

Thanks,

Huey

akberali67
Calcite | Level 5

I have more variables that are not part of the combination variables and its a summary that measures performance against a known standard but the point is to not create that 800 TB, I already know how to do that. I dont have 800 TB of space, so need something that will measure performance and decide whether to store it or delete it.

Astounding
PROC Star

If you could generate this type of loop, would it be helpful?

data want;

set have;

do a='A1', 'A2', 'A3', 'A4';

    do b='B1', 'B2';

         * Test against filtering conditions;

    end;

end;

run;

Of course, you would have to expand it to include the other variables, and to automate the process.  But lots of people here could help with that.

akberali67
Calcite | Level 5

Not really, what this will do again is that it wont try every possible combination. I am starting to think this loop is not possible without eating up all the storage.

The sample dataset I showed you can make 5,760 combinations, I just dont know how to make those combinations without killing the storage space.

Thanks for trying to help.

Astounding
PROC Star

akberali67,

OK, one more try then.  Would this be a viable approach (again, it would need to be expanded to more variables and automated):

data want;

set have;

do _a1='A1', ' ';

   do _a2='A2, ' ';

      do _a3='A3', ' ';

         do _a4='A4', ' ';

            do _a5='A5', ' ';

                do _b1='B1', ' ';

                    do _b2='B2', ' ';

                         * testing and filtering;

end;end;end;end;end;end;end;

run;

It gets all combinations (including all blanks) but not all permutations.  Would that do the trick?

akberali67
Calcite | Level 5

Hi,

I am not sure what I am doing wrong but when I run this code:

data have;

   infile cards missover;

   input (A B)(:$2.);

   cards;

A1     B1

A2     B2

A3     .

A4     .

A5     .

;;;;

run;

data want;

set have;

do _a1='A1', ' ';

   do _a2='A2', ' ';

      do _a3='A3', ' ';

         do _a4='A4', ' ';

            do _a5='A5', ' ';

                do _b1='B1', ' ';

                    do _b2='B2', ' ';

                         * testing and filtering;

end;end;end;end;end;end;end;

run;

I only get the same data with additional blank variables _a1 to _b2.

Astounding
PROC Star

That's correct.  You will end up with all blank variables using my code.  However, you will have looped through every combination along the way, without having to store them.  So the idea is to test/filter at the innermost portion of the loop since each time the innermost loop is reached that is a unique combination.

The issue of combinations vs. permutations still exists.

Good luck.

art297
Opal | Level 21

I'm still a bit confused.  Given your latest example, you would only have 31 combinations.  You have 5 values, therefore:

5!/(1!*(5-1)! + 5!/(2!*(5-2)! + 5!/(3!*(5-3)! + 5!/(42!*(5-4)! +1

There is code provided in a SAS TS Doc that you can use: TS-DOC: TS-498 - Generating Combinations and Permutations

Here is an example for obtaining the combinations based on your example.  The code is also on that page for getting all permutations.

data test;

   infile cards missover;

   input (A B)(:$2.);

   cards;

A1     B1

A2     B2

A3     .

;;;;

   run;

proc transpose data=test

  out=need (keep=col1);

  by a b;

  var a b;

run;

proc sql noprint;

  select col1

    into :vars separated by ","

      from need

  ;

quit;

    %macro combo(r)/parmbuff;

      %let i=2;

      %let things=;

      %do %while (%Qscan(&syspbuff,&i,%STR(,%))) ne );

        %let p&i="%Qscan(&syspbuff,&i,%STR(,%)))";

        %if &i=2 %then %let things=&&p&i;

        %else %let things=&things,&&p&i;

        %let i=%eval(&i+1);

      %end;

      %let n=%eval(&i-2);

       data combo;

            keep v1-v&r;

            array word $8  w1-w&n (&things);

            array rr (*) r1-r&r;

            array v $8  v1-v&r;

           %do i=1 %to &r;                    /* create the DO LOOPs */

             %if &i=1 %then %do;

               do r&i=1 to &n-(&r-&i);

               %end;

             %else %do;

               do r&i=r%eval(&i-1)+1 to &n-(&r-&i);

               %end;

             %end;

               do k=1 to &r;              /* select subscripted items */

               v(k)=word (rr(k));               /* for a SAS data set */

               put v(k)      '  ' @;                       /* for log */

               end;

               put;                                  /* writes to log */

               output;                    /* writes to a SAS data set */

           %do i=1 %to &r;

             end;                     /* create ENDs for the DO LOOPs */

             %end;

            put;

            run;

       %mend combo;

%combo(5,&vars.)

data want;

  set combo;

run;

%combo(4,&vars.)

proc append base=want data=combo;

run;

%combo(3,&vars.)

proc append base=want data=combo;

run;

%combo(2,&vars.)

proc append base=want data=combo;

run;

%combo(1,&vars.)

proc append base=want data=combo;

run;

Note: the link I provided to the combinations and permutations macros apparently is no longer on support.sas.com. However, you can still find them at: http://www.urz.uni-heidelberg.de/statistik/sas/doc/ts498-combperm.txt

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 4967 views
  • 3 likes
  • 5 in conversation