SAS Programming

kjohnsonm · Posted 12-04-2019 01:32 PM

Hello All,
I have some data sets and I need to "stack" them in all combinations possible where the data obs within each sub-set cannot change, (they have to stay in their subset order). (I have seen a lot of info on creating a combinations/permutations of columns of a single data set, that is not what I need)
Here I have shown a manual way to do a full set of three items 3! (three factorial) = 6 as you all know this gets nuts quick with say 7 or 15 items. Any help would be greatly appreciated. I was hoping to do some macro and just hand it my list of sets. Is this possible? -KJ

data abc;
set a
b
c;
run;

data acb;
set a
c
b;
run;

data bac;
set b
a
c;
run;

data bca;
set b
c
a;
run;

data cab;
set c
a
b;
run;

data cba;
set c
b
a;
run;

ballardw · Posted 12-04-2019 02:56 PM

Variation on the "columns" permutation.

make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.

Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.

data work.perm (keep=namelist);
   array x [3] $10 ('work.A' 'work.B' 'work.C');
   n=dim(x);
   nfact=fact(n); 
   do i=1 to nfact;   
      names=allperm(i, of x[*]);
      namelist = catx(' ',of x[*]);
      output;
   end;
run;

data _null_;
   set work.perm ;
   outset = cats('work.DS_',_n_);
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;

View solution in original post

Reeza · Posted 12-04-2019 01:34 PM

If you add an order variable to each data set and then use the approaches you've found so far and sort at the end by your order variables, would that work?

What is the expected output from this? Especially since the values overlap?

kjohnsonm · Posted 12-04-2019 02:17 PM

I am not sure on "order variable", I guess I need to read more on SAS permutations.
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.

Reeza · Posted 12-04-2019 02:27 PM

I meant, given the above data sets as an exact input, what exact output would you expect.

kjohnsonm · Posted 12-04-2019 02:52 PM

In my case, if my data output name is ABC, the data set should be added

first A, then B, then C.
If the output name was CAB then the stacked data should be added in the order C, then A, then last B.
The names could be 1 2 3 that makes no difference to me. I eventually just need the data created into new data sets with every permutation of the sub-sets still intact.

...or if this helps my sets names could be 12_, 13_,14_, 15_, and maybe one output name would be 15_12_13_14_.

*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.

Does that answer your follow up? -KJ

ballardw · Posted 12-04-2019 05:23 PM

@kjohnsonm wrote:

In my case, if my data output name is ABC, the data set should be added

*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.

Does that answer your follow up? -KJ

Which is why my example code uses DS_ as the base of the name with a counter variable to add a numeric suffix.

FreelanceReinh · Posted 12-04-2019 05:17 PM

@kjohnsonm wrote:
I have some data sets and I need to "stack" them in all combinations possible (...)
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.

To me this sounds a bit like an "XY problem." Wouldn't it be possible to create those csv files from a single dataset containing the subsets in one order, e.g., dataset abc in your example? After all, no new information would be created by generating all subset permutations. You can read a dataset in any order and multiple times (if need be) and create multiple csv files from that dataset in a single DATA step.

Edit (addendum):

Example:

/* Create test data for demonstration */

data have(keep=make seqno);
set sashelp.cars(where=(make in ('Acura','Buick','Cadillac')));
by make;
if first.make then seqno=1;
else seqno+1;
run; /* 7+9+8=24 obs. */

%let s=3; /* number of subgroups */
%let dir=C:\Temp\;

/* Create 3!=6 CSV files C:\Temp\123.csv, 132.csv, ..., 321.csv
   containing the correspondingly permuted 3 subgroups of HAVE */

data _null_;
array x[&s] (1:&s);
array p[&s] _temporary_; /* starting positions */
length fname $50;
do until(last);
  set have end=last;
  by make notsorted;
  if first.make then do;
    j+1;
    p[j]=_n_;
  end;
  _n_+1;
end;
do i=1 to fact(dim(x));
  call allperm(i, of x[*]);
  fname=cats("&dir", of x[*], ".csv");
  file dummy filevar=fname dsd;
  put 'Make,Seqno';
  do j=1 to dim(x);
    do k=p[x[j]] to mod(n+p[mod(x[j],&s)+1]-2,n)+1;
      set have point=k nobs=n;
      put make seqno;
    end;
  end;
end;
run;

ballardw · Posted 12-04-2019 02:56 PM

Variation on the "columns" permutation.

make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.

Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.

data work.perm (keep=namelist);
   array x [3] $10 ('work.A' 'work.B' 'work.C');
   n=dim(x);
   nfact=fact(n); 
   do i=1 to nfact;   
      names=allperm(i, of x[*]);
      namelist = catx(' ',of x[*]);
      output;
   end;
run;

data _null_;
   set work.perm ;
   outset = cats('work.DS_',_n_);
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;

ballardw · Posted 12-05-2019 10:27 AM

ballardw,

You are a true SAS supper user. Thank you.

data work.perm (keep=namelist);
array x [3] $10 ('A' 'B' 'C');
n=dim(x);
nfact=fact(n); 
do i=1 to nfact; 
names=allperm(i, of x[*]);
namelist = catx(' ',of x[*]);
output;
end;
run; proc print data=perm noobs;run;
data temp;
set work.perm ;
/* outset = cats('work.DS_',_n_);*/ /* This code will make output data sets like DS_# where the number will be 1-6 for a three set list.  or n!  --needed for if your data sets have longer names than when joined SAS will allow. */
outset =compress(namelist,,'s');  /* This fits my simple example data model  */
my_call="call execute( 'data '||outset||'; set '|| namelist ||';run;')";
call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;
proc print data=temp;run;
proc print data=ACB ;run;

kjohnsonm · Posted 12-05-2019 10:40 AM

/*PS if you modify your code to */
data temp;
   set work.perm ;
   outset = cats('work.DS_',_n_);
/*   outset =compress(namelist,,'s');*/
   n = _n_;
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
   keep outset n namelist ;
run;
proc print data=temp noobs;run;

You will get output like this:

namelist outset n

A B C work.DS_1 1
A C B work.DS_2 2
C A B work.DS_3 3
C B A work.DS_4 4
B C A work.DS_5 5
B A C work.DS_6 6
That can be used as a simple x-cross walk to your data set names from your source names if your N is large or your data set names are too large. Thanks again cheers -KJ

PGStats · Posted 12-04-2019 03:24 PM

Do datasets a, b and c have the same number of observations? Are you going to be concatenating all these datasets at some point?

PG

kjohnsonm · Posted 12-05-2019 10:29 AM

...in my case no they did not. They could have any number of lines greater than 30 or so.

ballardw · Posted 12-05-2019 10:59 AM

You may want to consider using the INDSNAME= option on the SET statement if you want to know which data set contributed which record

SAS Programming

Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Re: Stack Data sets in all permutation?

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...