BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kjohnsonm
Lapis Lazuli | Level 10

Hello All,
I have some data sets and I need to "stack" them in all combinations possible where the data obs within each sub-set cannot change, (they have to stay in their subset order).  (I have seen a lot of info on creating a combinations/permutations of columns of a single data set, that is not what I need)
Here I have shown a manual way to do a full set of three items 3! (three factorial) = 6 as you all know this gets nuts quick with say 7 or 15 items. Any help would be greatly appreciated. I was hoping to do some macro and just hand it my list of sets.  Is this possible? -KJ

 

data abc;
set a
b
c;
run;

data acb;
set a
c
b;
run;

data bac;
set b
a
c;
run;

data bca;
set b
c
a;
run;

data cab;
set c
a
b;
run;

data cba;
set c
b
a;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Variation on the "columns" permutation.

make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.

Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.

data work.perm (keep=namelist);
   array x [3] $10 ('work.A' 'work.B' 'work.C');
   n=dim(x);
   nfact=fact(n); 
   do i=1 to nfact;   
      names=allperm(i, of x[*]);
      namelist = catx(' ',of x[*]);
      output;
   end;
run;

data _null_;
   set work.perm ;
   outset = cats('work.DS_',_n_);
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;

View solution in original post

12 REPLIES 12
Reeza
Super User
If you add an order variable to each data set and then use the approaches you've found so far and sort at the end by your order variables, would that work?

What is the expected output from this? Especially since the values overlap?
kjohnsonm
Lapis Lazuli | Level 10
I am not sure on "order variable", I guess I need to read more on SAS permutations.
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.
Reeza
Super User
I meant, given the above data sets as an exact input, what exact output would you expect.
kjohnsonm
Lapis Lazuli | Level 10

In my case, if my data output name is ABC, the data set should be added

first A, then B, then C.
If the output name was CAB then the stacked data should be added in the order C, then A, then last B.
The names could be 1 2 3 that makes no difference to me. I eventually just need the data created into new data sets with every permutation of the sub-sets still intact.

...or if this helps my sets names could be 12_, 13_,14_, 15_, and maybe one output name would be 15_12_13_14_.

 

*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.

Does that answer your follow up? -KJ

ballardw
Super User

@kjohnsonm wrote:

In my case, if my data output name is ABC, the data set should be added

 

*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.

Does that answer your follow up? -KJ


Which is why my example code uses DS_ as the base of the name with a counter variable to add a numeric suffix.

FreelanceReinh
Jade | Level 19

@kjohnsonm wrote:
I have some data sets and I need to "stack" them in all combinations possible (...)
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.

To me this sounds a bit like an "XY problem." Wouldn't it be possible to create those csv files from a single dataset containing the subsets in one order, e.g., dataset abc in your example? After all, no new information would be created by generating all subset permutations. You can read a dataset in any order and multiple times (if need be) and create multiple csv files from that dataset in a single DATA step.

 

Edit (addendum):

Example:

/* Create test data for demonstration */

data have(keep=make seqno);
set sashelp.cars(where=(make in ('Acura','Buick','Cadillac')));
by make;
if first.make then seqno=1;
else seqno+1;
run; /* 7+9+8=24 obs. */

%let s=3; /* number of subgroups */
%let dir=C:\Temp\;

/* Create 3!=6 CSV files C:\Temp\123.csv, 132.csv, ..., 321.csv
   containing the correspondingly permuted 3 subgroups of HAVE */

data _null_;
array x[&s] (1:&s);
array p[&s] _temporary_; /* starting positions */
length fname $50;
do until(last);
  set have end=last;
  by make notsorted;
  if first.make then do;
    j+1;
    p[j]=_n_;
  end;
  _n_+1;
end;
do i=1 to fact(dim(x));
  call allperm(i, of x[*]);
  fname=cats("&dir", of x[*], ".csv");
  file dummy filevar=fname dsd;
  put 'Make,Seqno';
  do j=1 to dim(x);
    do k=p[x[j]] to mod(n+p[mod(x[j],&s)+1]-2,n)+1;
      set have point=k nobs=n;
      put make seqno;
    end;
  end;
end;
run;

 

ballardw
Super User

Variation on the "columns" permutation.

make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.

Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.

data work.perm (keep=namelist);
   array x [3] $10 ('work.A' 'work.B' 'work.C');
   n=dim(x);
   nfact=fact(n); 
   do i=1 to nfact;   
      names=allperm(i, of x[*]);
      namelist = catx(' ',of x[*]);
      output;
   end;
run;

data _null_;
   set work.perm ;
   outset = cats('work.DS_',_n_);
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;
kjohnsonm
Lapis Lazuli | Level 10

,

You are a true SAS supper user.  Thank you.

data work.perm (keep=namelist);
array x [3] $10 ('A' 'B' 'C');
n=dim(x);
nfact=fact(n); 
do i=1 to nfact; 
names=allperm(i, of x[*]);
namelist = catx(' ',of x[*]);
output;
end;
run; proc print data=perm noobs;run;
data temp;
set work.perm ;
/* outset = cats('work.DS_',_n_);*/ /* This code will make output data sets like DS_# where the number will be 1-6 for a three set list.  or n!  --needed for if your data sets have longer names than when joined SAS will allow. */
outset =compress(namelist,,'s');  /* This fits my simple example data model  */
my_call="call execute( 'data '||outset||'; set '|| namelist ||';run;')";
call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;
proc print data=temp;run;
proc print data=ACB ;run;



kjohnsonm
Lapis Lazuli | Level 10
/*PS if you modify your code to */
data temp;
   set work.perm ;
   outset = cats('work.DS_',_n_);
/*   outset =compress(namelist,,'s');*/
   n = _n_;
   call execute( 'data '||outset||'; set '|| namelist ||';run;');
   keep outset n namelist ;
run;
proc print data=temp noobs;run;

You will get output like this:

namelist outset n

A B C work.DS_1 1
A C B work.DS_2 2
C A B work.DS_3 3
C B A work.DS_4 4
B C A work.DS_5 5
B A C work.DS_6 6
That can be used as a simple x-cross walk to your data set names from your source names if your N is large or your data set names are too large. Thanks again cheers -KJ

PGStats
Opal | Level 21

Do datasets a, b and c have the same number of observations? Are you going to be concatenating all these datasets at some point?

PG
kjohnsonm
Lapis Lazuli | Level 10
...in my case no they did not. They could have any number of lines greater than 30 or so.
ballardw
Super User

You may want to consider using the INDSNAME= option on the SET statement if you want to know which data set contributed which record

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2572 views
  • 2 likes
  • 5 in conversation