- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello All,
I have some data sets and I need to "stack" them in all combinations possible where the data obs within each sub-set cannot change, (they have to stay in their subset order). (I have seen a lot of info on creating a combinations/permutations of columns of a single data set, that is not what I need)
Here I have shown a manual way to do a full set of three items 3! (three factorial) = 6 as you all know this gets nuts quick with say 7 or 15 items. Any help would be greatly appreciated. I was hoping to do some macro and just hand it my list of sets. Is this possible? -KJ
data abc;
set a
b
c;
run;
data acb;
set a
c
b;
run;
data bac;
set b
a
c;
run;
data bca;
set b
c
a;
run;
data cab;
set c
a
b;
run;
data cba;
set c
b
a;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Variation on the "columns" permutation.
make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.
Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.
data work.perm (keep=namelist); array x [3] $10 ('work.A' 'work.B' 'work.C'); n=dim(x); nfact=fact(n); do i=1 to nfact; names=allperm(i, of x[*]); namelist = catx(' ',of x[*]); output; end; run; data _null_; set work.perm ; outset = cats('work.DS_',_n_); call execute( 'data '||outset||'; set '|| namelist ||';run;'); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the expected output from this? Especially since the values overlap?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In my case, if my data output name is ABC, the data set should be added
first A, then B, then C.
If the output name was CAB then the stacked data should be added in the order C, then A, then last B.
The names could be 1 2 3 that makes no difference to me. I eventually just need the data created into new data sets with every permutation of the sub-sets still intact.
...or if this helps my sets names could be 12_, 13_,14_, 15_, and maybe one output name would be 15_12_13_14_.
*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.
Does that answer your follow up? -KJ
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@kjohnsonm wrote:
In my case, if my data output name is ABC, the data set should be added
*oops I do not think SAS takes a data set starting with a 'number', but A15, A14, A13, etc... would work fine.
Does that answer your follow up? -KJ
Which is why my example code uses DS_ as the base of the name with a counter variable to add a numeric suffix.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@kjohnsonm wrote:
I have some data sets and I need to "stack" them in all combinations possible (...)
As for your second question, I am producing csv* files in my next step that is not an issue, *that is my last step.
To me this sounds a bit like an "XY problem." Wouldn't it be possible to create those csv files from a single dataset containing the subsets in one order, e.g., dataset abc in your example? After all, no new information would be created by generating all subset permutations. You can read a dataset in any order and multiple times (if need be) and create multiple csv files from that dataset in a single DATA step.
Edit (addendum):
Example:
/* Create test data for demonstration */
data have(keep=make seqno);
set sashelp.cars(where=(make in ('Acura','Buick','Cadillac')));
by make;
if first.make then seqno=1;
else seqno+1;
run; /* 7+9+8=24 obs. */
%let s=3; /* number of subgroups */
%let dir=C:\Temp\;
/* Create 3!=6 CSV files C:\Temp\123.csv, 132.csv, ..., 321.csv
containing the correspondingly permuted 3 subgroups of HAVE */
data _null_;
array x[&s] (1:&s);
array p[&s] _temporary_; /* starting positions */
length fname $50;
do until(last);
set have end=last;
by make notsorted;
if first.make then do;
j+1;
p[j]=_n_;
end;
_n_+1;
end;
do i=1 to fact(dim(x));
call allperm(i, of x[*]);
fname=cats("&dir", of x[*], ".csv");
file dummy filevar=fname dsd;
put 'Make,Seqno';
do j=1 to dim(x);
do k=p[x[j]] to mod(n+p[mod(x[j],&s)+1]-2,n)+1;
set have point=k nobs=n;
put make seqno;
end;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Variation on the "columns" permutation.
make sure the size of the array elements is sufficient to hold any of the libname.dataset names.(41 should work for typical SAS names.
Call execute is one way to build code. You could use the data _null_ set to write to a text file and include that as a program or to run later. You still have a limitation on how long this may run with large numbers of values in the X array. I just number the output sets as combinations of actual data set names as text is very likely to exceed the length of valid data set names. Or leave the I variable in the work.perm and you have a key back to the order used if needed.
data work.perm (keep=namelist); array x [3] $10 ('work.A' 'work.B' 'work.C'); n=dim(x); nfact=fact(n); do i=1 to nfact; names=allperm(i, of x[*]); namelist = catx(' ',of x[*]); output; end; run; data _null_; set work.perm ; outset = cats('work.DS_',_n_); call execute( 'data '||outset||'; set '|| namelist ||';run;'); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are a true SAS supper user. Thank you.
data work.perm (keep=namelist);
array x [3] $10 ('A' 'B' 'C');
n=dim(x);
nfact=fact(n);
do i=1 to nfact;
names=allperm(i, of x[*]);
namelist = catx(' ',of x[*]);
output;
end;
run; proc print data=perm noobs;run;
data temp;
set work.perm ;
/* outset = cats('work.DS_',_n_);*/ /* This code will make output data sets like DS_# where the number will be 1-6 for a three set list. or n! --needed for if your data sets have longer names than when joined SAS will allow. */
outset =compress(namelist,,'s'); /* This fits my simple example data model */
my_call="call execute( 'data '||outset||'; set '|| namelist ||';run;')";
call execute( 'data '||outset||'; set '|| namelist ||';run;');
run;
proc print data=temp;run;
proc print data=ACB ;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/*PS if you modify your code to */
data temp;
set work.perm ;
outset = cats('work.DS_',_n_);
/* outset =compress(namelist,,'s');*/
n = _n_;
call execute( 'data '||outset||'; set '|| namelist ||';run;');
keep outset n namelist ;
run;
proc print data=temp noobs;run;
You will get output like this:
namelist outset n
A B C work.DS_1 1
A C B work.DS_2 2
C A B work.DS_3 3
C B A work.DS_4 4
B C A work.DS_5 5
B A C work.DS_6 6
That can be used as a simple x-cross walk to your data set names from your source names if your N is large or your data set names are too large. Thanks again cheers -KJ
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do datasets a, b and c have the same number of observations? Are you going to be concatenating all these datasets at some point?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You may want to consider using the INDSNAME= option on the SET statement if you want to know which data set contributed which record