Shuffling cells in each column independently

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Shuffling cells in each column independently

I have a data set with roughly 12000 columns. I would like to  randomly shuffle the cells in each column independently of how they are shuffled in the other colums. Any ideas on how to do that? After having spent 5 hours trying to figure something out i just feel stupid.


Accepted Solutions
Solution
‎01-27-2016 07:31 PM
Grand Advisor
Posts: 9,571

Re: Shuffling cells in each column independently

[ Edited ]

That would be very easy if you use IML code .

 

data have;
input (column1-column3) ($);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
haddock banana Lizzie
;
run;
proc iml;
use have;
read all var _ALL_ into x[c=vnames];
close;
do i=1 to ncol(x);
 x[,i]=t(ranperm(x[,i]));
end;
create want from x[c=vnames];
append from x;
close;
quit;

 

View solution in original post


All Replies
Super Contributor
Posts: 490

Re: Shuffling cells in each column independently

Could you give example of your current dataset and your desired output?

Occasional Contributor
Posts: 5

Re: Shuffling cells in each column independently

[ Edited ]

This is what it looks like now:

 

column1     column2    column3...........column 12000

herring       apple         Jimmy

salmon       orange      Bertha

cod             kiwi           Piotr

haddock     banana     Lizzie

 

This is what i want to accomplish (randomly):

 

column1      column2    column3............column12000

cod              orange      Jimmy

salmon        banana      Lizzie

haddock      apple         Piotr

herring        kiwi            Bertha

 

 

So what i want to do is to move the cells around randomly within each column.

Trusted Advisor
Posts: 1,114

Re: Shuffling cells in each column independently

[ Edited ]

I support @Reeza's suggestion. "Adding" a random variable and "sorting" by it would translate to applying the CALL RANPERM routine.

 

data have;
input (column1-column3) ($);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
haddock banana Lizzie
;

proc transpose data=have out=trans;
var column:; /* You may have to use a variable list of the form varone--lastvar. */
run;

data t_shuffled;
retain seed 314159;
set trans;
*array col[*] col:; /* this line was unnecessary */
call ranperm(seed, of col:);
drop seed; /* inserted, as SEED is no longer needed */
run;

proc transpose data=t_shuffled out=want(drop=_name_);
var col:;
run;

Edit: I hope, when you have 12000 columns that you don't have millions of observations in the same dataset. Not sure how CALL RANPERM would perform in this situation (if dataset TRANS could be created at all).

 

Edit 2: Did some minor polishing to the data step creating T_SHUFFLED (see comments). Also please note that if your real HAVE data contain a mixture of numeric and character variables, the shuffled numeric values in WANT will have been converted to character values, unfortunately. So you would need an extra step to convert them back to numeric. The numeric-to-character conversion would occur in the first PROC TRANSPOSE step, because the transposed columns would have to accommodate both numeric and character values.

Respected Advisor
Posts: 4,606

Re: Shuffling cells in each column independently

If you want to avoid loading the whole dataset into memory with proc transpose, you can do the whole shuffle with data steps and proc sort :

 

data have;
input (column1-column3) ($);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
haddock banana Lizzie
;

data long;
call streaminit(454);
length value $10;
set have;
array _v{*} column1-column3;
do var = 1 to dim(_v);
    value = _v{var};
    rnd = rand("UNIFORM");
    output;
    end;
keep var value rnd;
run;

proc sort data=long; by var rnd; run;

data longr;
pos = 0;
do until(last.var);
    set long; by var;
    pos + 1;
    output;
    end;
drop rnd;
run;

proc sort data=longr; by pos var; run;

data want;
if 0 then set have;
array _v{*} column1-column3;
do until(last.pos);
    set longr; by pos;
    _v{var} = value;
    end;
drop var pos value;
run;
PG
Grand Advisor
Posts: 17,316

Re: Shuffling cells in each column independently

I would do the double transpose solution. both which assume your variables are all numeric and of the form of something like x1-x12000.

 

Transpose data, add a random variable, sort by said random variable and then re-transpose to original form. 

 

 

 

Solution
‎01-27-2016 07:31 PM
Grand Advisor
Posts: 9,571

Re: Shuffling cells in each column independently

[ Edited ]

That would be very easy if you use IML code .

 

data have;
input (column1-column3) ($);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
haddock banana Lizzie
;
run;
proc iml;
use have;
read all var _ALL_ into x[c=vnames];
close;
do i=1 to ncol(x);
 x[,i]=t(ranperm(x[,i]));
end;
create want from x[c=vnames];
append from x;
close;
quit;

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 428 views
  • 3 likes
  • 6 in conversation