## Shuffling cells in each column independently

Solved
Occasional Contributor
Posts: 5

# Shuffling cells in each column independently

I have a data set with roughly 12000 columns. I would like to  randomly shuffle the cells in each column independently of how they are shuffled in the other colums. Any ideas on how to do that? After having spent 5 hours trying to figure something out i just feel stupid.

Accepted Solutions
Solution
‎01-27-2016 07:31 PM
Super User
Posts: 10,784

## Re: Shuffling cells in each column independently

[ Edited ]

That would be very easy if you use IML code .

``````data have;
input (column1-column3) (\$);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
;
run;
proc iml;
use have;
read all var _ALL_ into x[c=vnames];
close;
do i=1 to ncol(x);
x[,i]=t(ranperm(x[,i]));
end;
create want from x[c=vnames];
append from x;
close;
quit;``````

All Replies
Super Contributor
Posts: 490

## Re: Shuffling cells in each column independently

Could you give example of your current dataset and your desired output?

Occasional Contributor
Posts: 5

## Re: Shuffling cells in each column independently

[ Edited ]

This is what it looks like now:

column1     column2    column3...........column 12000

herring       apple         Jimmy

salmon       orange      Bertha

cod             kiwi           Piotr

This is what i want to accomplish (randomly):

column1      column2    column3............column12000

cod              orange      Jimmy

salmon        banana      Lizzie

herring        kiwi            Bertha

So what i want to do is to move the cells around randomly within each column.

Posts: 1,252

## Re: Shuffling cells in each column independently

[ Edited ]

I support @Reeza's suggestion. "Adding" a random variable and "sorting" by it would translate to applying the CALL RANPERM routine.

``````data have;
input (column1-column3) (\$);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
;

proc transpose data=have out=trans;
var column:; /* You may have to use a variable list of the form varone--lastvar. */
run;

data t_shuffled;
retain seed 314159;
set trans;
*array col[*] col:; /* this line was unnecessary */
call ranperm(seed, of col:);
drop seed; /* inserted, as SEED is no longer needed */
run;

proc transpose data=t_shuffled out=want(drop=_name_);
var col:;
run;
``````

Edit: I hope, when you have 12000 columns that you don't have millions of observations in the same dataset. Not sure how CALL RANPERM would perform in this situation (if dataset TRANS could be created at all).

Edit 2: Did some minor polishing to the data step creating T_SHUFFLED (see comments). Also please note that if your real HAVE data contain a mixture of numeric and character variables, the shuffled numeric values in WANT will have been converted to character values, unfortunately. So you would need an extra step to convert them back to numeric. The numeric-to-character conversion would occur in the first PROC TRANSPOSE step, because the transposed columns would have to accommodate both numeric and character values.

Posts: 5,535

## Re: Shuffling cells in each column independently

If you want to avoid loading the whole dataset into memory with proc transpose, you can do the whole shuffle with data steps and proc sort :

``````data have;
input (column1-column3) (\$);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
;

data long;
call streaminit(454);
length value \$10;
set have;
array _v{*} column1-column3;
do var = 1 to dim(_v);
value = _v{var};
rnd = rand("UNIFORM");
output;
end;
keep var value rnd;
run;

proc sort data=long; by var rnd; run;

data longr;
pos = 0;
do until(last.var);
set long; by var;
pos + 1;
output;
end;
drop rnd;
run;

proc sort data=longr; by pos var; run;

data want;
if 0 then set have;
array _v{*} column1-column3;
do until(last.pos);
set longr; by pos;
_v{var} = value;
end;
drop var pos value;
run;
``````
PG
Super User
Posts: 23,754

## Re: Shuffling cells in each column independently

I would do the double transpose solution. both which assume your variables are all numeric and of the form of something like x1-x12000.

Transpose data, add a random variable, sort by said random variable and then re-transpose to original form.

Solution
‎01-27-2016 07:31 PM
Super User
Posts: 10,784

## Re: Shuffling cells in each column independently

[ Edited ]

That would be very easy if you use IML code .

``````data have;
input (column1-column3) (\$);
cards;
herring apple  Jimmy
salmon  orange Bertha
cod     kiwi   Piotr
;
run;
proc iml;
use have;
read all var _ALL_ into x[c=vnames];
close;
do i=1 to ncol(x);
x[,i]=t(ranperm(x[,i]));
end;
create want from x[c=vnames];
append from x;
close;
quit;``````

🔒 This topic is solved and locked.