How to randomise the data in a field?

robwork · Posted 03-03-2012 07:16 PM

Hi I have a data set containing names, addresses, phone numbers etc and I need to randomise some of the data so I can share it.

Ideally I want to change all the first names. Is there anyway I can have a list of say 5 names I choose and then randomly apply them to all the names in my data set e.g.

Data Set Name Original:

1. David

2. John

3. Robbie

4. Josh

5. Alex

6. Toby

7. Alan

8. Nigel

10. Ben

5 names used to randomise:

Peter

Paul

Mark

Simon

Ryan

Data Set Name after randomisation:

1. Peter

2. Simon

3. Peter

4. Ryan

5. Ryan

6. Mark

etc...

Any help would be greatly appreciated.

Tom · Posted 03-03-2012 07:27 PM

data want ;

set have;

name = scan("Peter Paul Mark Simon Ryan",ceil(5*ranuni(0)));

run;

Patrick · Posted 03-03-2012 07:35 PM

Or if there are too many replacement names to pack it into a string:

data have;
input name $;
datalines;
David
John
Robbie
Josh
Alex
Toby
Alan
Nigel
Ben
;
run;

data RepNames;
input RepName $;
repID=_n_;
datalines;
Peter
Paul
Mark
Simon
Ryan
;
run;

data want(drop=_: repID);
set have RepNames(obs=0 keep=repID RepName);
if _n_=1 then
do;
    declare hash h1(dataset:'RepNames');
    _rc=h1.defineKey('repID');
    _rc=h1.defineData('RepName');
    _rc=h1.defineDone();
end;
repID=ceil(ranuni(0)*5);
_rc=h1.find();
run;

proc print data=want;
run;

PGStats · Posted 03-03-2012 08:26 PM

Building on Tom's suggestion, if the set of random names is in a dataset called otherNames then you could use :

proc sql noprint;
select name into :randNames SEPARATED BY ' ' from otherNames;
select count(name) into :nameCount from otherNames;

data want ;
set have;
name = scan("&randNames.",ceil(&nameCount.*ranuni(0)));
run;

PG