Hi I have a data set containing names, addresses, phone numbers etc and I need to randomise some of the data so I can share it.
Ideally I want to change all the first names. Is there anyway I can have a list of say 5 names I choose and then randomly apply them to all the names in my data set e.g.
Data Set Name Original:
1. David
2. John
3. Robbie
4. Josh
5. Alex
6. Toby
7. Alan
8. Nigel
10. Ben
5 names used to randomise:
Peter
Paul
Mark
Simon
Ryan
Data Set Name after randomisation:
1. Peter
2. Simon
3. Peter
4. Ryan
5. Ryan
6. Mark
etc...
Any help would be greatly appreciated.
data want ;
set have;
name = scan("Peter Paul Mark Simon Ryan",ceil(5*ranuni(0)));
run;
Or if there are too many replacement names to pack it into a string:
data have;
input name $;
datalines;
David
John
Robbie
Josh
Alex
Toby
Alan
Nigel
Ben
;
run;
data RepNames;
input RepName $;
repID=_n_;
datalines;
Peter
Paul
Mark
Simon
Ryan
;
run;
data want(drop=_: repID);
set have RepNames(obs=0 keep=repID RepName);
if _n_=1 then
do;
declare hash h1(dataset:'RepNames');
_rc=h1.defineKey('repID');
_rc=h1.defineData('RepName');
_rc=h1.defineDone();
end;
repID=ceil(ranuni(0)*5);
_rc=h1.find();
run;
proc print data=want;
run;
Building on Tom's suggestion, if the set of random names is in a dataset called otherNames then you could use :
proc sql noprint;
select name into :randNames SEPARATED BY ' ' from otherNames;
select count(name) into :nameCount from otherNames;
data want ;
set have;
name = scan("&randNames.",ceil(&nameCount.*ranuni(0)));
run;
PG
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.