Hi, I have a data set containing names, addresses, phone numbers, etc., and I need to randomise some of the data so I can share it.
Ideally I want to change all the first names. Is there any way I can have a list of, say, 5 names that I choose and then randomly apply them to all the names in my data set? E.g.:
Data Set Name Original:
1. David
2. John
3. Robbie
4. Josh
5. Alex
6. Toby
7. Alan
8. Nigel
9. Ben
5 names used to randomise:
Peter
Paul
Mark
Simon
Ryan
Data Set Name after randomisation:
1. Peter
2. Simon
3. Peter
4. Ryan
5. Ryan
6. Mark
etc...
Any help would be greatly appreciated.
data want;
  set have;
  /* pick one of the 5 replacement names at random for each row */
  name = scan("Peter Paul Mark Simon Ryan", ceil(5*ranuni(0)));
run;
Or, if there are too many replacement names to pack them into a string:
data have;
input name $;
datalines;
David
John
Robbie
Josh
Alex
Toby
Alan
Nigel
Ben
;
run;
data RepNames;
input RepName $;
repID=_n_;
datalines;
Peter
Paul
Mark
Simon
Ryan
;
run;
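data want(drop=_: repID);
  /* obs=0 reads no rows from RepNames; it only defines repID and RepName in the program data vector */
  set have RepNames(obs=0 keep=repID RepName);
  if _n_=1 then
    do;
      /* load the replacement names into a hash table keyed by repID */
      declare hash h1(dataset:'RepNames');
      _rc=h1.defineKey('repID');
      _rc=h1.defineData('RepName');
      _rc=h1.defineDone();
    end;
  /* random key between 1 and 5 (the number of rows in RepNames) */
  repID=ceil(ranuni(0)*5);
  /* look up the key; on success RepName is filled in */
  _rc=h1.find();
run;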
proc print data=want;
run;
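If you would rather not hard-code the 5, one possible alternative is to read a random row of RepNames directly with the POINT= and NOBS= options of the SET statement. This is only a sketch, assuming the have and RepNames data sets created above; _p and _nrep are hypothetical helper names I made up for the example.

```sas
data want;
  set have;
  /* _nrep is set from NOBS= at compile time, so it is already known here */
  _p = ceil(ranuni(0) * _nrep);
  /* direct access: read row number _p of RepNames */
  set RepNames(keep=RepName) point=_p nobs=_nrep;
run;
```

The step ends when the first SET statement runs out of rows in have, so no STOP statement should be needed here.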
Building on Tom's suggestion, if the set of random names is in a dataset called otherNames, then you could use:
proc sql noprint;
  select name into :randNames SEPARATED BY ' ' from otherNames;
  select count(name) into :nameCount from otherNames;
quit;
data want;
  set have;
  name = scan("&randNames.", ceil(&nameCount.*ranuni(0)));
run;
PG
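One thing to be aware of: ranuni(0) takes its seed from the system clock, so every run gives a different assignment. If you need to be able to reproduce the shared data set, a fixed seed might help. Below is a minimal sketch, assuming the same have data set and the same five replacement names; the seed value 12345 is an arbitrary choice of mine.

```sas
data want;
  /* fix the seed once, before the first call to RAND (12345 is arbitrary) */
  if _n_ = 1 then call streaminit(12345);
  set have;
  /* rand('uniform') returns a value strictly between 0 and 1, so the index is 1..5 */
  name = scan("Peter Paul Mark Simon Ryan", ceil(5 * rand('uniform')));
run;
```

Also note that name keeps the length it has in have, so a replacement name longer than that length would be truncated.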