I am an experienced SAS coder but apparently I am very dusty and need some quick help. I've searched online and the community forums without finding a tidy solution.
I have a dataset with 1000s of variable values (names) that I need to replace with randomly selected names from a second table.
The number of rows in each table is different.
For example:
dataset CLAIMS has a variable called firstname.
dataset FIRSTNAMES contains a variable (unique list) called newname.
I want to replace each value of firstname with a random selection from newname.
Can anyone provide some pseudocode that would accomplish this relatively easily?
See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:
data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;
proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;
data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
set firstnames;
names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;
Edit: fixed a missing m in the SQL (from)
The attached workbooks should give an idea of the structure of the data. Assume these are SAS datasets, of course.
Thanks, Reeza. I will take a look at that link.
I did not mark any of your replies. Seems the forum is doing that all on its own!
As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".
Building off this example here:
https://gist.github.com/statgeek/2f733d27820f43fa37d6ba92c30f22cf
Using @Kurt_Bremser suggestion you can randomly assign a new name as follows then. This was originally designed to search terms but re-assignment is a similar concept so it will work just as well.
*Make fake data to show example;
*terms to search for;
data terms;
set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*main data set that will be searched;
data test;
set sashelp.baseball;
run;
/*General process to the solution*/
******************************************************************************************
1. Store the number of terms in a macro variable to assign the length of arrays
2. Load terms to search into a temporary array
3. Loop through for each word and search the terms
4. Exit loop if you find the term to help speed up the process
******************************************************************************************;
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*for reproducibility while testing;
call streaminit(456);
*declare array;
array _replace(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_replace(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*Replaces name with random one generated;
random_name = _replace(rand('integer', 1, &num_search_terms));
drop i j search_term ;
run;
@Fugue wrote:
Thanks, Reeza. I will take a look at that link.
I did not mark any of your replies. Seems the forum is doing that all on its own!
As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".
Hi Reeza. That example would work. I did not try it out as I was under a time crunch. But, thank you for the link and the info. I appreciate the time you've taken.
Thought you should know that the forum keeps marking some reply as the "solution" without me doing anything at all. Not sure why the forum is doing that.
Load all your new names into a temporary array, and then use rand("integer",1,dim(array)) to get a random index.
See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:
data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;
proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;
data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
set firstnames;
names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;
Edit: fixed a missing m in the SQL (from)
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.