BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fugue
Quartz | Level 8

I am an experienced SAS coder but apparently I am very dusty and need some quick help. I've searched online and the community forums without finding a tidy solution.

 

I have a dataset with 1000s of variable values (names) that I need to replace with randomly selected names from a second table.

 

The number of rows in each table is different.

 

For example:

 

dataset CLAIMS has a variable called firstname.

 

dataset FIRSTNAMES contains a variable (unique list) called newname.

 

I want to replace each value of firstname with a random selection from newname.

 

Can anyone provide some pseudocode that would accomplish this relatively easily?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:

data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;

proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;

data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
  set firstnames;
  names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;

 

Edit: fixed a missing m in the SQL (from)

View solution in original post

9 REPLIES 9
Reeza
Super User
Can you please quickly make some fake data that follows the structure of your input files and provide that as a starting point?
Fugue
Quartz | Level 8

The attached workbooks should give an idea of the structure of the data. Assume these are SAS datasets, of course.

Reeza
Super User
1. Do you want consistency across ID's or names? Ie does Mark map to same name each time or does ID 4 map to the same name each time? In that case you'll need some unique identifier per individual.
2. Do you need to be able to recreate this in any fashion? If you re-run it later are you expected to match the data?
3. What do you want the output to look like?
Reeza
Super User
Here's an outline of how to map an ID to a random number: https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644

Not sure if you solved your issue since you marked my answer, if not you should unmark it.
Once you answer the questions, it should be fairly straightforward depending on the answers.
Fugue
Quartz | Level 8

Thanks, Reeza. I will take a look at that link.

 

I did not mark any of your replies. Seems the forum is doing that all on its own!

 

As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".

Reeza
Super User

Building off this example here:

https://gist.github.com/statgeek/2f733d27820f43fa37d6ba92c30f22cf

 

Using @Kurt_Bremser suggestion you can randomly assign a new name as follows then. This was originally designed to search terms but re-assignment is a similar concept so it will work just as well. 

 

 

*Make fake data to show example;
*terms to search for;
data terms; 
set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
    
*main data set that will be searched;
data test; 
set sashelp.baseball;
run;


/*General process to the solution*/
******************************************************************************************
1. Store the number of terms in a macro variable to assign the length of arrays
2. Load terms to search into a temporary array
3. Loop through for each word and search the terms
4. Exit loop if you find the term to help speed up the process
******************************************************************************************;



/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;

%put &num_search_terms.;



data flagged;
*for reproducibility while testing;
call streaminit(456);

*declare array;
array _replace(&num_search_terms.) $ _temporary_;

/*2*/
*load array into memory;
   if _n_ = 1 then do j=1 to &num_search_terms.;
   set terms;
   _replace(j) = search_term;
   end;
 
 set test;
 
 *set flag to 0 for initial start;
 flag = 0;

/*3*/
*Replaces name with random one generated;
random_name = _replace(rand('integer', 1, &num_search_terms));

drop i j search_term ;

run;

@Fugue wrote:

Thanks, Reeza. I will take a look at that link.

 

I did not mark any of your replies. Seems the forum is doing that all on its own!

 

As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".


 

 

Fugue
Quartz | Level 8

Hi Reeza. That example would work. I did not try it out as I was under a time crunch. But, thank you for the link and the info. I appreciate the time you've taken.

 

Thought you should know that the forum keeps marking some reply as the "solution" without me doing anything at all. Not sure why the forum is doing that.

Kurt_Bremser
Super User

See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:

data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;

proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;

data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
  set firstnames;
  names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;

 

Edit: fixed a missing m in the SQL (from)

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4316 views
  • 7 likes
  • 3 in conversation