- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am an experienced SAS coder but apparently I am very dusty and need some quick help. I've searched online and the community forums without finding a tidy solution.
I have a dataset with 1000s of variable values (names) that I need to replace with randomly selected names from a second table.
The number of rows in each table is different.
For example:
dataset CLAIMS has a variable called firstname.
dataset FIRSTNAMES contains a variable (unique list) called newname.
I want to replace each value of firstname with a random selection from newname.
Can anyone provide some pseudocode that would accomplish this relatively easily?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:
data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;
proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;
data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
set firstnames;
names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;
Edit: fixed a missing m in the SQL (from)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The attached workbooks should give an idea of the structure of the data. Assume these are SAS datasets, of course.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
2. Do you need to be able to recreate this in any fashion? If you re-run it later are you expected to match the data?
3. What do you want the output to look like?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure if you solved your issue since you marked my answer, if not you should unmark it.
Once you answer the questions, it should be fairly straightforward depending on the answers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, Reeza. I will take a look at that link.
I did not mark any of your replies. Seems the forum is doing that all on its own!
As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Building off this example here:
https://gist.github.com/statgeek/2f733d27820f43fa37d6ba92c30f22cf
Using @Kurt_Bremser suggestion you can randomly assign a new name as follows then. This was originally designed to search terms but re-assignment is a similar concept so it will work just as well.
*Make fake data to show example;
*terms to search for;
data terms;
set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*main data set that will be searched;
data test;
set sashelp.baseball;
run;
/*General process to the solution*/
******************************************************************************************
1. Store the number of terms in a macro variable to assign the length of arrays
2. Load terms to search into a temporary array
3. Loop through for each word and search the terms
4. Exit loop if you find the term to help speed up the process
******************************************************************************************;
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*for reproducibility while testing;
call streaminit(456);
*declare array;
array _replace(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_replace(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*Replaces name with random one generated;
random_name = _replace(rand('integer', 1, &num_search_terms));
drop i j search_term ;
run;
@Fugue wrote:
Thanks, Reeza. I will take a look at that link.
I did not mark any of your replies. Seems the forum is doing that all on its own!
As to your earlier questions: I do not need replicability. Nor do I need consistent name matching. Just need to replace actual names with random names. Random names can be reused "infinitely".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reeza. That example would work. I did not try it out as I was under a time crunch. But, thank you for the link and the info. I appreciate the time you've taken.
Thought you should know that the forum keeps marking some reply as the "solution" without me doing anything at all. Not sure why the forum is doing that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Load all your new names into a temporary array, and then use rand("integer",1,dim(array)) to get a random index.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See example code for my suggestion, applying the logic to the SASHELP.CLASS dataset:
data firstnames;
input name $;
datalines;
Charles
Diana
William
Harry
George
;
proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "WORK" and memname = "FIRSTNAMES";
quit;
data want;
array names {&nobs.} $ _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
set firstnames;
names{i} = name;
end;
set sashelp.class;
name = names{rand('integer',1,&nobs.)};
drop i;
run;
Edit: fixed a missing m in the SQL (from)