Hi,
Context here is I have a rudimentary SAS skillset at best (normally an SQL data monkey), and am looking for some syntax advice on how to reference a specific column from a data set to use as a list of strings to find and replace within a large free text string with a "[NAME REDACTED] placeholder. The overall goal here is to replace a huge list of names (~400k) from free text description fields.
The name list (referring to this as LIB.NAMELIST) appears along the lines of:
ID Name
1 Jim
2 Dave
3 Susan
4 Fiona
My plan was to do this with a simple WHILE look, something along the lines of the below (italics is where I don't know how to reference this dataset & column in the loop):
data=have;
set=want;
nstart=1;
nend=400000;
do while(nstart<=nfinish);
TRANWRD(DescriptionField,(SELECT NAME FROM LIB.NAMELIST.Name WHERE ID = nstart),"[NAME REDACTED]";
n+1
;
END
;
Any help would be much appreciated!
Many thanks in advance.
The eventual solution used:
data name_list;
input ID Name $;
datalines;
1 Jim
2 Steve
3 Susan
4 Karen
;
run;
data have;
input text $100.;
infile datalines truncover;
datalines;
Jim AAA
Steve Susan BBB
Susanne CCC
;
run;
proc sql noprint;
select count(*) into :n from name_list ;
quit;
data want;
set have;
array names[&n] $30 _temporary_;
if _n_ = 1 then do i = 1 to &n;
set name_list(keep = name);
names[i] = name;
end;
do i = 1 to &n;
text = tranwrd(text,strip(names[i]),'[NAME REDACTED]');
end;
keep text;
run;
Could you please provide test data for both datasets in the form of a datastep:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Otherwise I have nothing to work with. I can think of several ways of doing it - format, proc sql (with which you can use your subclause shown), datastep merge with point statment, proc sql/datastep merge, hash table lookup.
Hi,
Apologies, I think I'm maybe being misleading, I'm not looking for an overall solution, just the syntax for how to reference LIB.NAMELIST.NAME as the target for the TRANSWRD statement.
Many thanks.
You can't. Datastep do not work that way. You can do as you show using proc sql or use one of the other methods I presented. It depends on what you are trying to do, and what your data looks like, it sounds like you are trying to remove any words from a text string in a list of given words. If so, then something like (and I will type in the test data for you):
data have; length str $200; str="A long string of wods with some to be removed"; run; data list; length wrd $200; wrd="string";output; wrd="removed";output; run; data _null_; set list end=last; if _n_=1 then call execute('data want; set have;'); call execute(cats('str=tranwrd(str,"',wrd,'","");')); if last then call execute('run;'); run;
Hi,
Thanks for your effort on this, I really appreciate it.
You are right in the intent as I outlined in the original post. However, again the issue is not with the replacement of words within the string, but how do I reference a list of words from a column in a data set.
In other words, you have defined the list of words with:
data list; length wrd $200; wrd="string";output; wrd="removed";output; run;
But I cannot type out 412,000 variations of wrd="string";output; to create the list to loop through - so the question as before remains, how do I define "list" in the above as the column "Name" from LIB.NAMELIST?
"But I cannot type out 412,000 variations of wrd="string"" - you do not need to. What I presented there was an example, as you have not provided anything for us to work with. I will assume that "LIB.NAMELIST" is a dataset which already contains this information, so you use that rather than my example I gave.
Thank you - so how do I reference LIB.NAMELIST.Name in that code?
If its a dataset, in a library called lib, and is called namelist, and contains a variable name (all of which you still have not told us, nor shown any examples of):
data _null_; set lib.namelist end=last; if _n_=1 then call execute('data want; set have;'); call execute(cats('str=tranwrd(str,"',name,'","");')); if last then call execute('run;'); run;
Thank you for your time, I really appreciate it, I'll seek an alternative solution.
Why?
@Lorielus wrote:
Hi,
Apologies, I think I'm maybe being misleading, I'm not looking for an overall solution, just the syntax for how to reference LIB.NAMELIST.NAME as the target for the TRANSWRD statement.
Many thanks.
In Proc SQL you can use an alias: lib.namelist as Z to create the alias and then use as z.name. Should be basic stuff to a moderately experienced SQL user though perhaps you were unaware of the existence of Proc SQL.
SAS has a less common bit of coding referred to as SCL and that is the only place (I am familiar with) that SAS will accept a construct like LIB.NAMELIST.NAME . Namelist would be expected to be a Catalog since there is something following the second level name and then Name would be expected to be an entry in the catalog
As someone with modest SQL skills (as you will see below), I would think this is possible by calling upon your SQL data monkey skills. SAS contains PROC SQL which ought to be able to do something like:
proc sql;
create table want as select * from have,
case (when name in (select name from lib.namelist) then "NAME REDACTED")
else name) as newname;
quit;
Perhaps I oversimplified the problem?
The eventual solution used:
data name_list;
input ID Name $;
datalines;
1 Jim
2 Steve
3 Susan
4 Karen
;
run;
data have;
input text $100.;
infile datalines truncover;
datalines;
Jim AAA
Steve Susan BBB
Susanne CCC
;
run;
proc sql noprint;
select count(*) into :n from name_list ;
quit;
data want;
set have;
array names[&n] $30 _temporary_;
if _n_ = 1 then do i = 1 to &n;
set name_list(keep = name);
names[i] = name;
end;
do i = 1 to &n;
text = tranwrd(text,strip(names[i]),'[NAME REDACTED]');
end;
keep text;
run;
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.