Hi friends;
I am asking for help from a someone out there! I am new to SAS. I would like to delete multiple substrings from a string variable. I am using the tranwrd statement.I am not sure why the code is not eliminating substrings specified. Anyone willing to scanm through my code? Thanks
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
end;
The data i have and the one i want is given below;
iretroviral Treatment Not Specified, L667 |
apv, ddi, d4t, abc, hxu |
iretroviral Treatment Not Specified, lov |
apv, ddi, d4t, abc, hxu,Hydroxyure |
apv, ddi, d4t, abc, iretroviral Treatment Not Specified |
|
Data I want
L667 |
apv, ddi, d4t, abc |
lov |
apv, ddi, d4t, abc |
apv, ddi, d4t, abc |
|
thanks for your helpl;
Dathan Byonanebye
It not worked because:
1- add an output to debug
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;
2- you dont define length of list then >>> length(list) = length("hxu")=3
>>>> list = "hxu","Hyd", "nti" , …. "Ant"
>>>> need to define length of list
>>>> don't forget abbrev_codes2
LENGTH abbrev_codes2 list $100
3- abbrev_codes2=tranwrd(abbrev_codes, list,""); >>> this line of code
>>> resultat = replace only "Ant" in abbrev_codes
>>> change abbrev_codes to abbrev_codes2
abbrev_codes2=tranwrd(abbrev_codes2, list,"");
>>> Its not worked too, because the substring list = "hxu …. " (lenght = 100 >>> hxu + 97 space blank)
>>> use trim()
abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");
>>> Now it work, but it is not proper like the result you want
>>> add line of code to remove ',' in left and rigth of string
>>> We have;
/********Fix from your code******/
data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;
/****OR****/
data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;
Have a nice day.
You need to apply each deletion to the result of the previous deletion:
DATA art3;
SET art2;
abbrev_codes2 = abbrev_codes;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2, list, "");
end;
run;
You may also want to remove the commas.
Some things to note:
Hi @DathanMD
Here is an approach to achieve this.
I recommend that you use a dataset to store the list of terms to identify rather than putting them directly in the code.
data art2;
infile datalines dlm="09"x;
input abbrev_codes :$100.;
datalines;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
;
run;
/* Put reference terms to look for into a dataset -> easier to maintain */
data list_terms;
infile datalines dlm="09"x;
input terms :$100.;
datalines;
hxu
Hydroxyure
ntiretroviralTreatmentotSpecified
Antiretroviral
and
Antiretroviral Treatment Not cified
iretroviral Treatment Not Specified
Ant troviral Treatment Not Specified
;
run;
/* Define macrovariables*/
proc sql noprint;
select max(countw(abbrev_codes,",")) into:nb_codes from art2;
select count(terms) into:nb_terms from list_terms;
select quote(terms,"") into:list separated by "," from list_terms;
quit;
/* Look for the reference terms */
data art3;
set art2;
length abbrev_codes2 $100;
array _abbrev_codes(&nb_codes.) $100;
array _list(&nb_terms.) $100 _temporary_ (&list.);
do i=1 to dim(_abbrev_codes);
_abbrev_codes(i) = scan(abbrev_codes,i,",");
do j=1 to dim(_list);
if strip(_abbrev_codes(i)) = strip(_list(j)) then do;
_abbrev_codes(i) = "";
leave;
end;
end;
if _abbrev_codes(i) ne "" then abbrev_codes2 = catx(",", abbrev_codes2,_abbrev_codes(i));
end;
drop i j _:;
run;
Best,
@ed_sas_member: you are absolutely right about storing the strings to search for in a dataset, but i can't recommend stored data in macro-variables, because this can cause errors as soon as the list gets longer.Using point-option two dataset can be combined:
data want;
set art2;
length abbrev_codes2 $ 100 i 8;
drop i;
abbrev_codes2 = abbrev_codes;
do i = 1 to num_terms;
set list_terms point=i nobs=num_terms;
abbrev_codes2 = tranwrd(abbrev_codes2, trim(terms), "");
end;
run;
[Edited to provide improved working code]
data TERMS;
infile cards pad ;
input TERM & $100.;
cards;
hxu
Hydroxyure
Antiretroviral
and
\w{1,4}roviral ?Treatment ?N?ot.{0,4}cified
run;
data HAVE;
infile cards pad;
input ABBREV_CODE & $100.;
cards;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
run;
data WANT;
length REGEX $32700;
retain REGEX;
drop REGEX TERM;
if _N_=1 then do until(LASTOBS);
set TERMS end=LASTOBS;
REGEX=catx('|',REGEX,TERM);
end;
set HAVE;
ABBREV_CODE2=prxchange(catt('s/,? ?(',REGEX,'),?//o'),-1,ABBREV_CODE);
run;
ABBREV_CODE | ABBREV_CODE2 |
---|---|
iretroviral Treatment Not Specified, L667 | L667 |
apv, ddi, d4t, abc, hxu | apv, ddi, d4t, abc |
iretroviral Treatment Not Specified, lov | lov |
apv, ddi, d4t, abc, hxu,Hydroxyure | apv, ddi, d4t, abc |
apv, ddi, d4t, abc, iretroviral Treatment Not Specified | apv, ddi, d4t, abc |
It not worked because:
1- add an output to debug
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;
2- you dont define length of list then >>> length(list) = length("hxu")=3
>>>> list = "hxu","Hyd", "nti" , …. "Ant"
>>>> need to define length of list
>>>> don't forget abbrev_codes2
LENGTH abbrev_codes2 list $100
3- abbrev_codes2=tranwrd(abbrev_codes, list,""); >>> this line of code
>>> resultat = replace only "Ant" in abbrev_codes
>>> change abbrev_codes to abbrev_codes2
abbrev_codes2=tranwrd(abbrev_codes2, list,"");
>>> Its not worked too, because the substring list = "hxu …. " (lenght = 100 >>> hxu + 97 space blank)
>>> use trim()
abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");
>>> Now it work, but it is not proper like the result you want
>>> add line of code to remove ',' in left and rigth of string
>>> We have;
/********Fix from your code******/
data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;
/****OR****/
data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;
Have a nice day.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.