Hi friends;
I am asking for help from a someone out there! I am new to SAS. I would like to delete multiple substrings from a string variable. I am using the tranwrd statement.I am not sure why the code is not eliminating substrings specified. Anyone willing to scanm through my code? Thanks
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
end;
The data i have and the one i want is given below;
iretroviral Treatment Not Specified, L667 |
apv, ddi, d4t, abc, hxu |
iretroviral Treatment Not Specified, lov |
apv, ddi, d4t, abc, hxu,Hydroxyure |
apv, ddi, d4t, abc, iretroviral Treatment Not Specified |
|
Data I want
L667 |
apv, ddi, d4t, abc |
lov |
apv, ddi, d4t, abc |
apv, ddi, d4t, abc |
|
thanks for your helpl;
Dathan Byonanebye
It not worked because:
1- add an output to debug
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;
2- you dont define length of list then >>> length(list) = length("hxu")=3
>>>> list = "hxu","Hyd", "nti" , …. "Ant"
>>>> need to define length of list
>>>> don't forget abbrev_codes2
LENGTH abbrev_codes2 list $100
3- abbrev_codes2=tranwrd(abbrev_codes, list,""); >>> this line of code
>>> resultat = replace only "Ant" in abbrev_codes
>>> change abbrev_codes to abbrev_codes2
abbrev_codes2=tranwrd(abbrev_codes2, list,"");
>>> Its not worked too, because the substring list = "hxu …. " (lenght = 100 >>> hxu + 97 space blank)
>>> use trim()
abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");
>>> Now it work, but it is not proper like the result you want
>>> add line of code to remove ',' in left and rigth of string
>>> We have;
/********Fix from your code******/
data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;
/****OR****/
data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;
Have a nice day.
You need to apply each deletion to the result of the previous deletion:
DATA art3;
SET art2;
abbrev_codes2 = abbrev_codes;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2, list, "");
end;
run;
You may also want to remove the commas.
Some things to note:
Hi @DathanMD
Here is an approach to achieve this.
I recommend that you use a dataset to store the list of terms to identify rather than putting them directly in the code.
data art2;
infile datalines dlm="09"x;
input abbrev_codes :$100.;
datalines;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
;
run;
/* Put reference terms to look for into a dataset -> easier to maintain */
data list_terms;
infile datalines dlm="09"x;
input terms :$100.;
datalines;
hxu
Hydroxyure
ntiretroviralTreatmentotSpecified
Antiretroviral
and
Antiretroviral Treatment Not cified
iretroviral Treatment Not Specified
Ant troviral Treatment Not Specified
;
run;
/* Define macrovariables*/
proc sql noprint;
select max(countw(abbrev_codes,",")) into:nb_codes from art2;
select count(terms) into:nb_terms from list_terms;
select quote(terms,"") into:list separated by "," from list_terms;
quit;
/* Look for the reference terms */
data art3;
set art2;
length abbrev_codes2 $100;
array _abbrev_codes(&nb_codes.) $100;
array _list(&nb_terms.) $100 _temporary_ (&list.);
do i=1 to dim(_abbrev_codes);
_abbrev_codes(i) = scan(abbrev_codes,i,",");
do j=1 to dim(_list);
if strip(_abbrev_codes(i)) = strip(_list(j)) then do;
_abbrev_codes(i) = "";
leave;
end;
end;
if _abbrev_codes(i) ne "" then abbrev_codes2 = catx(",", abbrev_codes2,_abbrev_codes(i));
end;
drop i j _:;
run;
Best,
@ed_sas_member: you are absolutely right about storing the strings to search for in a dataset, but i can't recommend stored data in macro-variables, because this can cause errors as soon as the list gets longer.Using point-option two dataset can be combined:
data want;
set art2;
length abbrev_codes2 $ 100 i 8;
drop i;
abbrev_codes2 = abbrev_codes;
do i = 1 to num_terms;
set list_terms point=i nobs=num_terms;
abbrev_codes2 = tranwrd(abbrev_codes2, trim(terms), "");
end;
run;
[Edited to provide improved working code]
data TERMS;
infile cards pad ;
input TERM & $100.;
cards;
hxu
Hydroxyure
Antiretroviral
and
\w{1,4}roviral ?Treatment ?N?ot.{0,4}cified
run;
data HAVE;
infile cards pad;
input ABBREV_CODE & $100.;
cards;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
run;
data WANT;
length REGEX $32700;
retain REGEX;
drop REGEX TERM;
if _N_=1 then do until(LASTOBS);
set TERMS end=LASTOBS;
REGEX=catx('|',REGEX,TERM);
end;
set HAVE;
ABBREV_CODE2=prxchange(catt('s/,? ?(',REGEX,'),?//o'),-1,ABBREV_CODE);
run;
ABBREV_CODE | ABBREV_CODE2 |
---|---|
iretroviral Treatment Not Specified, L667 | L667 |
apv, ddi, d4t, abc, hxu | apv, ddi, d4t, abc |
iretroviral Treatment Not Specified, lov | lov |
apv, ddi, d4t, abc, hxu,Hydroxyure | apv, ddi, d4t, abc |
apv, ddi, d4t, abc, iretroviral Treatment Not Specified | apv, ddi, d4t, abc |
It not worked because:
1- add an output to debug
DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;
2- you dont define length of list then >>> length(list) = length("hxu")=3
>>>> list = "hxu","Hyd", "nti" , …. "Ant"
>>>> need to define length of list
>>>> don't forget abbrev_codes2
LENGTH abbrev_codes2 list $100
3- abbrev_codes2=tranwrd(abbrev_codes, list,""); >>> this line of code
>>> resultat = replace only "Ant" in abbrev_codes
>>> change abbrev_codes to abbrev_codes2
abbrev_codes2=tranwrd(abbrev_codes2, list,"");
>>> Its not worked too, because the substring list = "hxu …. " (lenght = 100 >>> hxu + 97 space blank)
>>> use trim()
abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");
>>> Now it work, but it is not proper like the result you want
>>> add line of code to remove ',' in left and rigth of string
>>> We have;
/********Fix from your code******/
data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;
/****OR****/
data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;
Have a nice day.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.