BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DathanMD
Obsidian | Level 7

Hi friends;

I am asking for help from a someone out there! I am new to SAS. I would like to delete multiple substrings from a string variable. I am using the tranwrd statement.I am not sure why the code is not eliminating substrings specified. Anyone willing to scanm through my code? Thanks

 

DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
end;

 

The data i have and the one i want is given below;

iretroviral Treatment Not Specified, L667

apv, ddi, d4t, abc, hxu

iretroviral Treatment Not Specified, lov

apv, ddi, d4t, abc, hxu,Hydroxyure

apv, ddi, d4t, abc, iretroviral Treatment Not Specified

 

 

Data I want

L667

apv, ddi, d4t, abc

lov

apv, ddi, d4t, abc

apv, ddi, d4t, abc

 

 

thanks for your helpl;

 

Dathan Byonanebye

1 ACCEPTED SOLUTION

Accepted Solutions
kelxxx
Quartz | Level 8

It not worked because:

1- add an output to debug

DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;

2- you dont define length of list then >>> length(list) = length("hxu")=3 

>>>> list = "hxu","Hyd", "nti" , …. "Ant"

>>>> need to define length of list

>>>> don't forget abbrev_codes2

LENGTH abbrev_codes2 list $100

3- abbrev_codes2=tranwrd(abbrev_codes, list,"");   >>> this line of code 

>>> resultat = replace only "Ant" in abbrev_codes

>>> change  abbrev_codes to abbrev_codes2

abbrev_codes2=tranwrd(abbrev_codes2, list,"");

>>> Its not worked too, because the substring list = "hxu    ….     " (lenght = 100 >>> hxu + 97 space blank)

>>> use trim()

abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");

>>> Now it work, but it is not proper like the result you want

>>> add line of code to remove ',' in left and rigth of string

>>> We have;

/********Fix from your code******/

data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;

/****OR****/

data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;

Have a nice day.

View solution in original post

6 REPLIES 6
PGStats
Opal | Level 21

You need to apply each deletion to the result of the previous deletion:

 

DATA art3;
SET art2;
abbrev_codes2 = abbrev_codes;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2, list, "");
end;
run;

You may also want to remove the commas.

 

PG
andreas_lds
Jade | Level 19

Some things to note:

  • never overwrite the dataset you are processing
  • add a length-statement for all variable you are using, that are not read from a dataset
  • use put-statement to debug your code, adding put abbrev_codes= abbrev_codes2=; before the end of the loop shows what is happening.
ed_sas_member
Meteorite | Level 14

Hi @DathanMD 

 

Here is an approach to achieve this.

I recommend that you use a dataset to store the list of terms to identify rather than putting them directly in the code.

data art2;
	infile datalines dlm="09"x;
	input abbrev_codes :$100.;
	datalines;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
;
run;

/* Put reference terms to look for into a dataset -> easier to maintain */

data list_terms;
	infile datalines dlm="09"x;
	input terms :$100.;
	datalines;
hxu
Hydroxyure
ntiretroviralTreatmentotSpecified
Antiretroviral
and
Antiretroviral Treatment Not cified
iretroviral Treatment Not Specified
Ant troviral Treatment Not Specified
;
run;

/* Define macrovariables*/

proc sql noprint;
	select max(countw(abbrev_codes,",")) into:nb_codes from art2;
	select count(terms) into:nb_terms from list_terms;
	select quote(terms,"") into:list separated by "," from list_terms;
quit;

/* Look for the reference terms */

data art3;
	set art2;
	length abbrev_codes2 $100;
	array _abbrev_codes(&nb_codes.) $100;
	array _list(&nb_terms.) $100 _temporary_ (&list.);
	
	do i=1 to dim(_abbrev_codes);
	
		_abbrev_codes(i) = scan(abbrev_codes,i,",");
		
		do j=1 to dim(_list);
			if strip(_abbrev_codes(i)) = strip(_list(j)) then do;
				_abbrev_codes(i) = "";
				leave;
			end;
		end;

	if _abbrev_codes(i) ne "" then abbrev_codes2 = catx(",", abbrev_codes2,_abbrev_codes(i));
	end;
	
	drop i j _:;
run;

Capture d’écran 2020-04-17 à 09.00.19.png

Best,

andreas_lds
Jade | Level 19

@ed_sas_member: you are absolutely right about storing the strings to search for in a dataset, but i can't recommend stored data in macro-variables, because this can cause errors as soon as the list gets longer.Using point-option two dataset can be combined:

 

data want;
    set art2;
    
    length abbrev_codes2 $ 100 i 8;
    drop i;
    
    abbrev_codes2 = abbrev_codes;
    
    do i = 1 to num_terms;
        set list_terms point=i nobs=num_terms;
        abbrev_codes2 = tranwrd(abbrev_codes2, trim(terms), "");
    end;
run;
ChrisNZ
Tourmaline | Level 20

[Edited to provide improved working code]

data TERMS;
  infile cards pad ;
  input TERM & $100.;
  cards;
hxu
Hydroxyure
Antiretroviral
and
\w{1,4}roviral ?Treatment ?N?ot.{0,4}cified
run;

data HAVE;
  infile cards pad;
  input ABBREV_CODE & $100.;
  cards;
iretroviral Treatment Not Specified, L667
apv, ddi, d4t, abc, hxu
iretroviral Treatment Not Specified, lov
apv, ddi, d4t, abc, hxu,Hydroxyure
apv, ddi, d4t, abc, iretroviral Treatment Not Specified
run;

data WANT;
  length REGEX $32700;
  retain REGEX;
  drop   REGEX TERM;
  if _N_=1 then do until(LASTOBS);
    set TERMS end=LASTOBS; 
    REGEX=catx('|',REGEX,TERM);
  end;
  set HAVE;
  ABBREV_CODE2=prxchange(catt('s/,? ?(',REGEX,'),?//o'),-1,ABBREV_CODE);
run;
 
ABBREV_CODE ABBREV_CODE2
iretroviral Treatment Not Specified, L667 L667
apv, ddi, d4t, abc, hxu apv, ddi, d4t, abc
iretroviral Treatment Not Specified, lov lov
apv, ddi, d4t, abc, hxu,Hydroxyure apv, ddi, d4t, abc
apv, ddi, d4t, abc, iretroviral Treatment Not Specified apv, ddi, d4t, abc

 

 

 

kelxxx
Quartz | Level 8

It not worked because:

1- add an output to debug

DATA art2;
SET art2;
do list="hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes, list,"");
output;
end;
run;

2- you dont define length of list then >>> length(list) = length("hxu")=3 

>>>> list = "hxu","Hyd", "nti" , …. "Ant"

>>>> need to define length of list

>>>> don't forget abbrev_codes2

LENGTH abbrev_codes2 list $100

3- abbrev_codes2=tranwrd(abbrev_codes, list,"");   >>> this line of code 

>>> resultat = replace only "Ant" in abbrev_codes

>>> change  abbrev_codes to abbrev_codes2

abbrev_codes2=tranwrd(abbrev_codes2, list,"");

>>> Its not worked too, because the substring list = "hxu    ….     " (lenght = 100 >>> hxu + 97 space blank)

>>> use trim()

abbrev_codes2=tranwrd(abbrev_codes2, trim(list),"");

>>> Now it work, but it is not proper like the result you want

>>> add line of code to remove ',' in left and rigth of string

>>> We have;

/********Fix from your code******/

data art2;
set art2;
length abbrev_codes2 list $100;
abbrev_codes2=abbrev_codes;
do list = "hxu","Hydroxyure","ntiretroviralTreatmentotSpecified","Antiretroviral",
"and","Antiretroviral Treatment Not cified","iretroviral Treatment Not Specified",
"Ant troviral Treatment Not Specified";
abbrev_codes2=tranwrd(abbrev_codes2,trim(list),"");
end;
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
drop list;
run;

/****OR****/

data art2;
set art2;
abbrev_codes2=prxchange('s/hxu|Hydroxyure|ntiretroviralTreatmentotSpecified|Antiretroviral|and|Antiretroviral Treatment Not cified|iretroviral Treatment Not Specified|Ant troviral Treatment Not Specified//',-1,abbrev_codes);
abbrev_codes2=prxchange("s/ /,/",-1,cats(prxchange("s/,//",-1,abbrev_codes2)));
run;

Have a nice day.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4900 views
  • 1 like
  • 6 in conversation