Hello, experts,
I have a sample dataset 'Have.' I would like to remove all the names shown in the dataset 'Maste_Name.' The final result is shown in the dataset 'Want.' Please help me to approach it. Thank you.
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; data Want; infile datalines delimiter='/';Re input Disease : $300. ; datalines; / CONGENITAL ATRESIA OF ESOPHA / ECZEMA ; run;
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; options noquotelenmax; proc sql noprint; select name into : names separated by '|' from master_name; quit; %put &names; data want; set have; want=tranwrd(prxchange("s/\b&names\b/ /i",-1,disease),' ,',' '); run;
Have you tried using the Tranwrd function? If the replacement value is '', then that should remove the text that you don't want. You might want to run a COMPBL thereafter to compress any multiple spaces.
Jim
For example, the following:
data Master_Name;
infile datalines dsd;
input Name : $50. ;
datalines;
uairway,
uapnea,
ugastro,
ugerd,
uendo,
uallergy
;
run;
data Have;
infile datalines delimiter='/';
input Disease : $300. ;
datalines;
UGASTRO, UGERD, UALLERGY /
UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO /
ECZEMA
;
run;
DATA _NULL_;
CALL SYMPUTX('Names_Cnt', Names_Cnt, 'G');
STOP;
SET Master_Name NOBS = Names_Cnt;
RUN;
DATA Want;
DROP Name;
DROP _:;
ARRAY Names [&Names_Cnt] $50 _TEMPORARY_;
SET Have;
IF _N_ = 1 THEN
DO;
DO WHILE (NOT End_Of_Master);
SET Master_Name
END = End_of_Master
;
_i + 1;
Names[_i] = Name;
END;
END;
DO _i = 1 TO &Names_Cnt;
Disease = COMPBL(TRANWRD(UPCASE(Disease), CATS(UPCASE(Names[_i]), ','), ''));
IF SUBSTR(Disease, LENGTH(TRIM(Disease)), 1) = ',' THEN
Disease = SUBSTR(Disease, 1, LENGTH(STRIP(Disease)) - 1);
Disease = COMPBL(TRANWRD(UPCASE(Disease), STRIP(UPCASE(Names[_i])), ''));
IF SUBSTR(Disease, LENGTH(TRIM(Disease)), 1) = ',' THEN
Disease = SUBSTR(Disease, 1, LENGTH(STRIP(Disease)) - 1);
END;
IF NOT INDEX(Disease, '/') THEN
Disease = CAT(STRIP(Disease), ' /');
RUN;
Which yields:
Jim
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; options noquotelenmax; proc sql noprint; select name into : names separated by '|' from master_name; quit; %put &names; data want; set have; want=tranwrd(prxchange("s/\b&names\b/ /i",-1,disease),' ,',' '); run;
Wow, this is SUPER!!
What is the 's' in front of '/' used for? And what is '-1' for?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.