Hello, experts,
I have a sample dataset 'Have.' I would like to remove all the names shown in the dataset 'Maste_Name.' The final result is shown in the dataset 'Want.' Please help me to approach it. Thank you.
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; data Want; infile datalines delimiter='/';Re input Disease : $300. ; datalines; / CONGENITAL ATRESIA OF ESOPHA / ECZEMA ; run;
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; options noquotelenmax; proc sql noprint; select name into : names separated by '|' from master_name; quit; %put &names; data want; set have; want=tranwrd(prxchange("s/\b&names\b/ /i",-1,disease),' ,',' '); run;
Have you tried using the Tranwrd function? If the replacement value is '', then that should remove the text that you don't want. You might want to run a COMPBL thereafter to compress any multiple spaces.
Jim
For example, the following:
data Master_Name;
infile datalines dsd;
input Name : $50. ;
datalines;
uairway,
uapnea,
ugastro,
ugerd,
uendo,
uallergy
;
run;
data Have;
infile datalines delimiter='/';
input Disease : $300. ;
datalines;
UGASTRO, UGERD, UALLERGY /
UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO /
ECZEMA
;
run;
DATA _NULL_;
CALL SYMPUTX('Names_Cnt', Names_Cnt, 'G');
STOP;
SET Master_Name NOBS = Names_Cnt;
RUN;
DATA Want;
DROP Name;
DROP _:;
ARRAY Names [&Names_Cnt] $50 _TEMPORARY_;
SET Have;
IF _N_ = 1 THEN
DO;
DO WHILE (NOT End_Of_Master);
SET Master_Name
END = End_of_Master
;
_i + 1;
Names[_i] = Name;
END;
END;
DO _i = 1 TO &Names_Cnt;
Disease = COMPBL(TRANWRD(UPCASE(Disease), CATS(UPCASE(Names[_i]), ','), ''));
IF SUBSTR(Disease, LENGTH(TRIM(Disease)), 1) = ',' THEN
Disease = SUBSTR(Disease, 1, LENGTH(STRIP(Disease)) - 1);
Disease = COMPBL(TRANWRD(UPCASE(Disease), STRIP(UPCASE(Names[_i])), ''));
IF SUBSTR(Disease, LENGTH(TRIM(Disease)), 1) = ',' THEN
Disease = SUBSTR(Disease, 1, LENGTH(STRIP(Disease)) - 1);
END;
IF NOT INDEX(Disease, '/') THEN
Disease = CAT(STRIP(Disease), ' /');
RUN;
Which yields:
Jim
data Master_Name; infile datalines dsd; input Name : $50. ; datalines; uairway, uapnea, ugastro, ugerd, uendo, uallergy ; run; data Have; infile datalines delimiter='/'; input Disease : $300. ; datalines; UGASTRO, UGERD, UALLERGY / UAIRWAY, CONGENITAL ATRESIA OF ESOPHA, UGASTRO / ECZEMA ; run; options noquotelenmax; proc sql noprint; select name into : names separated by '|' from master_name; quit; %put &names; data want; set have; want=tranwrd(prxchange("s/\b&names\b/ /i",-1,disease),' ,',' '); run;
Wow, this is SUPER!!
What is the 's' in front of '/' used for? And what is '-1' for?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.