Hello,
I have a sample dataset, "Have", in which each record is a comma-separated list of conditions. I would like to replace text fragments according to the following rules:
1. If an item contains 'CLEFT PALETTE' or 'CLEFT LIP', replace the item with 'AIRWAY'.
2. If an item contains 'ADHD', 'AUTISM', or 'SPEECH & COGNITIVE DELAY', replace the item with 'DEVDIS'.
3. If an item contains 'GTUBE', replace the item with 'GASTRO'.
4. If an item contains 'ENCEPHALO', replace the item with 'NEURO'.
The result I want is shown in the "Want" dataset. I think PRXCHANGE might be the right tool, but I'm not sure how to use it. If there is another way, I am happy to try it as long as it works.
data Have;
infile datalines delimiter='/';
input Disease : $300. ;
datalines;
CLEFT PALETTE, ADHD, CHRONIC RHINITIS /
HAVING A GTUBE, ENCEPHALOMALACIA, HIDRADENITIS, AUTISM/
ENCEPHALOPATHY, GTUBE SUPPORTIVE, HEP C EXPOSURE /
AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, CLEFT LIP
;
run;

data Want;
infile datalines delimiter='/';
input Disease : $300. ;
datalines;
AIRWAY, DEVDIS, CHRONIC RHINITIS /
GASTRO, NEURO, OTHER, DEVDIS /
NEURO, GASTRO, HEP C EXPOSURE /
DEVDIS, DEVDIS, AIRWAY
;
run;
Something like this should work (sorry can't test atm):
DISEASE=prxchange('s/[^,]*(CLEFT PALETTE|CLEFT LIP)[^,]*/AIRWAY/i',-1,DISEASE);
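To see what the pattern does on its own, here is a minimal check on one record from the sample data (the data set name `demo` is hypothetical, used only for illustration). The `[^,]*` on each side of the alternation stretches the match to the whole comma-delimited item. Because a blank is not a comma, the leading `[^,]*` also swallows the space after the preceding comma, so the replaced item ends up glued to that comma.

```sas
data demo;
    length Disease $300;
    /* One record taken from the Have sample data */
    Disease = 'AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, CLEFT LIP';
    /* Replace the whole item that contains CLEFT PALETTE or CLEFT LIP */
    Disease = prxchange('s/[^,]*(CLEFT PALETTE|CLEFT LIP)[^,]*/AIRWAY/i', -1, Disease);
    /* The log shows: Disease=AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY,AIRWAY */
    put Disease=;
run;
```

The missing blank before AIRWAY is what the extra comma-spacing step in the tested program further down the thread puts back.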
Testing @ChrisNZ's suggestion:
data Have;
infile datalines delimiter='/';
input Disease : $300. ;
datalines;
CLEFT PALETTE, ADHD, CHRONIC RHINITIS /
HAVING A GTUBE, ENCEPHALOMALACIA, HIDRADENITIS, AUTISM/
ENCEPHALOPATHY, GTUBE SUPPORTIVE, HEP C EXPOSURE /
AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, CLEFT LIP
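;
run;

data want;
set have;
/* Declare lengths explicitly so long values are not truncated by the default length of function results */
length Disease2-Disease7 $300;
/* Each pattern matches the whole comma-delimited item that contains one of the fragments */
Disease2 = prxchange('s/[^,]*(CLEFT PALETTE|CLEFT LIP)[^,]*/AIRWAY/i', -1, Disease);
Disease3 = prxchange('s/[^,]*(ADHD|AUTISM|SPEECH & COGNITIVE DELAY)[^,]*/DEVDIS/i', -1, Disease2);
Disease4 = prxchange('s/[^,]*(GTUBE)[^,]*/GASTRO/i', -1, Disease3);
Disease5 = prxchange('s/[^,]*(ENCEPHALO)[^,]*/NEURO/i', -1, Disease4);
/* HIDRADENITIS is shown as OTHER in the Want dataset */
Disease6 = prxchange('s/[^,]*(HIDRADENITIS)[^,]*/OTHER/i', -1, Disease5);
/* The leading [^,]* also removes the blank after each comma, so add spaces after commas again */
Disease7 = prxchange('s/,(\w)/, \1/', -1, Disease6);
Disease_transformed = Disease7;
keep Disease Disease_transformed;
run;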
proc print; run;
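If more categories get added, the chain of DiseaseN variables grows with them. A table-driven variant is sketched below, assuming the rule list stays small: the fragment alternations and their categories sit in two temporary arrays, and one loop applies them in order. The names `pat`, `rep`, and `want2` are hypothetical. It also swaps in a slightly different pattern, `(^|,\s*)[^,]*?(fragments)[^,]*`, which captures the start of the string or the comma and blanks before an item and writes them back with `\1`, so the original spacing survives without a separate comma-spacing step. The Have step is repeated so the sketch runs on its own.

```sas
/* Same sample data as in the question */
data Have;
    infile datalines delimiter='/';
    input Disease : $300.;
    datalines;
CLEFT PALETTE, ADHD, CHRONIC RHINITIS /
HAVING A GTUBE, ENCEPHALOMALACIA, HIDRADENITIS, AUTISM/
ENCEPHALOPATHY, GTUBE SUPPORTIVE, HEP C EXPOSURE /
AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, CLEFT LIP
;
run;

data want2;
    set Have;
    length Disease_transformed $300;
    /* Rule table: each alternation of fragments is paired with its category */
    array pat[5] $60 _temporary_
        ('CLEFT PALETTE|CLEFT LIP'
         'ADHD|AUTISM|SPEECH & COGNITIVE DELAY'
         'GTUBE'
         'ENCEPHALO'
         'HIDRADENITIS');
    array rep[5] $8 _temporary_
        ('AIRWAY' 'DEVDIS' 'GASTRO' 'NEURO' 'OTHER');
    Disease_transformed = Disease;
    do i = 1 to dim(pat);
        /* CATS trims the padded array values while building the regex.
           (^|,\s*) captures the separator before the item; \1 puts it back */
        Disease_transformed = prxchange(
            cats('s/(^|,\s*)[^,]*?(', pat[i], ')[^,]*/\1', rep[i], '/i'),
            -1, Disease_transformed);
    end;
    drop i;
run;

proc print data=want2; run;
```

With this layout, a new category means one more element in each array rather than another PRXCHANGE line. Because the regex is built at run time, it is recompiled on each call, which should not matter for a dataset of this size.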
He's right!
@ybz12003 wrote:
But why add spaces after the commas?
Simply because you had spaces after the commas in your example above, and the leading [^,]* in each pattern also removes the blank that follows a comma.
> why add spaces after commas
To make the code more legible, which is a very good habit indeed. So is using case to convey meaning, for example lowercase for language keywords and uppercase for user-defined names such as variable names.
Thanks, Chris and PG, for the great effort you both made to help me out. Because Chris was the first to give me the idea, I accepted his answer as the solution, although PG showed a more complete program with all the details. Sometimes it's hard to decide which is the best solution. I wish I could credit both of your valuable contributions.