Hi Experts,
I have a sample set, Datain, as shown below. Column 'Diagnosis' contains various names.
data Datain; infile datalines delimiter='/'; input Diagnosis : $300. AIRWAY_sp : $200. ENDO_sp : $200. GENE_sp : $200. ; datalines; CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / LARYNGOMALACIA,TEF / / / HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ / / / VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / / / / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / / / METABOLIC DISEASE / ; run;
I have three different classes, AIRWAY, ENDO, and GENE. Each class includes different disease names as shown below.
/* AIRWAY Class */ data AIRWAY_class; infile datalines dsd; input Name : $150. ; datalines; CLEFT PALETTE, LARYNGOMALACIA, INEFFECTIVE AIRWAY CLEARAN, TRACHEOESOPHAGEAL FISTULA, TEF ; run; proc sql noprint; select trim(name) into : AIRWAY_Name separated by '|' from AIRWAY_class; quit; %put &AIRWAY_Name; /* ENDO Class */ data ENDO_class; infile datalines dsd; input Name : $150. ; datalines; BILATERAL POLYCYSTIC OVARIAN SYNDROME, HYPOTHYROIDISM, PANCREATIC INSUFFICIENCY, PANHYPOPITUITARISM ; run; proc sql noprint; select trim(name) into : ENDO_Name separated by '|' from ENDO_class; quit; %put &ENDO_Name; /* GENE Class */ data GENE_class; infile datalines dsd; input Name : $150. ; datalines; BECKWITH, METABOLIC DISEASE VACTERL ASSOCIATION ; run; proc sql noprint; select trim(name) into : GENE_Name separated by '|' from GENE_class; quit; %put &GENE_Name;
My first step is assigning the relate class based on the macro above to create a matching class column, as shown below.
/**** Data transform and create class column ****************/ data Dataout1; set Datain; Diagnosis2 = prxchange("s/[^,]*(&AIRWAY_Name.)[^,]*/AIRWAY/i", -1, Diagnosis); Diagnosis3 = prxchange("s/[^,]*(&ENDO_Name.)[^,]*/ENDO/i", -1, Diagnosis2); Diagnosis4 = prxchange("s/[^,]*(&GENE_Name.)[^,]*/GENE/i", -1, Diagnosis3); Diagnosis5 = prxchange('s/,(\w)/, \1/', -1, Diagnosis4); Diagnosis_update = Diagnosis5; Drop Diagnosis2-Diagnosis5; array vars airway endo gene; do over vars; if find(Diagnosis_update,vname(vars),"i") then vars=1; end; run;
My next step is removing the disease names in the above macro list from the column 'Diagnosis,' and appending them to the matching class_sp columns. Please see the details is show the 'Diagnosis_update', 'AIRWAY_sp', 'ENDO_sp', and 'GENE_sp', in the dataset 'Want.'
data Want; infile datalines delimiter='/'; input Diagnosis : $300. AIRWAY AIRWAY_sp : $200. ENDO ENDO_sp : $200. GENE GENE_sp : $200. Diagnosis_update : $200. ; datalines; CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / 1 / LARYNGOMALACIA, TEF, CLEFT PALETTE / 1 / HYPOTHYROIDISM / 0 / / CHRONIC RHINITIS / HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ 0 / / 1 / BILATERAL POLYCYSTIC OVARIAN SYNDROME / 0 / / HAVING A GTUBE, AUTISM / VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / 1 / TRACHEOESOPHAGEAL FISTULA, TEF / 1 / VACTERL ASSOCIATION / 0 / / / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / 0 / / 0 / / 1 / METABOLIC DISEASE, BECKWITH / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY / ; run;
Does anyone how to approach the last step? Thank you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.