Hi Experts,
I have a sample set, Datain, as shown below. Column 'Diagnosis' contains various names.
data Datain;
infile datalines delimiter='/';
input Diagnosis : $300. AIRWAY_sp : $200. ENDO_sp : $200. GENE_sp : $200. ;
datalines;
CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / LARYNGOMALACIA,TEF / / /
HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ / / /
VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / / / /
AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / / / METABOLIC DISEASE /
;
run;
I have three different classes, AIRWAY, ENDO, and GENE. Each class includes different disease names as shown below.
/* AIRWAY Class */ data AIRWAY_class; infile datalines dsd; input Name : $150. ; datalines; CLEFT PALETTE, LARYNGOMALACIA, INEFFECTIVE AIRWAY CLEARAN, TRACHEOESOPHAGEAL FISTULA, TEF ; run; proc sql noprint; select trim(name) into : AIRWAY_Name separated by '|' from AIRWAY_class; quit; %put &AIRWAY_Name; /* ENDO Class */ data ENDO_class; infile datalines dsd; input Name : $150. ; datalines; BILATERAL POLYCYSTIC OVARIAN SYNDROME, HYPOTHYROIDISM, PANCREATIC INSUFFICIENCY, PANHYPOPITUITARISM ; run; proc sql noprint; select trim(name) into : ENDO_Name separated by '|' from ENDO_class; quit; %put &ENDO_Name; /* GENE Class */ data GENE_class; infile datalines dsd; input Name : $150. ; datalines; BECKWITH, METABOLIC DISEASE VACTERL ASSOCIATION ; run; proc sql noprint; select trim(name) into : GENE_Name separated by '|' from GENE_class; quit; %put &GENE_Name;
My first step is assigning the relate class based on the macro above to create a matching class column, as shown below.
/**** Data transform and create class column ****************/
data Dataout1;
set Datain;
Diagnosis2 = prxchange("s/[^,]*(&AIRWAY_Name.)[^,]*/AIRWAY/i", -1, Diagnosis);
Diagnosis3 = prxchange("s/[^,]*(&ENDO_Name.)[^,]*/ENDO/i", -1, Diagnosis2);
Diagnosis4 = prxchange("s/[^,]*(&GENE_Name.)[^,]*/GENE/i", -1, Diagnosis3);
Diagnosis5 = prxchange('s/,(\w)/, \1/', -1, Diagnosis4);
Diagnosis_update = Diagnosis5;
Drop Diagnosis2-Diagnosis5;
array vars airway endo gene;
do over vars;
if find(Diagnosis_update,vname(vars),"i") then vars=1;
end;
run;
My next step is removing the disease names in the above macro list from the column 'Diagnosis,' and appending them to the matching class_sp columns. Please see the details is show the 'Diagnosis_update', 'AIRWAY_sp', 'ENDO_sp', and 'GENE_sp', in the dataset 'Want.'
data Want;
infile datalines delimiter='/';
input Diagnosis : $300. AIRWAY AIRWAY_sp : $200. ENDO ENDO_sp : $200. GENE GENE_sp : $200. Diagnosis_update : $200. ;
datalines;
CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / 1 / LARYNGOMALACIA, TEF, CLEFT PALETTE / 1 / HYPOTHYROIDISM / 0 / / CHRONIC RHINITIS /
HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ 0 / / 1 / BILATERAL POLYCYSTIC OVARIAN SYNDROME / 0 / / HAVING A GTUBE, AUTISM /
VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / 1 / TRACHEOESOPHAGEAL FISTULA, TEF / 1 / VACTERL ASSOCIATION / 0 / / /
AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / 0 / / 0 / / 1 / METABOLIC DISEASE, BECKWITH / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY /
;
run;
Does anyone how to approach the last step? Thank you.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.