Hi Experts,
I have a sample set, Datain, as shown below. Column 'Diagnosis' contains various names.
data Datain; infile datalines delimiter='/'; input Diagnosis : $300. AIRWAY_sp : $200. ENDO_sp : $200. GENE_sp : $200. ; datalines; CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / LARYNGOMALACIA,TEF / / / HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ / / / VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / / / / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / / / METABOLIC DISEASE / ; run;
I have three different classes, AIRWAY, ENDO, and GENE. Each class includes different disease names as shown below.
/* AIRWAY Class */ data AIRWAY_class; infile datalines dsd; input Name : $150. ; datalines; CLEFT PALETTE, LARYNGOMALACIA, INEFFECTIVE AIRWAY CLEARAN, TRACHEOESOPHAGEAL FISTULA, TEF ; run; proc sql noprint; select trim(name) into : AIRWAY_Name separated by '|' from AIRWAY_class; quit; %put &AIRWAY_Name; /* ENDO Class */ data ENDO_class; infile datalines dsd; input Name : $150. ; datalines; BILATERAL POLYCYSTIC OVARIAN SYNDROME, HYPOTHYROIDISM, PANCREATIC INSUFFICIENCY, PANHYPOPITUITARISM ; run; proc sql noprint; select trim(name) into : ENDO_Name separated by '|' from ENDO_class; quit; %put &ENDO_Name; /* GENE Class */ data GENE_class; infile datalines dsd; input Name : $150. ; datalines; BECKWITH, METABOLIC DISEASE VACTERL ASSOCIATION ; run; proc sql noprint; select trim(name) into : GENE_Name separated by '|' from GENE_class; quit; %put &GENE_Name;
My first step is assigning the relate class based on the macro above to create a matching class column, as shown below.
/**** Data transform and create class column ****************/ data Dataout1; set Datain; Diagnosis2 = prxchange("s/[^,]*(&AIRWAY_Name.)[^,]*/AIRWAY/i", -1, Diagnosis); Diagnosis3 = prxchange("s/[^,]*(&ENDO_Name.)[^,]*/ENDO/i", -1, Diagnosis2); Diagnosis4 = prxchange("s/[^,]*(&GENE_Name.)[^,]*/GENE/i", -1, Diagnosis3); Diagnosis5 = prxchange('s/,(\w)/, \1/', -1, Diagnosis4); Diagnosis_update = Diagnosis5; Drop Diagnosis2-Diagnosis5; array vars airway endo gene; do over vars; if find(Diagnosis_update,vname(vars),"i") then vars=1; end; run;
My next step is removing the disease names in the above macro list from the column 'Diagnosis,' and appending them to the matching class_sp columns. Please see the details is show the 'Diagnosis_update', 'AIRWAY_sp', 'ENDO_sp', and 'GENE_sp', in the dataset 'Want.'
data Want; infile datalines delimiter='/'; input Diagnosis : $300. AIRWAY AIRWAY_sp : $200. ENDO ENDO_sp : $200. GENE GENE_sp : $200. Diagnosis_update : $200. ; datalines; CLEFT PALETTE, HYPOTHYROIDISM, CHRONIC RHINITIS / 1 / LARYNGOMALACIA, TEF, CLEFT PALETTE / 1 / HYPOTHYROIDISM / 0 / / CHRONIC RHINITIS / HAVING A GTUBE, BILATERAL POLYCYSTIC OVARIAN SYNDROME, AUTISM/ 0 / / 1 / BILATERAL POLYCYSTIC OVARIAN SYNDROME / 0 / / HAVING A GTUBE, AUTISM / VACTERL ASSOCIATION, TRACHEOESOPHAGEAL FISTULA, TEF / 1 / TRACHEOESOPHAGEAL FISTULA, TEF / 1 / VACTERL ASSOCIATION / 0 / / / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY, BECKWITH / 0 / / 0 / / 1 / METABOLIC DISEASE, BECKWITH / AUTISM SPECTRUM, SPEECH & COGNITIVE DELAY / ; run;
Does anyone how to approach the last step? Thank you.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: