As others have pointed out, you never specified the rules you want applied, so you have left the job of interpreting the objective to us. Some folks (including me) have, after some study, deduced that each entry in the new variable is supposed to be extracted from successive parenthetical expressions.
At least that is what seems to be the case, even though your sample result does not completely honor that rule. That's why respondents have posted the question about row 2, and about MEASLES vs Rubella in line 1.
To help us help you, please be complete in describing the task.
Here is my suggestion, based on capturing consecutive pairs of parenthetical expressions. I call the first expression of each pair _CONDITION and the second _CODES. The program then checks _CODES for "IgG" and "IgM" to decide whether to add content to the new variable:
data Test;
  length Test_Name $2000;
  infile datalines delimiter='#';
  input Test_Name;
  datalines;
Encephalitis Antibody Panel, CSF Includes: Herpes Simplex Virus (HSV) 1/2 (IgG) Type Antibody, CSF Lymphocytic Choriomeningitis (LCM) Virus (IgG, IgM) Antibody, IFA, CSF Measles (Rubeola) (IgG, IgM) Antibody Panel, IFA, CSF Mumps Antibody Panel, IFA, CSF Varicella-Zoster Virus (VZV) Antibodies (Total, IgM), ACIF/IFA, CSF West Nile Virus (WNV) Antibodies (IgG, IgM), CSF #
Herpes Simplex Type 1 and Type 2 Glycoprotein G-Specific Antibodies, IgG by CIA#
run;
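data want (drop=_: L R); /* L & R hold the positions of the left & right parentheses */
  set test;
  length Sub_Analyte $1000;
  /* "by 0" leaves L unchanged between iterations, so the loop runs until L becomes 0 */
  do L=findc(test_name,'(') by 0 while(L^=0);
    /* First parenthetical expression of the pair */
    R=findc(test_name,')',,L);
    if R=0 then leave;
    _condition=substr(test_name,L+1,R-L-1);
    /* Second parenthetical expression of the pair */
    L=findc(test_name,'(',,R);
    R=findc(test_name,')',,L);
    if L=0 or R=0 then leave;
    _codes=substr(test_name,L+1,R-L-1);
    /* Append <condition>_IgG and/or <condition>_IgM when _codes contains that word */
    do _txt='IgG','IgM';
      if findw(_codes,trim(_txt),', ')>0 then sub_analyte=catx(', ',sub_analyte,cats(_condition,'_',_txt));
    end;
    /* Look for the start of the next pair */
    L=findc(test_name,'(',,R);
  end;
  put sub_analyte=;
run;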
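If you prefer regular expressions, the same pairing rule can be sketched with the PRX functions. This is only an alternative sketch, not something you asked for: the data set name want_prx, the $200 lengths, and the pattern itself are my assumptions, and it assumes the parentheses are never nested. For text like your sample it should build Sub_Analyte the same way as the step above.

```sas
data want_prx (drop=_:);
  set test;
  /* The $200 lengths are an assumption; enlarge them if your expressions are longer */
  length Sub_Analyte $1000 _condition _codes $200 _txt $3;
  /* Two parenthetical expressions with no other parentheses between them */
  _re = prxparse('/\(([^()]*)\)[^()]*\(([^()]*)\)/');
  _start = 1;
  _stop = length(test_name);
  call prxnext(_re, _start, _stop, test_name, _pos, _len);
  do while (_pos > 0);
    _condition = prxposn(_re, 1, test_name); /* first expression of the pair  */
    _codes     = prxposn(_re, 2, test_name); /* second expression of the pair */
    do _txt = 'IgG', 'IgM';
      if findw(_codes, trim(_txt), ', ') > 0 then
        sub_analyte = catx(', ', sub_analyte, cats(_condition, '_', _txt));
    end;
    /* CALL PRXNEXT resumes the search right after the previous match */
    call prxnext(_re, _start, _stop, test_name, _pos, _len);
  end;
  put sub_analyte=;
run;
```

The pattern keeps the pairing rule in one place, so if you later tell us the real rule, say only pairs that are adjacent or only certain codes, it is the regular expression that would change rather than the position arithmetic.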