Hi,
I have a data set with about 4000 records with 14 different specimen, tests, and results. I want to know what test is positive or negative. So i am creating this code to create testres1- testres14.
i am sure there must be a shortcut way to do this. i am not very familiar with Arrray function. Please help.
data testing_1;
set testing;
length Testres1 $15;
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="Not Applicable (e.g. Culture positive for N. Meningitides)" then Testres1="PharCulNA";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Posotive" then Testres1="PharCulPos";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="PharCulNeg";
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Posotive" then Testres1="PharNAATPos";
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="PharNAATNeg";
if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UretNAATNeg";
if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UretNAATPos";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UretCulNeg";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UretCulPos";
if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="RectNAATPos";
if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="RectNAATNeg";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Positive" then Testres1="RectCulPos";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="RectCulNeg";
if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="EndoNAATNeg";
if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="EndoNAATPos";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Negative" then Testres1="EndoCulNeg";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Positive" then Testres1="EndoCulPos";
if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UrinNAATNeg";
if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UrinNAATPos";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UrinCulNeg";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UrinCulPos";
run;
Use the data step select() statement:
select (specimen1);
when ("Pharyngeal") select (testtype1);
when ("Culture") select (result1);
when ("Not Applicable (e.g. Culture positive for N. Meningitides)") testres1="PharCulNA";
when ("GC Posotive") testres1 = "PharCulPos";
when ("GC Negative") testres1 = "PharCulNeg";
end;
when ("GC nucleic acid amplification test (NAAT)") select (result1);
when (GC Posotive") testres1 = "PharNAATPos";
and so on.
I didn't provide complete examples of the formats for specimen and testtype below but I think the pattern should be easy to follow.
proc format library=work; value $spec "Pharyngeal"='Phar' "Urethral" ="Uret" ; value $tt "Culture"= 'Cul' "GC nucleic acid amplification test (NAAT)" = "NAAT" ; value $res "Not Applicable (e.g. Culture positive for N. Meningitides)" = "NA" "GC Posotive" ="Pos" "GC Negative" ="Neg" ; run; data example; infile datalines dlm=',' dsd truncover; informat specimen1 $15. testtype1 $50. result1 $70.; input specimen1 testtype1 result1; length testres1 $ 15; testres1= cats(put(specimen1,$spec.),put(testtype1,$tt.),put(result1,$res.)); datalines; "Pharyngeal","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)" "Pharyngeal","Culture","GC Posotive" "Pharyngeal","Culture","GC Negative" "Pharyngeal","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)" "Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Posotive" "Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Negative" "Urethral","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)" "Urethral","Culture","GC Posotive" "Urethral","Culture","GC Negative" "Urethral","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)" "Urethral","GC nucleic acid amplification test (NAAT)","GC Posotive" "Urethral","GC nucleic acid amplification test (NAAT)","GC Negative" ; run;
You may have to tweek length of the testres if you have other not-listed items to hold the results.
Arrays would only come into this if you have specimen1, specimen2, specimen3 with matching test type and result variables if that is the case and you need to create matching test results that would look something like:
data testresult; set testing; array s specimen1-specimen3; array t testtype1-testtype3; array r result1 -result3; array tr (3) $15; do i= 1 to dim(s); tr[i]=cats(put(s[i],$spec.),put(t[i],$tt.),put(r[i],$res.)); end; drop i; run;
I would prefer to keep the rules for testing1 separate from the program. Put the rules in a dataset and use code to create the new variable:
data key;
length specimen1 testType1 Result1 $64 Testres1 $15;
input (specimen1 testType1 Result1 Testres1) (&);
datalines;
Pharyngeal Culture Not Applicable (e.g. Culture positive for N. Meningitides) PharCulNA
Pharyngeal Culture GC Posotive PharCulPos
Pharyngeal Culture GC Negative PharCulNeg
Pharyngeal GC nucleic acid amplification test (NAAT) GC Posotive PharNAATPos
Pharyngeal GC nucleic acid amplification test (NAAT) GC Negative PharNAATNeg
Urethral GC nucleic acid amplification test (NAAT) GC Negative UretNAATNeg
Urethral GC nucleic acid amplification test (NAAT) GC Positive UretNAATPos
Urethral Culture GC Negative UretCulNeg
Urethral Culture GC Positive UretCulPos
Rectal GC nucleic acid amplification test (NAAT) GC Positive RectNAATPos
Rectal GC nucleic acid amplification test (NAAT) GC Negative RectNAATNeg
Rectal Culture GC Positive RectCulPos
Rectal Culture GC Negative RectCulNeg
Endocervical GC nucleic acid amplification test (NAAT) GC Negative EndoNAATNeg
Endocervical GC nucleic acid amplification test (NAAT) GC Positive EndoNAATPos
Endocervical Culture GC Negative EndoCulNeg
Endocervical Culture GC Positive EndoCulPos
Urine GC nucleic acid amplification test (NAAT) GC Negative UrinNAATNeg
Urine GC nucleic acid amplification test (NAAT) GC Positive UrinNAATPos
Urine Culture GC Negative UrinCulNeg
Urine Culture GC Positive UrinCulPos
;
proc sql;
create table testing_1 as
select a.*, b.testres1
from testing as a natural left join key as b;
quit;
You should really consider transforming your data structure from wide to long, with one record for each specimen and test. That would simplify your analysis quite a bit. Your records would be something like:
patientId, specimenId, specimenType, testType, result, testRes
with testRes derived from a table, as I proposed.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.