BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7
Hi,
  I have a data set with about 4000 records with 14 different specimen, tests, and results. I want to know what test is positive or negative. So i am creating this code to create testres1- testres14.
i am sure there must be a shortcut way to do this. i am not very familiar with Arrray function. Please help.

 data testing_1;
set testing;
length Testres1 $15;
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="Not Applicable (e.g. Culture positive for N. Meningitides)" then Testres1="PharCulNA";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Posotive" then Testres1="PharCulPos";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="PharCulNeg"; 
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Posotive" then Testres1="PharNAATPos";
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="PharNAATNeg";

if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UretNAATNeg";
if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UretNAATPos";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UretCulNeg";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UretCulPos";

if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="RectNAATPos";
if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="RectNAATNeg";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Positive" then Testres1="RectCulPos";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="RectCulNeg";

if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="EndoNAATNeg";
if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="EndoNAATPos";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Negative" then Testres1="EndoCulNeg";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Positive" then Testres1="EndoCulPos";

if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UrinNAATNeg";
if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UrinNAATPos";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UrinCulNeg";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UrinCulPos";
run;
5 REPLIES 5
Kurt_Bremser
Super User

Use the data step select() statement:

select (specimen1);
  when ("Pharyngeal") select (testtype1);
    when ("Culture") select (result1);
      when ("Not Applicable (e.g. Culture positive for N. Meningitides)") testres1="PharCulNA";
      when ("GC Posotive") testres1 = "PharCulPos";
      when ("GC Negative") testres1 = "PharCulNeg";
    end;
    when ("GC nucleic acid amplification test (NAAT)") select (result1);
      when (GC Posotive") testres1 = "PharNAATPos";

and so on. 

ballardw
Super User

I didn't provide complete examples of the formats for specimen and testtype below but I think the pattern should be easy to follow.

 

proc format library=work;
value $spec
"Pharyngeal"='Phar'
"Urethral"  ="Uret"
;
value $tt
"Culture"= 'Cul'
"GC nucleic acid amplification test (NAAT)" = "NAAT"
;
value $res
"Not Applicable (e.g. Culture positive for N. Meningitides)" = "NA"
"GC Posotive" ="Pos"
"GC Negative" ="Neg"
;
run;

data example;
   infile datalines dlm=',' dsd truncover;
   informat specimen1 $15. testtype1 $50. result1 $70.; 
   input specimen1 testtype1 result1;
   length testres1 $ 15;
   testres1= cats(put(specimen1,$spec.),put(testtype1,$tt.),put(result1,$res.));
datalines;
"Pharyngeal","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Pharyngeal","Culture","GC Posotive"
"Pharyngeal","Culture","GC Negative"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Posotive"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Negative"
"Urethral","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Urethral","Culture","GC Posotive"
"Urethral","Culture","GC Negative"
"Urethral","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Urethral","GC nucleic acid amplification test (NAAT)","GC Posotive"
"Urethral","GC nucleic acid amplification test (NAAT)","GC Negative"

;
run;

You may have to tweek length of the testres if you have other not-listed items to hold the results.

 

Arrays would only come into this if you have specimen1, specimen2, specimen3 with matching test type and result variables if that is the case and you need to create matching test results that would look something like:

data testresult;
   set testing;
   array s specimen1-specimen3;
   array t testtype1-testtype3;
   array r result1  -result3;
   array tr (3) $15;
   do i= 1 to dim(s);
      tr[i]=cats(put(s[i],$spec.),put(t[i],$tt.),put(r[i],$res.));
   end;
   drop i;
run;
PGStats
Opal | Level 21

I would prefer to keep the rules for testing1 separate from the program. Put the rules in a dataset and use code to create the new variable:

 

data key;
  length specimen1 testType1 Result1 $64 Testres1 $15;
  input (specimen1 testType1 Result1 Testres1) (&);
  datalines;
  Pharyngeal    Culture    Not Applicable (e.g. Culture positive for N. Meningitides)   PharCulNA 
  Pharyngeal    Culture    GC Posotive   PharCulPos 
  Pharyngeal    Culture    GC Negative   PharCulNeg  
  Pharyngeal    GC nucleic acid amplification test (NAAT)    GC Posotive   PharNAATPos 
  Pharyngeal    GC nucleic acid amplification test (NAAT)    GC Negative   PharNAATNeg 
  Urethral    GC nucleic acid amplification test (NAAT)    GC Negative   UretNAATNeg 
  Urethral    GC nucleic acid amplification test (NAAT)    GC Positive   UretNAATPos 
  Urethral    Culture    GC Negative   UretCulNeg 
  Urethral    Culture    GC Positive   UretCulPos 
  Rectal    GC nucleic acid amplification test (NAAT)    GC Positive   RectNAATPos 
  Rectal    GC nucleic acid amplification test (NAAT)    GC Negative   RectNAATNeg 
  Rectal    Culture    GC Positive   RectCulPos 
  Rectal    Culture    GC Negative   RectCulNeg 
  Endocervical    GC nucleic acid amplification test (NAAT)    GC Negative   EndoNAATNeg 
  Endocervical    GC nucleic acid amplification test (NAAT)    GC Positive   EndoNAATPos 
  Endocervical    Culture    GC Negative   EndoCulNeg 
  Endocervical    Culture    GC Positive   EndoCulPos 
  Urine    GC nucleic acid amplification test (NAAT)    GC Negative   UrinNAATNeg 
  Urine    GC nucleic acid amplification test (NAAT)    GC Positive   UrinNAATPos 
  Urine    Culture    GC Negative   UrinCulNeg 
  Urine    Culture    GC Positive   UrinCulPos 
  ;
  proc sql;
  create table testing_1 as
  select a.*, b.testres1
  from testing as a natural left join key as b;
  quit;
PG
Dhana18
Obsidian | Level 7
Thank you for taking time to help me. I am little unclear about the solution you provided. Let tell you little bit about the data set I have. I have a patient visit data set with about 400 visit. In each visit patient can have multiple specimens collected for different tests and have multiple results. I have unique visited and specimen1-specimen14, Testytype1-Testype14, Result1-result14. I could probably just do proc freq to find neg or pos result of specific test, but I wanted to create a separate variable for each specimen, test, and result.
So my question here is how does creating a new dataset helps and how do I connect this new dataset with visit id? I am sorry my question may confusing to you.
PGStats
Opal | Level 21

You should really consider transforming your data structure from wide to long, with one record for each specimen and test. That would simplify your analysis quite a bit. Your records would be something like:

 

patientId, specimenId, specimenType, testType, result, testRes

 

with testRes derived from a table, as I proposed.

PG

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 626 views
  • 1 like
  • 4 in conversation