shortcut code

Dhana18 · Posted 09-24-2019 04:33 PM

Hi,
  I have a data set with about 4000 records with 14 different specimen, tests, and results. I want to know what test is positive or negative. So i am creating this code to create testres1- testres14.
i am sure there must be a shortcut way to do this. i am not very familiar with Arrray function. Please help.

 data testing_1;
set testing;
length Testres1 $15;
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="Not Applicable (e.g. Culture positive for N. Meningitides)" then Testres1="PharCulNA";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Posotive" then Testres1="PharCulPos";
if specimen1="Pharyngeal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="PharCulNeg"; 
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Posotive" then Testres1="PharNAATPos";
if specimen1="Pharyngeal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="PharNAATNeg";

if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UretNAATNeg";
if specimen1="Urethral" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UretNAATPos";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UretCulNeg";
if specimen1="Urethral" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UretCulPos";

if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="RectNAATPos";
if specimen1="Rectal" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="RectNAATNeg";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Positive" then Testres1="RectCulPos";
if specimen1="Rectal" AND testType1="Culture" AND Result1="GC Negative" then Testres1="RectCulNeg";

if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="EndoNAATNeg";
if specimen1="Endocervical" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="EndoNAATPos";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Negative" then Testres1="EndoCulNeg";
if specimen1="Endocervical" AND testType1="Culture" AND Result1="GC Positive" then Testres1="EndoCulPos";

if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Negative" then Testres1="UrinNAATNeg";
if specimen1="Urine" AND testType1="GC nucleic acid amplification test (NAAT)" AND Result1="GC Positive" then Testres1="UrinNAATPos";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Negative" then Testres1="UrinCulNeg";
if specimen1="Urine" AND testType1="Culture" AND Result1="GC Positive" then Testres1="UrinCulPos";
run;

Kurt_Bremser · Posted 09-24-2019 04:56 PM

Use the data step select() statement:

select (specimen1);
  when ("Pharyngeal") select (testtype1);
    when ("Culture") select (result1);
      when ("Not Applicable (e.g. Culture positive for N. Meningitides)") testres1="PharCulNA";
      when ("GC Posotive") testres1 = "PharCulPos";
      when ("GC Negative") testres1 = "PharCulNeg";
    end;
    when ("GC nucleic acid amplification test (NAAT)") select (result1);
      when (GC Posotive") testres1 = "PharNAATPos";

and so on.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 09-24-2019 05:16 PM

I didn't provide complete examples of the formats for specimen and testtype below but I think the pattern should be easy to follow.

proc format library=work;
value $spec
"Pharyngeal"='Phar'
"Urethral"  ="Uret"
;
value $tt
"Culture"= 'Cul'
"GC nucleic acid amplification test (NAAT)" = "NAAT"
;
value $res
"Not Applicable (e.g. Culture positive for N. Meningitides)" = "NA"
"GC Posotive" ="Pos"
"GC Negative" ="Neg"
;
run;

data example;
   infile datalines dlm=',' dsd truncover;
   informat specimen1 $15. testtype1 $50. result1 $70.; 
   input specimen1 testtype1 result1;
   length testres1 $ 15;
   testres1= cats(put(specimen1,$spec.),put(testtype1,$tt.),put(result1,$res.));
datalines;
"Pharyngeal","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Pharyngeal","Culture","GC Posotive"
"Pharyngeal","Culture","GC Negative"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Posotive"
"Pharyngeal","GC nucleic acid amplification test (NAAT)","GC Negative"
"Urethral","Culture","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Urethral","Culture","GC Posotive"
"Urethral","Culture","GC Negative"
"Urethral","GC nucleic acid amplification test (NAAT)","Not Applicable (e.g. Culture positive for N. Meningitides)"
"Urethral","GC nucleic acid amplification test (NAAT)","GC Posotive"
"Urethral","GC nucleic acid amplification test (NAAT)","GC Negative"

;
run;

You may have to tweek length of the testres if you have other not-listed items to hold the results.

Arrays would only come into this if you have specimen1, specimen2, specimen3 with matching test type and result variables if that is the case and you need to create matching test results that would look something like:

data testresult;
   set testing;
   array s specimen1-specimen3;
   array t testtype1-testtype3;
   array r result1  -result3;
   array tr (3) $15;
   do i= 1 to dim(s);
      tr[i]=cats(put(s[i],$spec.),put(t[i],$tt.),put(r[i],$res.));
   end;
   drop i;
run;

PGStats · Posted 09-24-2019 11:53 PM

I would prefer to keep the rules for testing1 separate from the program. Put the rules in a dataset and use code to create the new variable:

data key;
  length specimen1 testType1 Result1 $64 Testres1 $15;
  input (specimen1 testType1 Result1 Testres1) (&);
  datalines;
  Pharyngeal    Culture    Not Applicable (e.g. Culture positive for N. Meningitides)   PharCulNA 
  Pharyngeal    Culture    GC Posotive   PharCulPos 
  Pharyngeal    Culture    GC Negative   PharCulNeg  
  Pharyngeal    GC nucleic acid amplification test (NAAT)    GC Posotive   PharNAATPos 
  Pharyngeal    GC nucleic acid amplification test (NAAT)    GC Negative   PharNAATNeg 
  Urethral    GC nucleic acid amplification test (NAAT)    GC Negative   UretNAATNeg 
  Urethral    GC nucleic acid amplification test (NAAT)    GC Positive   UretNAATPos 
  Urethral    Culture    GC Negative   UretCulNeg 
  Urethral    Culture    GC Positive   UretCulPos 
  Rectal    GC nucleic acid amplification test (NAAT)    GC Positive   RectNAATPos 
  Rectal    GC nucleic acid amplification test (NAAT)    GC Negative   RectNAATNeg 
  Rectal    Culture    GC Positive   RectCulPos 
  Rectal    Culture    GC Negative   RectCulNeg 
  Endocervical    GC nucleic acid amplification test (NAAT)    GC Negative   EndoNAATNeg 
  Endocervical    GC nucleic acid amplification test (NAAT)    GC Positive   EndoNAATPos 
  Endocervical    Culture    GC Negative   EndoCulNeg 
  Endocervical    Culture    GC Positive   EndoCulPos 
  Urine    GC nucleic acid amplification test (NAAT)    GC Negative   UrinNAATNeg 
  Urine    GC nucleic acid amplification test (NAAT)    GC Positive   UrinNAATPos 
  Urine    Culture    GC Negative   UrinCulNeg 
  Urine    Culture    GC Positive   UrinCulPos 
  ;
  proc sql;
  create table testing_1 as
  select a.*, b.testres1
  from testing as a natural left join key as b;
  quit;

PG

Dhana18 · Posted 09-25-2019 12:09 AM

Thank you for taking time to help me. I am little unclear about the solution you provided. Let tell you little bit about the data set I have. I have a patient visit data set with about 400 visit. In each visit patient can have multiple specimens collected for different tests and have multiple results. I have unique visited and specimen1-specimen14, Testytype1-Testype14, Result1-result14. I could probably just do proc freq to find neg or pos result of specific test, but I wanted to create a separate variable for each specimen, test, and result.
So my question here is how does creating a new dataset helps and how do I connect this new dataset with visit id? I am sorry my question may confusing to you.

PGStats · Posted 09-25-2019 12:39 AM

You should really consider transforming your data structure from wide to long, with one record for each specimen and test. That would simplify your analysis quite a bit. Your records would be something like:

patientId, specimenId, specimenType, testType, result, testRes

with testRes derived from a table, as I proposed.

PG

shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

Re: shortcut code

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!