I have a data like this IncidentID CollDate Specimen TestType Result 233 7/6/2017 Urethral Culture GC Positive 233 7/17/2017 Urethral Culture GC Negative 394 7/17/2017 Rectal Culture GC Negative 394 7/17/2017 Pharyngeal Culture GC Positive 394 7/24/2017 Pharyngeal Culture GC Negative 878 8/1/2017 Endocervical Culture GC Negative 878 8/1/2017 Pharyngeal Culture GC Positive 906 11/27/2017 Urethral Culture GC Positive 906 11/27/2017 Pharyngeal Culture GC Negative 906 12/11/2017 Urethral Culture GC Negative 916 11/29/2017 Endocervical Culture GC Positive 916 11/29/2017 Pharyngeal Culture GC Positive 916 1/10/2018 Endocervical Culture GC Negative 916 1/10/2018 Pharyngeal Culture GC Negative 830 11/29/2017 Pharyngeal Culture GC Positive 830 11/29/2017 Rectal Culture GC Positive 830 11/29/2017 Endocervical Culture GC Negative 182 12/12/2017 Urethral Culture GC Positive 182 12/12/2017 Rectal Culture GC Positive I am trying to analyze, how many of these id had more than one specimen collected? How many of them have more than one and or more specimen positive? How many days after same specimen tested negative for the same ID? I want the table like this: IncidentID CollDate Site tested Site_infected Days between Neg and Pos test 233 7/6/2017 Urethral urethra 11 394 7/17/2017 Rectal&Pharyngeal Pharyngeal 7 878 8/1/2017 EndocervicalPharyngeal Pharyngeal NA 906 11/27/2017 Urethral&Pharyngeal Urethral 14 916 11/29/2017 Endocervical&Pharyngeal Pharyngeal 42 830 11/29/2017 Pharyngeal&Rectal&Endocervical Pharyngeal& rectal NA 182 12/12/2017 Urethral&Rectal Urethral&Rectal NA
I tried this code, but did not work
data LABCHTLK.LAB_JAN20_CULTURE_TEST_D;
set LABCHTLK.LAB_JAN20_CULTURE_TEST_C;
LENGTH SITE_TESTED $20;
IF SPECIMEN = "Urethral" then SITE_TESTED="URETHRAL_ONLY";
IF SPECIMEN = "Endocervical" then SITE_TESTED="Endocerv_Only";
IF SPECIMEN = "Pharyngeal" then SITE_TESTED="PHAR_only";
IF SPECIMEN = "Rectal" then SITE_TESTED="RECTAL_only" ;
IF SPECIMEN IN("Pharyngeal" , "Rectal")then SITE_TESTED="PHAR&RECT";
IF SPECIMEN IN("Endocervical" , "Pharyngeal")then SITE_TESTED="ENDO&PHAR";
IF SPECIMEN IN("Pharyngeal" , "Urethral")then SITE_TESTED="URETH&PHAR";
IF SPECIMEN IN("Endocervical" , "Pharyngeal" ,"Rectal") then SITE_TESTED="ENDO&PHAR&RECT";
IF SPECIMEN IN("Pharyngeal" , "Rectal" , "Urethral") then SITE_TESTED="PHAR&RECT&URETH";
RUN;
Please explain "did not work". Are you getting the wrong results (if so, be specific about what is wrong)? Or is there an error in the log? In either case, please show us, and please provide the data following these instructions: How to create a data step version of your data AKA generate sample data for forums
log says
NOTE: There were 359 observations read from the data set LABCHTLK.LAB_JAN20_CULTURE_TEST_A.
NOTE: The data set LABCHTLK.LAB_JAN20_CULTURE_TEST_BAA has 359 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
cpu time
The out put is this IncidentID CollDate Specimen TestType Result SITE_TESTED 233 7/6/2017 Urethral Culture GC Positive PHAR&RECT&URETH 233 7/17/2017 Urethral Culture GC Negative PHAR&RECT&URETH 394 7/17/2017 Rectal Culture GC Negative PHAR&RECT&URETH 394 7/17/2017 Pharyngeal Culture GC Positive PHAR&RECT&URETH 394 7/24/2017 Pharyngeal Culture GC Negative PHAR&RECT&URETH 878 8/1/2017 Endocervical Culture GC Negative ENDO&PHAR&RECT 878 8/1/2017 Pharyngeal Culture GC Positive PHAR&RECT&URETH 906 11/27/2017 Urethral Culture GC Positive PHAR&RECT&URETH 906 11/27/2017 Pharyngeal Culture GC Negative PHAR&RECT&URETH 906 12/11/2017 Urethral Culture GC Negative PHAR&RECT&URETH 916 11/29/2017 Endocervical Culture GC Positive ENDO&PHAR&RECT 916 11/29/2017 Pharyngeal Culture GC Positive PHAR&RECT&URETH 916 1/10/2018 Endocervical Culture GC Negative ENDO&PHAR&RECT 916 1/10/2018 Pharyngeal Culture GC Negative PHAR&RECT&URETH 830 11/29/2017 Pharyngeal Culture GC Positive PHAR&RECT&URETH 830 11/29/2017 Rectal Culture GC Positive PHAR&RECT&URETH 830 11/29/2017 Endocervical Culture GC Negative ENDO&PHAR&RECT 182 12/12/2017 Urethral Culture GC Positive PHAR&RECT&URETH 182 12/12/2017 Rectal Culture GC Positive PHAR&RECT&URETH
It is not the way I like to have my table. I want the table I mentioned earlier.
Next code is not tested and I believe it is not complete, which means that I have the filling
that it does not cover all possible combinations but it may give you some idea of how to deal with what you want.
Try it and check the output:
data have;
infile cards;
input IncidentID
CollDate mmddyy10.
Specimen $
TestType $
Result $;
cards;
233 7/6/2017 Urethral Culture GC Positive
233 7/17/2017 Urethral Culture GC Negative
394 7/17/2017 Rectal Culture GC Negative
394 7/17/2017 Pharyngeal Culture GC Positive
394 7/24/2017 Pharyngeal Culture GC Negative
878 8/1/2017 Endocervical Culture GC Negative
878 8/1/2017 Pharyngeal Culture GC Positive
906 11/27/2017 Urethral Culture GC Positive
906 11/27/2017 Pharyngeal Culture GC Negative
906 12/11/2017 Urethral Culture GC Negative
916 11/29/2017 Endocervical Culture GC Positive
916 11/29/2017 Pharyngeal Culture GC Positive
916 1/10/2018 Endocervical Culture GC Negative
916 1/10/2018 Pharyngeal Culture GC Negative
830 11/29/2017 Pharyngeal Culture GC Positive
830 11/29/2017 Rectal Culture GC Positive
830 11/29/2017 Endocervical Culture GC Negative
182 12/12/2017 Urethral Culture GC Positive
182 12/12/2017 Rectal Culture GC Positive
;
run;
proc sort data=have;
by IncidentID Specimen CollDate;
run;
data want;
set have;
by IncidentID Specimen;
length Site_tested $40 Site_infected $40 days $3;
retain save_colldate Site_tested Site_infected Days save_result;
label days = "Days between Neg and Pos test";
if first.IncidentID then do;
site_tested = ' ';
Site_infected = ' ';
end;
if first.Specimen then do;
save_colldate = colldate;
if Site_tested = ' ' then Site_tested = Specimen;
else site_tested = catx('&',site_tested,Specimen);
save_result = result;
if locase(result) = 'positive' then do;
if site_infected = ' ' then site_infected = Specimen;
else site_infected = catx('&',site_infected, Specimen);
end;
end;
else if result ne save_result
then days = put(colldate - save_colldate, 3.);
else days = 'NA';
if last.IncidentID then output;
run;
I suggest to create a separate record for each IncidentID Specimen instead per IncidentID.
Please replace locase into lowcase. It was a typo.
You need to decide what you actually want.
Example: You are testing for the value "Endocervical" in 3 separate places, all with equality comparisons (that is what IN does: is the variable equal to any of the listed values) and attempting to assign the same variable Site_tested to different result values. So the last one encountered is going to be set. I suspect that you have way more "PHAR&RECT&URETH" values than you expected to see.
Also you have a "Days between Neg and Pos test" requirement. What do you want if there are multiple positive tests? Multiple negative tests? Is there a "within x days" requirement? And since you have multiple tests on the same day what are the rules involving one or more positive with one or more negative on the same day? These rules need a lot of fleshing out.
And if there are multiple positive tests what should Site_infected look like?
If you want to compare records across visits you need to either transpose the data to a wide format so all of the specimens and results are on one record (likely to be very messy) or create and maintain cumulative variables (Retain) and then keep the last record as the desired outcome. Since you have that "days between" requirement you will need to maintain a "first infection date" for comparisons with other collection dates.
I know that my testing data on similar topics can have 15 or more tests inside a 6 month period for some clients. So rules on any date limits need to be very carefully considered.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.