BookmarkSubscribeRSS Feed
gdaymte
Obsidian | Level 7

We are working on a case-case comparison project.

What we are wanting to do is identify clusters of Salmonella cases that have the same serotype (variable=SeroSite) and PFGE pattern (variable=PfgePrimary) within a 45 day period (variable=DtSpec) and compare those to all other records in the dataset (excluding those that match by serotype). For those cases that do match by serotype and PFGE, we are assigning a value of 1 to the variable case. If a record matches by serotype but not pattern, we would like to assign the variable case to missing. If a record does not match by serotype or PFGE pattern, we would like to assign case a value of 0. Our problem is that the cluster will always be changing, so we can't code around specific values. We would like to automate it as much as possible.

Below is what the data looks like so far...

SeroSitePfgePrimaryDtSpecCase
EnteritidisXBA.000401/01/20141
EnteritidisXBA.000401/18/20141
EnteritidisXBA.000402/28/20141
TyphimuriumXBA.131402/04/2014
EnteritidisXBA.000503/01/2014
CerroXBA.232702/15/2014
TyphimuriumXBA.121202/01/2014

This is what we need it to look like...

SeroSitePfgePrimaryDtSpecCase
EnteritidisXBA.000401/01/20141
EnteritidisXBA.000401/18/20141
EnteritidisXBA.000402/28/20141
TyphimuriumXBA.131402/04/20140
EnteritidisXBA.000503/01/2014.
CerroXBA.232702/15/20140
TyphimuriumXBA.121202/01/20140

We are unsure of how we can go about doing this...We could assign a value to case if we were able to somehow assign a dummy serotype variable (CaseSero) to all records in the dataset the value of the SeroSite variable where case=1. Any suggestions would be helpful.

5 REPLIES 5
ballardw
Super User

Do you need to manually assign the range on the DTSpec to determine the 45 day period or do you look at the date range recorded for the PfgePrimary and then determine the 45 day period?

Do you do this for each Serotype/PfgePrimary pair or only select ones?
What do you do if the PfgePrimary has a range of dates greater than 45 days? And likely a few more questions about order of rules and desired output.

I think a more detailed step-by-step description of a manual process will help us answer your question for automation.

Ksharp
Super User

"Our problem is that the cluster will always be changing,"

I don't understand.  if you already have 1 in CASE , you can fix these cluster  ? i.e. give it a initial value when case=1.

gdaymte
Obsidian | Level 7

We interview all Salmonella cases about their exposure history in the 7 day period before their illness. We have over 3 years worth of exposure data to look at. If a cluster of 2 or more cases is identified within  45 day period, there is a greater chance that there is a common exposure of interest.

The 45-day period is based on "Today's Date"... whatever day you run the program. If within that 45-day range a cluster of 2 or more cases (serotype/PFGE match) are found, a new dataset is output with that cluster sorted at the top and assigned case=1. If a record has the same serotype/PFGE match, but falls outside of the 45 day window, they will not be considered a part of the cluster. This is a standard way of looking at things as we are trying to identify common exposures during outbreaks.

A new dataset is generated for each and every cluster identified. So one week, there may be one cluster identified with one dataset output. In another week, there may be 10 clusters identified...so 10 datasets will be output with one cluster sorted at the top in each dataset. Whichever cluster is at the top will be assigned a value of 1 for the variable case. All other records in the dataset will need to be assigned a value of case=. or case=0. Missing will need to be those records that have the same serotype as the case but different PFGE. Zeros will be those with a different serotype than the case.

Each cluster will be evaluated separately. We will compare exposures for those cases to all of the other records in the dataset (excluding those records that have the same serotype as the case but different PFGE). Everything else in the dataset, which is a moving target of records over the last 3 years, will be considered the "control" group. I hope this explains a bit better.

The part we are need help with is the assignment of missing and zeros to the variable case. Everything else has been completed. We need to figure out a way to do this based upon what the serotype and PFGE is for the case.

Ksharp
Super User

Still not get you, since I am not in pharmaceutical field. You said there is a 45 days window, but in your original data ,date is over that window for the category marked 1.

data have;
infile cards truncover expandtabs;
input serosite : $20. pfgeprimary : $20. dtspec : $20. case;
cards;
Enteritidis     XBA.0004     01/01/2014     1
Enteritidis     XBA.0004     01/18/2014     1
Enteritidis     XBA.0004     02/28/2014     1
Typhimurium     XBA.1314     02/04/2014     
Enteritidis     XBA.0005     03/01/2014     
Cerro     XBA.2327     02/15/2014     
Typhimurium     XBA.1212     02/01/2014     
;
run;
proc sql;
create table want as
 select a.serosite,a.pfgeprimary,a.dtspec, case 
                                       when a.serosite eq b.serosite and a.pfgeprimary eq b.pfgeprimary then 1
                                                when a.serosite ne b.serosite and a.pfgeprimary ne b.pfgeprimary then 0
                                       when a.serosite eq b.serosite and a.pfgeprimary ne b.pfgeprimary then .
                                       else 99999
                                              end as case
  from have as a,(select distinct serosite,pfgeprimary from have as h where h.case eq 1) as b ;
  quit;


Xia Keshan

Message was edited by: xia keshan

gdaymte
Obsidian | Level 7

Sorry I just made a typo...The 3rd Enteritidis should say 01/28/2014. The year on the 4th Enteritidis should be 2013.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 892 views
  • 0 likes
  • 3 in conversation