Re: Need to flag records based on previous records with the same Patie...

gdaymte · Posted 08-29-2014 12:05 PM

I am trying to flag duplicates ('D'), carriers ('C') and multiple serotypes ('M') in a disease-based dataset (2005-2014) that I am working with. A duplicate is defined as a record that has a specimen collection date within 30 days from the original record with that PatientID and is the same pathogen and serotype combination. A carrier is defined as a record that has a specimen collection date (DtSpec) 31-365 days from the original record with that PatientID and is the same pathogen and serotype combination. If the record has the same pathogen, but different serotype then it would be classified as having multiple serotypes. If there are multiple records that have the same PatientID, but the Pathogen is different, then it would not be flagged. I also need to account for those records that were not serotyped or are missing DtSpec. If serotype was listed as not serotyped, then it is assumed that it is the same as the on that has a serotype. If a record is missing a DtSpec, but falls in the same year, it would get flagged as unknown (U).

PatientID	InvestigationID	Pathogen	Serotype	Year	DtSpec
N11674036	CAS10828288TN01	E. COLI	O157	2005	06/03/2005
N11674036	CAS10831498TN01	SALMONELLA	ENTERITIDIS	2010
N11734242	CAS10856256TN01	SHIGELLA	SONNEI	2007	01/05/2007
N11734242	CAS10865012TN01	SHIGELLA	SONNEI	2007
N11770074	CAS10867154TN01	SALMONELLA	NOT SEROTYPED	2013	03/15/2013
N11770074	CAS10867158TN01	SALMONELLA	TYPHIMURIUM	2013	04/02/2013
N11770136	CAS10857544TN01	SALMONELLA	JAVIANA	2009	10/02/2009
N11770136	CAS10857677TN01	SALMONELLA	BRAENDERUP	2009	10/02/2009
N11770144	CAS10858088TN01	SALMONELLA	TYPHIMURIUM	2006	05/01/2006
N11770144	CAS10856216TN01	SALMONELLA	TYPHIMURIUM	2008	11/01/2008
N11770144	CAS10859003TN01	SALMONELLA	TYPHIMURIUM	2008	12/31/2008

Ultimately, I would like to have a new variable (Flag) created that indicates why it has been flagged.

PatientID	InvestigationID	Pathogen	Serotype	Year	DtSpec	Flag
N11674036	CAS10828288TN01	E. COLI	O157	2005	06/03/2005
N11674036	CAS10831498TN01	SALMONELLA	ENTERITIDIS	2010
N11734242	CAS10856256TN01	SHIGELLA	SONNEI	2007	01/05/2007
N11734242	CAS10865012TN01	SHIGELLA	SONNEI	2007		U
N11770074	CAS10867154TN01	SALMONELLA	NOT SEROTYPED	2013	03/15/2013
N11770074	CAS10867158TN01	SALMONELLA	TYPHIMURIUM	2013	04/02/2013	D
N11770136	CAS10857544TN01	SALMONELLA	JAVIANA	2009	10/02/2009
N11770136	CAS10857677TN01	SALMONELLA	BRAENDERUP	2009	10/02/2009	M
N11770144	CAS10858088TN01	SALMONELLA	TYPHIMURIUM	2006	05/01/2006
N11770144	CAS10856216TN01	SALMONELLA	TYPHIMURIUM	2008	11/01/2008
N11770144	CAS10859003TN01	SALMONELLA	TYPHIMURIUM	2008	12/31/2008	C

I had some code I used in Crystal Reports using a function called previous that would flag some of this. Now that I have moved my extract entirely into SAS, I'm not sure where to begin. Any help would be greatly appreciated.

jakarman · Posted 08-29-2014 12:58 PM

sort duplicates? Sql ? using point (dataset) ?

What was the code in Chrystal reports?

---->-- ja karman --<-----

gdaymte · Posted 08-29-2014 01:10 PM

So the dataset is original sorted by Pathogen=>PatientID=>DtSpec.

Someone had mentioned using Retain and Lag. I've never used those before in SAS and am reading up on them. These days, I spend more of my time coding with Proc SQL

The code in Crystal Reports only flagged some of what I need to do...

IF PREVIOUS({Case_Lab_Data_New.Patient_Local_ID})={Case_Lab_Data_New.Patient_Local_ID}

AND PREVIOUS({Case_Lab_Data_New.Condition})={Case_Lab_Data_New.Condition}

AND PREVIOUS({Case_Lab_Data_New.SeroType})<>{Case_Lab_Data_New.SeroType}

THEN "M"

ELSE IF PREVIOUS({Case_Lab_Data_New.Patient_Local_ID})={Case_Lab_Data_New.Patient_Local_ID}

AND PREVIOUS({Case_Lab_Data_New.Condition})={Case_Lab_Data_New.Condition}

AND PREVIOUS({Case_Lab_Data_New.SeroType})={Case_Lab_Data_New.SeroType}

THEN "D"

ELSE ""

I'm not sure if I am giving you the answer you really want.

PGStats · Posted 08-29-2014 02:28 PM

Well exposed problem. This would do it. I gave the flag to every duplicate observation (not just the last one) :

data have;

infile datalines missover;

length PatientID InvestigationID Pathogen Serotype $20;

input PatientID & InvestigationID & Pathogen & Serotype & Year DtSpec :mmddyy.;

format dtSpec yymmdd10.;

datalines;

N11674036 CAS10828288TN01 E. COLI O157 2005 06/03/2005

N11674036 CAS10831498TN01 SALMONELLA ENTERITIDIS 2010

N11734242 CAS10856256TN01 SHIGELLA SONNEI 2007 01/05/2007

N11734242 CAS10865012TN01 SHIGELLA SONNEI 2007

N11770074 CAS10867154TN01 SALMONELLA NOT SEROTYPED 2013 03/15/2013

N11770074 CAS10867158TN01 SALMONELLA TYPHIMURIUM 2013 04/02/2013

N11770136 CAS10857544TN01 SALMONELLA JAVIANA 2009 10/02/2009

N11770136 CAS10857677TN01 SALMONELLA BRAENDERUP 2009 10/02/2009

N11770144 CAS10858088TN01 SALMONELLA TYPHIMURIUM 2006 05/01/2006

N11770144 CAS10856216TN01 SALMONELLA TYPHIMURIUM 2008 11/01/2008

N11770144 CAS10859003TN01 SALMONELLA TYPHIMURIUM 2008 12/31/2008

;

proc format;

value flag

0 = .

1 = "U"

2 = "M"

3 = "C"

4 = "D";

run;

proc sql;

create table want as

select

a.*,

max(

case

when a.pathogen=b.pathogen and

(a.serotype=b.serotype or a.serotype="NOT SEROTYPED" or b.serotype="NOT SEROTYPED") and

abs(intck("DAY",a.dtspec, b.dtspec)) between 1 and 30 then 4

when a.pathogen=b.pathogen and

(a.serotype=b.serotype or a.serotype="NOT SEROTYPED" or b.serotype="NOT SEROTYPED") and

abs(intck("DAY",a.dtspec, b.dtspec)) between 31 and 365 then 3

when a.pathogen=b.pathogen and a.dtspec is not missing and b.dtspec is not missing then 2

when (a.dtspec is missing or b.dtspec is missing) and a.year=b.year then 1

else 0

end) as flag format=flag.

from

have as a inner join

have as b on a.patientid=b.patientid and a.investigationid ne b.investigationid

group by a.patientid, a.investigationid, a.year, a.dtspec, a.pathogen, a.serotype

;

quit;

PG

jakarman · Posted 08-29-2014 02:43 PM

Will do, I see a Cartesian product selection, When the data is not too big and that is not to be expected.

---->-- ja karman --<-----

Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Catch up on SAS Innovate 2026

Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Re: Need to flag records based on previous records with the same PatientID

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away