BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

One of my teammates brought an issue to me that I am unable to resolve.  I hope someone here has some experience with this.

 

We are trying to reconcile a clinical database with a safety database.  We have Subject ID, AE Term, and AE Start Date in both databases.  If they match, they are considered the same record, and then we can compare that other information matches (like relationship, actions taken, seriousness, etc.).  The problem is that they also want to reconcile the start date.  I don't know how we can have the start date be a key ID variable and a comparison check at the same time.  Has anyone come across this situation before?

 

Here is an example.  We have a subject that has the same AE on two separate occasions.  If I merge by subject, AE, and start date, they line up just fine, and there is no discrepancy detected:

 

proc sql;
	create table CLIN_AE (SUBJID char(11), AEDECOD char(100), AESTDAT num format=date9.);
		insert into clin_ae (subjid, aedecod, aestdat)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','19JUN2025'd);
	create table SAFE_AE (USUBJID char(11), DECOD char(100), STDTC num format=date9.);
		insert into safe_ae (usubjid, decod, stdtc)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','19JUN2025'd);

	create table compare as
		select a.*, b.*
		from clin_ae as a full join safe_ae as b
		on a.subjid=b.usubjid & a.aedecod=b.decod & a.aestdat=b.stdtc;
quit;

data compare;
	set compare;
	if aestdat^=stdtc then DISCREP='Start Dates not Matching';
run;

 

Now, suppose that one of the dates was off by a day.  They are technically the same event, but a merge will not consider it as such.  It will still properly flag the discrepancy, but it duplicates it, and there is no way to know that they are the same event with a date discrepancy:

 

proc sql;
	create table CLIN_AE (SUBJID char(11), AEDECOD char(100), AESTDAT num format=date9.);
		insert into clin_ae (subjid, aedecod, aestdat)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','18JUN2025'd);
	create table SAFE_AE (USUBJID char(11), DECOD char(100), STDTC num format=date9.);
		insert into safe_ae (usubjid, decod, stdtc)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','19JUN2025'd);

	create table compare as
		select a.*, b.*
		from clin_ae as a full join safe_ae as b
		on a.subjid=b.usubjid & a.aedecod=b.decod & a.aestdat=b.stdtc;
quit;

data compare;
	set compare;
	if aestdat^=stdtc then DISCREP='Start Dates not Matching';
run;

Is there any way to accommodate this?  I know this can't be a new problem.  I just can't imagine how to use a key ID variable as a comparison variable at the same time.

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I think you may be entering the world of fuzzy matching / fuzzy merging. 

 

For example, if you want to join any records  where the dates are within one day of each other, you could replace :

 

on a.subjid=b.usubjid & a.aedecod=b.decod & a.aestdat=b.stdtc;

with:

on a.subjid=b.usubjid & a.aedecod=b.decod & abs(a.aestdat-b.stdtc)<=1;

But then you might have to deal with one-to-many matches, or even many-to-many.

 

You might want to think of this as a data cleaning step, where first you find all the exact matches, then look for any "off-by-one" errors.

 

If you search lexjansen.com for "fuzzy match" and "fuzzy merge" you'll likely find some helpful papers.

View solution in original post

2 REPLIES 2
Quentin
Super User

I think you may be entering the world of fuzzy matching / fuzzy merging. 

 

For example, if you want to join any records  where the dates are within one day of each other, you could replace :

 

on a.subjid=b.usubjid & a.aedecod=b.decod & a.aestdat=b.stdtc;

with:

on a.subjid=b.usubjid & a.aedecod=b.decod & abs(a.aestdat-b.stdtc)<=1;

But then you might have to deal with one-to-many matches, or even many-to-many.

 

You might want to think of this as a data cleaning step, where first you find all the exact matches, then look for any "off-by-one" errors.

 

If you search lexjansen.com for "fuzzy match" and "fuzzy merge" you'll likely find some helpful papers.

Ksharp
Super User

For this scenario , you could match this condition by operator BETWEEN AND.

 

proc sql;
	create table CLIN_AE (SUBJID char(11), AEDECOD char(100), AESTDAT num format=date9.);
		insert into clin_ae (subjid, aedecod, aestdat)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','18JUN2025'd);
	create table SAFE_AE (USUBJID char(11), DECOD char(100), STDTC num format=date9.);
		insert into safe_ae (usubjid, decod, stdtc)
			values ('101-001-001','Rash','30MAY2025'd)
			values ('101-001-001','Rash','19JUN2025'd);

	create table compare as
		select a.*, b.*
		from clin_ae as a full join safe_ae as b
		on a.subjid=b.usubjid & a.aedecod=b.decod & b.stdtc between a.aestdat and a.aestdat+1;
quit;

data compare;
	set compare;
	if aestdat^=stdtc then DISCREP='Start Dates not Matching';
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 271 views
  • 2 likes
  • 3 in conversation