BookmarkSubscribeRSS Feed
vidya3
Calcite | Level 5
Hi,

Kindly help as this is very urgent.

There are two fields to be considered, date and time. If they are both duplicates across any 2 visits in a subject, then it must fire in the final output.

The issue here is, I tried self join but many to many join is occurring. For instance, for subject 101, Screening visit from table 1 (date = 30Jan2020 and time=11:15) is getting matched with Day 1 visit from table 2 (date = 30Jan2020 and time=11:15) which is the 1st row in the final output, but I'm also getting another 2nd row where Day 1 visit from table 1 (date = 30Jan2020 and time=11:15) is getting matched with Screening visit from table 2 (date = 30Jan2020 and time=11:15) . I'm expecting just one record whereas I'm getting vice versa scenarios, how do I avoid the vice versa scenario here and get just 1 record in the output?

Thanks in advance. Kindly help.
5 REPLIES 5
PeterClemmensen
Super User

Please post sample data. Makes it much easier to provide a usable code answer.

AMSAS
SAS Super FREQ

As @PeterClemmensen recommends please provide samples of both input datasets (table1 and table2), a sample of what you want in your output dataset based on the inputs, and the SAS code you have currently.

For example:

/* Create sample Table 1 data */
data work.table1 ;
	format 
		date1 date7. 
		time1 time5. ;
	infile cards ;
	input 
		subject1      $3.
		date1    : date7.
		time1    : time5. ;
	output 
		work.table1 ;
cards ;
101 30Jan20 11:15
201 31Jan20 10:15
;

/* Create sample Table 2 data */
data work.table2 ;
	format 
		date2 date7. 
		time2 time5. ;
	infile cards ;
	input 
		subject2      $3.
		date2    : date7.
		time2    : time5. ;
	output 
		work.table2 ;
cards ;
101 30Jan20 11:15
101 30Jan20 11:15
201 31Jan20 10:15
201 01Feb20 10:15
;

/* Create sample want data                        */
/* This is the resulting dataset I want           */
/* Based on the input datasets table1 and table 2 */
data work.want ;
	format 
		date date7. 
		time time5. ;
	infile cards ;
	input 
		subject      $3.
		date    : date7.
		time    : time5. ;
	output 
		work.want ;
cards ;
101 30Jan20 11:15
201 31Jan20 10:15
201 01Feb20 10:15
;

/* My code to generate the want dataset from the input datasets table1 and table2 */
proc sql ;
	create table 
		work.joined as
	select 
		table1.*,
		table2.*
	from
		table1, 
		table2 
	where
		table1.subject1=table2.subject2 and 
		table1.date1=table2.date2 and 
		table1.time1=table2.time2 
	;
		
quit ;
s_lassen
Meteorite | Level 14

If it is a self join, I assume you are joining on something like

a.patient_id=b.patient_id and
a.visit_time = b.visit_time and 
a.visit_text ne b.visit_text

If you do not want the output both ways, just put change the "not equal" to "greater than" or "smaller than", e.g.

a.patient_id=b.patient_id and
a.visit_time = b.visit_time and 
a.visit_text < b.visit_text

 

s_lassen
Meteorite | Level 14

Of course, if you have more than 2 visits on the same time, you will still get too many combinations. In that case, it may be simpler with a datastep, e.g.

proc sort data=visits;
  by patient_id visit_time;
run;

data duplicates;
  set visits;
  by patient_id visit_time;
  if not(first.visit_time and last.visit_time);
run;
A_Kh
Lapis Lazuli | Level 10

You can use first.variable/last.variable to flag duplicate records. Depending on the VISIT values, you can consider sorting order (eg. subjectid date time descending visit)

Eg:

data have;
	input subjectid $ visit $ date:date9. time:time5.;
	format date date9. time time5.;
	cards;
101 Screen	30Jan2020 11:15
101 Day1 30Jan2020 11:15
101 Week6 15Mar2020 12:15
101 Week12 01May2020 12:00
;
proc print;run;

proc sort data=have;
	by subjectid date time descending visit;
run; 

data want;
	set have;
	by subjectid date time;
	length flag $20;
	if not first.time then flag= 'duplicate date';
proc print;run; 

Capture.PNG 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 228 views
  • 0 likes
  • 5 in conversation