BookmarkSubscribeRSS Feed
vidya3
Calcite | Level 5
Hi,

Kindly help as this is very urgent.

There are two fields to be considered, date and time. If they are both duplicates across any 2 visits in a subject, then it must fire in the final output.

The issue here is, I tried self join but many to many join is occurring. For instance, for subject 101, Screening visit from table 1 (date = 30Jan2020 and time=11:15) is getting matched with Day 1 visit from table 2 (date = 30Jan2020 and time=11:15) which is the 1st row in the final output, but I'm also getting another 2nd row where Day 1 visit from table 1 (date = 30Jan2020 and time=11:15) is getting matched with Screening visit from table 2 (date = 30Jan2020 and time=11:15) . I'm expecting just one record whereas I'm getting vice versa scenarios, how do I avoid the vice versa scenario here and get just 1 record in the output?

Thanks in advance. Kindly help.
5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Please post sample data. Makes it much easier to provide a usable code answer.

AMSAS
SAS Super FREQ

As @PeterClemmensen recommends please provide samples of both input datasets (table1 and table2), a sample of what you want in your output dataset based on the inputs, and the SAS code you have currently.

For example:

/* Create sample Table 1 data */
data work.table1 ;
	format 
		date1 date7. 
		time1 time5. ;
	infile cards ;
	input 
		subject1      $3.
		date1    : date7.
		time1    : time5. ;
	output 
		work.table1 ;
cards ;
101 30Jan20 11:15
201 31Jan20 10:15
;

/* Create sample Table 2 data */
data work.table2 ;
	format 
		date2 date7. 
		time2 time5. ;
	infile cards ;
	input 
		subject2      $3.
		date2    : date7.
		time2    : time5. ;
	output 
		work.table2 ;
cards ;
101 30Jan20 11:15
101 30Jan20 11:15
201 31Jan20 10:15
201 01Feb20 10:15
;

/* Create sample want data                        */
/* This is the resulting dataset I want           */
/* Based on the input datasets table1 and table 2 */
data work.want ;
	format 
		date date7. 
		time time5. ;
	infile cards ;
	input 
		subject      $3.
		date    : date7.
		time    : time5. ;
	output 
		work.want ;
cards ;
101 30Jan20 11:15
201 31Jan20 10:15
201 01Feb20 10:15
;

/* My code to generate the want dataset from the input datasets table1 and table2 */
proc sql ;
	create table 
		work.joined as
	select 
		table1.*,
		table2.*
	from
		table1, 
		table2 
	where
		table1.subject1=table2.subject2 and 
		table1.date1=table2.date2 and 
		table1.time1=table2.time2 
	;
		
quit ;
s_lassen
Meteorite | Level 14

If it is a self join, I assume you are joining on something like

a.patient_id=b.patient_id and
a.visit_time = b.visit_time and 
a.visit_text ne b.visit_text

If you do not want the output both ways, just put change the "not equal" to "greater than" or "smaller than", e.g.

a.patient_id=b.patient_id and
a.visit_time = b.visit_time and 
a.visit_text < b.visit_text

 

s_lassen
Meteorite | Level 14

Of course, if you have more than 2 visits on the same time, you will still get too many combinations. In that case, it may be simpler with a datastep, e.g.

proc sort data=visits;
  by patient_id visit_time;
run;

data duplicates;
  set visits;
  by patient_id visit_time;
  if not(first.visit_time and last.visit_time);
run;
A_Kh
Lapis Lazuli | Level 10

You can use first.variable/last.variable to flag duplicate records. Depending on the VISIT values, you can consider sorting order (eg. subjectid date time descending visit)

Eg:

data have;
	input subjectid $ visit $ date:date9. time:time5.;
	format date date9. time time5.;
	cards;
101 Screen	30Jan2020 11:15
101 Day1 30Jan2020 11:15
101 Week6 15Mar2020 12:15
101 Week12 01May2020 12:00
;
proc print;run;

proc sort data=have;
	by subjectid date time descending visit;
run; 

data want;
	set have;
	by subjectid date time;
	length flag $20;
	if not first.time then flag= 'duplicate date';
proc print;run; 

Capture.PNG 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 466 views
  • 0 likes
  • 5 in conversation