BookmarkSubscribeRSS Feed
TingSern
Obsidian | Level 7

Hi,

 

Supposed I have two SAS datasets - call these JOBS and DATA.  They have two common variables TIME and NAME.

 

I sort JOBS and DATA by TIME and then NAME.

 

So, I code -

 

Data Combined;

  Merge JOBS DATA;

  by TIME NAME;

Run;

 

This assumes that the TIME matches exactly for both JOBS and DATA.

 

My problem is that TIME for both JOBS and DATA are not "EQUAL" - but can be off by up to 10 seconds, let's say.

 

There is another variable in both datasets (NAME) - that can be used to match as well. 

 

For example - in JOBS -

TIME       NAME    PET

120005   ABC       CAT

120007   ABD       DOG

120403   ABC       COW

 

In DATA -

TIME        NAME     ADDRESS

120003    ABC        London

120008    ABD        Paris

120400    ABC        Warsaw

 

Combined output -

 

TIME        NAME      PET           ADDRESS

120005    ABC         CAT           London

120007    ABD         DOG          Paris

120403    ABC         COW          Warsaw

 

Is there a way to do this? 

 

Thank you.

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Many ways to do this but here is one 🙂

 

data JOBS;
input TIME NAME $ PET $;
datalines;
120005 ABC CAT
120007 ABD DOG
120403 ABC COW
;

data DATA;
input TIME NAME $ ADRESS $;
datalines;
120003 ABC London
120008 ABD Paris
120400 ABC Warsaw
;


proc sort data = JOBS;by TIME NAME;run;
proc sort data = DATA;by TIME NAME;run;

data want;
	format TIME NAME PET ADRESS;
	merge JOBS(rename=(TIME=JOBS_TIME NAME = JOBS_NAME)) 
		  DATA(rename=(TIME=DATA_TIME));

	if (JOBS_TIME - DATA_TIME) <= 10 then do;
		TIME = JOBS_TIME;
		if NAME = JOBS_NAME then output;
	end;
	else if (DATA_TIME - JOBS_TIME) <= 10 then do;
		TIME = DATA_TIME;
		if NAME = JOBS_NAME then output;
	end;
	
	drop DATA_TIME JOBS_TIME JOBS_NAME;
run;
TingSern
Obsidian | Level 7

Thank you, very much. 

 

I will test it out. 

 

TS

PeterClemmensen
Tourmaline | Level 20

Just edited the code a bit 🙂 Test that instead

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The question here really is what is going to be your fixed logic.  For instance if you have several rows which are all within 10 seconds of multiple timepoints, what then, is it a many to many merge it that case, i.e. you clause would be:

from HAVE1 A

full join HAVE2 B

on    A.NAME=B.NAME

and  A.TIME-10 <= B.TIME <= A.TIME+10

 

However if you want 1 row per item, tehn it becomes more difficult, would you take the first time occuring within that group in the window, or the last/mean etc.In your example, 120005 is within 10 seconds of both 120003 and 120008.  You seem to take the lowest, but do confirm.  If so, then maybe something like:

proc sql;
  create table WANT as
  select A.*,
         B.ADDRESS
  from   HAVE1 A
  left join (select NAME,ADDRESS from (select * from HAVE2 where TIME-10 <= A.TIME <= TIME+10) group by NAME,ADDRESS having TIME=min(TIME)) B
  on     A.NAME=B.NAME;
quit;
           

Also, Post test data in the form of a datastep!

Ksharp
Super User

data JOBS;
input TIME NAME $ PET $;
datalines;
120005 ABC CAT
120007 ABD DOG
120403 ABC COW
;

data DATA;
input TIME NAME $ ADRESS $;
datalines;
120003 ABC London
120008 ABD Paris
120400 ABC Warsaw
;

proc sql;
select a.*,b.adress
 from jobs as a left jion data as b
  on a.name=b.name and b.time between a.time-10 and a.time+10;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1772 views
  • 0 likes
  • 4 in conversation