- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Supposed I have two SAS datasets - call these JOBS and DATA. They have two common variables TIME and NAME.
I sort JOBS and DATA by TIME and then NAME.
So, I code -
Data Combined;
Merge JOBS DATA;
by TIME NAME;
Run;
This assumes that the TIME matches exactly for both JOBS and DATA.
My problem is that TIME for both JOBS and DATA are not "EQUAL" - but can be off by up to 10 seconds, let's say.
There is another variable in both datasets (NAME) - that can be used to match as well.
For example - in JOBS -
TIME NAME PET
120005 ABC CAT
120007 ABD DOG
120403 ABC COW
In DATA -
TIME NAME ADDRESS
120003 ABC London
120008 ABD Paris
120400 ABC Warsaw
Combined output -
TIME NAME PET ADDRESS
120005 ABC CAT London
120007 ABD DOG Paris
120403 ABC COW Warsaw
Is there a way to do this?
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many ways to do this but here is one 🙂
data JOBS;
input TIME NAME $ PET $;
datalines;
120005 ABC CAT
120007 ABD DOG
120403 ABC COW
;
data DATA;
input TIME NAME $ ADRESS $;
datalines;
120003 ABC London
120008 ABD Paris
120400 ABC Warsaw
;
proc sort data = JOBS;by TIME NAME;run;
proc sort data = DATA;by TIME NAME;run;
data want;
format TIME NAME PET ADRESS;
merge JOBS(rename=(TIME=JOBS_TIME NAME = JOBS_NAME))
DATA(rename=(TIME=DATA_TIME));
if (JOBS_TIME - DATA_TIME) <= 10 then do;
TIME = JOBS_TIME;
if NAME = JOBS_NAME then output;
end;
else if (DATA_TIME - JOBS_TIME) <= 10 then do;
TIME = DATA_TIME;
if NAME = JOBS_NAME then output;
end;
drop DATA_TIME JOBS_TIME JOBS_NAME;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, very much.
I will test it out.
TS
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just edited the code a bit 🙂 Test that instead
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The question here really is what is going to be your fixed logic. For instance if you have several rows which are all within 10 seconds of multiple timepoints, what then, is it a many to many merge it that case, i.e. you clause would be:
from HAVE1 A
full join HAVE2 B
on A.NAME=B.NAME
and A.TIME-10 <= B.TIME <= A.TIME+10
However if you want 1 row per item, tehn it becomes more difficult, would you take the first time occuring within that group in the window, or the last/mean etc.In your example, 120005 is within 10 seconds of both 120003 and 120008. You seem to take the lowest, but do confirm. If so, then maybe something like:
proc sql; create table WANT as select A.*, B.ADDRESS from HAVE1 A left join (select NAME,ADDRESS from (select * from HAVE2 where TIME-10 <= A.TIME <= TIME+10) group by NAME,ADDRESS having TIME=min(TIME)) B on A.NAME=B.NAME; quit;
Also, Post test data in the form of a datastep!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data JOBS;
input TIME NAME $ PET $;
datalines;
120005 ABC CAT
120007 ABD DOG
120403 ABC COW
;
data DATA;
input TIME NAME $ ADRESS $;
datalines;
120003 ABC London
120008 ABD Paris
120400 ABC Warsaw
;
proc sql;
select a.*,b.adress
from jobs as a left jion data as b
on a.name=b.name and b.time between a.time-10 and a.time+10;
quit;