Hi,
I posted this question yesterday and I am reframing my question again
The criteria is that there are two datasets one with placed times and the other for removal times.
The removal time has to be greater than the placed time and
The placed-removal times should be closest to each other
ALSO When we I merge I am getting repeats of BY variables
ID PLACED
101 06JUN2013:04:50:00
101 06JUN2013:04:57:00
101 11JUN2013:17:45:00
101 11JUN2013:19:38:00
102 18JUN2013:06:30:00
ID REMOVED
101 11JUN2013:13:35:00
101 15JUN2013:14:30:00
102 19JUN2013:22:40:00
102 27JUN2013:16:35:00
102 27JUN2013:17:59:00
WANT
ID PLACED REMOVED
101 11JUN2013:19:38:00 15JUN2013:14:30:00
102 18JUN2013:06:30:00 19JUN2013:22:40:00
Thanks
Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:
Note: The MERGE statement does not produce a
Cartesian product on a many-to-many match-merge. Instead it performs a
one-to-one merge while there are observations in the BY group in at least one
data set. When all observations in the BY group have been read from one data set
and there are still more observations in another data set, SAS performs a
one-to-many merge until all observations have been read for the BY group.
There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.
proc sql;
create table merged as
select t1.id, t1.placed, t2.removed
from table1 as t1
full join
table2 as t2
on t1.id=t2.id;
quit;
SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.
Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.
Vincent
*edit SASKiwi's SQL syntax is equivalent to the above
An easy but not processing-efficient solution would be to use sql summary functions to only retain the record, from your merged dataset, such that removed-placed is minimal.
proc sql;
create table want as
select *
from merged
group by id
having min(removed-placed) and removed>placed;
quit;
Assuming your placed and removed columns are effectively datetime values and not character strings. If they're character strings, you can change
having min(removed-placed);
to
having min(input(removed, datetime18.)-input(placed, datetime18.)) and input(removed, datetime18.) GT input(placed, datetime18.);
or well, something similar (sometimes informats disapprove width and require datetime. informat or something similar).
Hi,
Thanks for the reply.
How can we acheive this with simple datasteps ????
thanks in advance
hrm I would be inclined to say take a stab at proc sql; it simplifies a lot of multiple step (spaghetti code) data syntax and improves readability. However, here's a 3 step approach that could achieve the same with data step syntax
/* step1 calculate the difference between removed and placed and put it in a new variable */
data merged;
set merged;
diff=removed-placed;
if dif>0 then output;
end;
/* Step 2, sort by id & diff so that the minimum diff (only positive ones leftover from step1) can be targetted with first.diff */
proc sort data=merged;
by id diff;
run;
/* Step 3, output only first.diff with by group processing into a new dataset */
data want;
set merged;
by id;
if first.diff then output;
run;
This should do more or less the same as the single step sql syntax proposed above.
Vincent
Hi
Thanks again.
But when we merge prior to the diff calculation we get message note: Merge has more than 2 datasets with repeats of BY variables??
Can we ignore it??????
Thanks
No you shouldn't ignore the message. The MERGE statement can only handle one-to-one and one-to-many joins, not many-to-many. Your merged data will drop the duplicate IDs from your second dataset (marked as follows in bold):
ID REMOVED
101 11JUN2013:13:35:00
101 15JUN2013:14:30:00
102 19JUN2013:22:40:00
102 27JUN2013:16:35:00
102 27JUN2013:17:59:00
Join your two datasets first using SQL to avoid this:
proc sql
create merged as
select *
from placed,
removed
where placed.id = removed.id
;
quit;
Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:
Note: The MERGE statement does not produce a
Cartesian product on a many-to-many match-merge. Instead it performs a
one-to-one merge while there are observations in the BY group in at least one
data set. When all observations in the BY group have been read from one data set
and there are still more observations in another data set, SAS performs a
one-to-many merge until all observations have been read for the BY group.
There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.
proc sql;
create table merged as
select t1.id, t1.placed, t2.removed
from table1 as t1
full join
table2 as t2
on t1.id=t2.id;
quit;
SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.
Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.
Vincent
*edit SASKiwi's SQL syntax is equivalent to the above
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.