BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hi,

I posted this question yesterday and I am reframing my question again

The criteria is that there are two datasets one with placed times and the other for removal times.

The removal time has to be greater than the placed time and

The placed-removal times should be closest to each other

ALSO When we I merge I am getting repeats of BY variables

ID        PLACED

101   06JUN2013:04:50:00
101   06JUN2013:04:57:00
101   11JUN2013:17:45:00
101   11JUN2013:19:38:00
102   18JUN2013:06:30:00


ID       REMOVED
101    11JUN2013:13:35:00
101    15JUN2013:14:30:00
102    19JUN2013:22:40:00
102    27JUN2013:16:35:00
102    27JUN2013:17:59:00


WANT

ID    PLACED                 REMOVED
101  11JUN2013:19:38:00     15JUN2013:14:30:00
102  18JUN2013:06:30:00     19JUN2013:22:40:00

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:

Note:   The MERGE statement does not produce a

Cartesian product on a many-to-many match-merge. Instead it performs a

one-to-one merge while there are observations in the BY group in at least one

data set. When all observations in the BY group have been read from one data set

and there are still more observations in another data set, SAS performs a

one-to-many merge until all observations have been read for the BY group.

There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.

proc sql;

     create table merged as

     select t1.id, t1.placed, t2.removed

     from table1 as t1

     full join

     table2 as t2

     on t1.id=t2.id;

quit;

SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.

Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.

Vincent

*edit SASKiwi's SQL syntax is equivalent to the above

View solution in original post

6 REPLIES 6
Vince28_Statcan
Quartz | Level 8


An easy but not processing-efficient solution would be to use sql summary functions to only retain the record, from your merged dataset, such that removed-placed is minimal.

proc sql;

     create table want as

     select *

     from merged

     group by id

     having min(removed-placed) and removed>placed;

quit;

Assuming your placed and removed columns are effectively datetime values and not character strings. If they're character strings, you can change

having min(removed-placed);

to

having min(input(removed, datetime18.)-input(placed, datetime18.)) and input(removed, datetime18.) GT input(placed, datetime18.);

or well, something similar (sometimes informats disapprove width and require datetime. informat or something similar).

robertrao
Quartz | Level 8

Hi,

Thanks for the reply.

How can we acheive this with simple datasteps ????

thanks in advance

Vince28_Statcan
Quartz | Level 8

hrm I would be inclined to say take a stab at proc sql; it simplifies a lot of multiple step (spaghetti code) data syntax and improves readability. However, here's a 3 step approach that could achieve the same with data step syntax

/* step1 calculate the difference between removed and placed and put it in a new variable */

data merged;

     set merged;

     diff=removed-placed;

     if dif>0 then output;

end;

/* Step 2, sort by id & diff so that the minimum diff (only positive ones leftover from step1) can be targetted with first.diff */

proc sort data=merged;

     by id diff;

run;

/* Step 3, output only first.diff with by group processing into a new dataset */

data want;

     set merged;

     by id;

     if first.diff then output;

run;

This should do more or less the same as the single step sql syntax proposed above.

Vincent

robertrao
Quartz | Level 8

Hi

Thanks again.

But when we merge prior to the diff calculation we get message note: Merge has more than 2 datasets with repeats of BY variables??

Can we ignore it??????

Thanks

SASKiwi
PROC Star

No you shouldn't ignore the message. The MERGE statement can only handle one-to-one and one-to-many joins, not many-to-many. Your merged data will drop the duplicate IDs from your second dataset (marked as follows in bold):

ID       REMOVED

101    11JUN2013:13:35:00

101    15JUN2013:14:30:00

102    19JUN2013:22:40:00

102    27JUN2013:16:35:00

102    27JUN2013:17:59:00

Join your two datasets first using SQL to avoid this:

proc sql

  create merged as

  select *

  from placed,

         removed

where placed.id = removed.id

;

quit;

Vince28_Statcan
Quartz | Level 8

Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:

Note:   The MERGE statement does not produce a

Cartesian product on a many-to-many match-merge. Instead it performs a

one-to-one merge while there are observations in the BY group in at least one

data set. When all observations in the BY group have been read from one data set

and there are still more observations in another data set, SAS performs a

one-to-many merge until all observations have been read for the BY group.

There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.

proc sql;

     create table merged as

     select t1.id, t1.placed, t2.removed

     from table1 as t1

     full join

     table2 as t2

     on t1.id=t2.id;

quit;

SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.

Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.

Vincent

*edit SASKiwi's SQL syntax is equivalent to the above

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1310 views
  • 0 likes
  • 3 in conversation