Help using Base SAS procedures

Merge

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,041
Accepted Solution

Merge

Hi,

I posted this question yesterday and I am reframing my question again

The criteria is that there are two datasets one with placed times and the other for removal times.

The removal time has to be greater than the placed time and

The placed-removal times should be closest to each other

ALSO When we I merge I am getting repeats of BY variables

ID        PLACED

101   06JUN2013:04:50:00
101   06JUN2013:04:57:00
101   11JUN2013:17:45:00
101   11JUN2013:19:38:00
102   18JUN2013:06:30:00


ID       REMOVED
101    11JUN2013:13:35:00
101    15JUN2013:14:30:00
102    19JUN2013:22:40:00
102    27JUN2013:16:35:00
102    27JUN2013:17:59:00


WANT

ID    PLACED                 REMOVED
101  11JUN2013:19:38:00     15JUN2013:14:30:00
102  18JUN2013:06:30:00     19JUN2013:22:40:00

Thanks


Accepted Solutions
Solution
‎08-08-2013 04:02 PM
Super Contributor
Posts: 339

Re: Merge

Posted in reply to robertrao

Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:

Note:   The MERGE statement does not produce a

Cartesian product on a many-to-many match-merge. Instead it performs a

one-to-one merge while there are observations in the BY group in at least one

data set. When all observations in the BY group have been read from one data set

and there are still more observations in another data set, SAS performs a

one-to-many merge until all observations have been read for the BY group.

There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.

proc sql;

     create table merged as

     select t1.id, t1.placed, t2.removed

     from table1 as t1

     full join

     table2 as t2

     on t1.id=t2.id;

quit;

SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.

Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.

Vincent

*edit SASKiwi's SQL syntax is equivalent to the above

View solution in original post


All Replies
Super Contributor
Posts: 339

Re: Merge

Posted in reply to robertrao


An easy but not processing-efficient solution would be to use sql summary functions to only retain the record, from your merged dataset, such that removed-placed is minimal.

proc sql;

     create table want as

     select *

     from merged

     group by id

     having min(removed-placed) and removed>placed;

quit;

Assuming your placed and removed columns are effectively datetime values and not character strings. If they're character strings, you can change

having min(removed-placed);

to

having min(input(removed, datetime18.)-input(placed, datetime18.)) and input(removed, datetime18.) GT input(placed, datetime18.);

or well, something similar (sometimes informats disapprove width and require datetime. informat or something similar).

Super Contributor
Posts: 1,041

Re: Merge

Posted in reply to Vince28_Statcan

Hi,

Thanks for the reply.

How can we acheive this with simple datasteps ????

thanks in advance

Super Contributor
Posts: 339

Re: Merge

Posted in reply to robertrao

hrm I would be inclined to say take a stab at proc sql; it simplifies a lot of multiple step (spaghetti code) data syntax and improves readability. However, here's a 3 step approach that could achieve the same with data step syntax

/* step1 calculate the difference between removed and placed and put it in a new variable */

data merged;

     set merged;

     diff=removed-placed;

     if dif>0 then output;

end;

/* Step 2, sort by id & diff so that the minimum diff (only positive ones leftover from step1) can be targetted with first.diff */

proc sort data=merged;

     by id diff;

run;

/* Step 3, output only first.diff with by group processing into a new dataset */

data want;

     set merged;

     by id;

     if first.diff then output;

run;

This should do more or less the same as the single step sql syntax proposed above.

Vincent

Super Contributor
Posts: 1,041

Re: Merge

Posted in reply to Vince28_Statcan

Hi

Thanks again.

But when we merge prior to the diff calculation we get message note: Merge has more than 2 datasets with repeats of BY variables??

Can we ignore it??????

Thanks

Super User
Posts: 3,250

Re: Merge

Posted in reply to robertrao

No you shouldn't ignore the message. The MERGE statement can only handle one-to-one and one-to-many joins, not many-to-many. Your merged data will drop the duplicate IDs from your second dataset (marked as follows in bold):

ID       REMOVED

101    11JUN2013:13:35:00

101    15JUN2013:14:30:00

102    19JUN2013:22:40:00

102    27JUN2013:16:35:00

102    27JUN2013:17:59:00

Join your two datasets first using SQL to avoid this:

proc sql

  create merged as

  select *

  from placed,

         removed

where placed.id = removed.id

;

quit;

Solution
‎08-08-2013 04:02 PM
Super Contributor
Posts: 339

Re: Merge

Posted in reply to robertrao

Indeed you can't ignore it, I had missed on that comment. SAS merge statement does not do cartesian product merging when you have many-to-many merges. If you search for MERGE in SAS help, you can find the following quote:

Note:   The MERGE statement does not produce a

Cartesian product on a many-to-many match-merge. Instead it performs a

one-to-one merge while there are observations in the BY group in at least one

data set. When all observations in the BY group have been read from one data set

and there are still more observations in another data set, SAS performs a

one-to-many merge until all observations have been read for the BY group.

There are ways to do many-to-many merges through datasteps like using multidata enabled hash objects but the simple approach, again, is to fall back to proc sql that does a wonderful job.

proc sql;

     create table merged as

     select t1.id, t1.placed, t2.removed

     from table1 as t1

     full join

     table2 as t2

     on t1.id=t2.id;

quit;

SQL also allows to do merge on intermediately calculated variables or on variables with different names (not that adding a rename statement as a dataset option is very complicated) but it is somewhat more flexible than data step on quite a few things.

Plus, with SAS9.4 coming, you will be able to use SQL syntax in proc DS2 (datastep2 new syntax) to retrieve tables for the Set statement so it's definitely worth taking the time to learn.

Vincent

*edit SASKiwi's SQL syntax is equivalent to the above

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 250 views
  • 0 likes
  • 3 in conversation