MERGE by group possible? - Page 2

astolz0 · Posted 10-22-2013 05:24 AM

Hi everyone,

attached you find a slightly larger sample which better explains what we are looking for (this test sample is very similar to the one posted before; however, we added four more id_groups).

The following code by FrederikE gives us exactly what we want, but it is hard to implement with thousands of id_groups in our full sample:

PROC SQL;

CREATE TABLE test2

AS SELECT *

FROM

(SELECT *, id_group AS id_group_new FROM test)
UNION

(SELECT a.*, b.id_group_new FROM test AS a

INNER JOIN

(SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq 10) AS b

ON b.date_consensus > a.date_forecast

AND b.date_consensus_minus90d <= a.date_forecast_confirmation

AND b.date_consensus_minus90d >= a.date_forecast

)

UNION

(SELECT c.*, d.id_group_new FROM test AS c

INNER JOIN

(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100) AS d

ON d.date_consensus > c.date_forecast

AND d.date_consensus_minus90d <= c.date_forecast_confirmation

AND d.date_consensus_minus90d >= c.date_forecast

)

UNION

(SELECT e.*, f.id_group_new FROM test AS e

INNER JOIN

(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000) AS f

ON f.date_consensus > e.date_forecast

AND f.date_consensus_minus90d <= e.date_forecast_confirmation

AND f.date_consensus_minus90d >= e.date_forecast

)

UNION

(SELECT g.*, h.id_group_new FROM test AS g

INNER JOIN

(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 10000) AS h

ON h.date_consensus > g.date_forecast

AND h.date_consensus_minus90d <= g.date_forecast_confirmation

AND h.date_consensus_minus90d >= g.date_forecast

          )
     UNION
          (SELECT i.*, j.id_group_new FROM test AS i

INNER JOIN

(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100000) AS j

ON j.date_consensus > i.date_forecast

AND j.date_consensus_minus90d <= i.date_forecast_confirmation

AND j.date_consensus_minus90d >= i.date_forecast

)

UNION

(SELECT k.*, l.id_group_new FROM test AS k

INNER JOIN

(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000000) AS l

ON l.date_consensus > k.date_forecast

AND l.date_consensus_minus90d <= k.date_forecast_confirmation

AND l.date_consensus_minus90d >= k.date_forecast

)

;

QUIT;

PROC SORT DATA=test2;

BY id_group_new date_forecast;

RUN;

DATA test2;

RETAIN

id id_group_new date_consensus date_forecast date_consensus_minus90d

date_forecast_confirmation eps_forecast;

SET test2;

RENAME id_group = id_group_old;

RUN;

--> RESULT:

Here, the code works fine as id = 1 is NOT merged to id_group = 1000000 because it does not fulfill

l.date_consensus_minus90d <= k.date_forecast_confirmation

--> 22/01/1998 > 16/01/1998

@Patrick: unfortunately, your code does not seem to account for that restriction?!

Regarding the thousands of id_groups in our full sample, we cannot implement the code above as it requires to manually type in all different id_group numbers.

Is there a possibility to combine the PROC SQL approach above with a loop?

Something like:

(we know this code does not work as it would overwrite test2 each time the loop runs; neither does the do command itself run; but hopefully you get the idea?!)

%MACRO loop;

PROC SQL;

%DO i = 1 %TO 1000000;

CREATE TABLE test2

AS SELECT *

FROM

(SELECT *, id_group AS id_group_new FROM test)
UNION

(SELECT a.*, b.id_group_new FROM test AS a

INNER JOIN

(SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq &i.) AS b

ON b.date_consensus > a.date_forecast

AND b.date_consensus_minus90d <= a.date_forecast_confirmation

AND b.date_consensus_minus90d >= a.date_forecast

)

%END;

;

QUIT;

%MEND;

Thank you all very much!

Regards,

Alex and Niklas

astolz0 · Posted 10-25-2013 04:25 AM

Any ideas?

Thank you!

Mit · Posted 10-20-2013 09:52 PM

The logic you are using is different than what you want.

PROCSQL; There should be a space in between PROC SQL;
CREATE TABLE test2
AS SELECT a.*

FROM test a
LEFT JOIN test b
       ON       a.date_consensus                    >      b.date_forecast

AND a.date_forecast_minus90d <= b.date_forecast_confirmation

AND a.date_forecast_minus90d >= b.date_forecast

GROUP BY identgroup;

QUIT;

Please check the logic.

1. resulting date_consensus ( 07/10/1997) not greater than date_forecast ( 07/10/1997)

2. date_forecast_minus90d (09/07/1997) le date_forecast_confirmation (16/01/1998) -This is correct

3. resulting date_forecast_minus90d (09/07/1997) not greater than date_forecast (07/10/1997)

Also there is no by group as identgroup in your data.

Please post that data in usable format so that the time is saved.

astolz0 · Posted 10-21-2013 05:40 AM

Hi Mit,

that was just a typo in here. Of course, there is a space in PROC SORT and I meant id_group not ident_group.

Thanks!

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Re: MERGE by group possible?

Click image to register for webinar

Classroom Training Available!