BookmarkSubscribeRSS Feed
astolz0
Calcite | Level 5

Hi everyone,

attached you find a slightly larger sample which better explains what we are looking for (this test sample is very similar to the one posted before; however, we added four more id_groups).

Test.jpg

The following code by FrederikE gives us exactly what we want, but it is hard to implement with thousands of id_groups in our full sample:

PROC SQL;

     CREATE TABLE test2

     AS SELECT *

     FROM

          (SELECT *, id_group AS id_group_new FROM test)
     UNION

          (SELECT a.*, b.id_group_new FROM test AS a

          INNER JOIN

          (SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq 10) AS b

          ON     b.date_consensus               >     a.date_forecast         

          AND    b.date_consensus_minus90d     <=     a.date_forecast_confirmation

          AND    b.date_consensus_minus90d     >=     a.date_forecast

          )

     UNION

          (SELECT c.*, d.id_group_new FROM test AS c

          INNER JOIN

          (SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100) AS d

          ON     d.date_consensus               >     c.date_forecast         

          AND    d.date_consensus_minus90d     <=     c.date_forecast_confirmation

          AND    d.date_consensus_minus90d     >=     c.date_forecast

          )

     UNION

          (SELECT e.*, f.id_group_new FROM test AS e

          INNER JOIN

          (SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000) AS f

          ON     f.date_consensus               >     e.date_forecast         

          AND    f.date_consensus_minus90d     <=     e.date_forecast_confirmation

          AND    f.date_consensus_minus90d     >=     e.date_forecast

          )

     UNION

          (SELECT g.*, h.id_group_new FROM test AS g

          INNER JOIN

          (SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 10000) AS h

          ON     h.date_consensus               >     g.date_forecast         

          AND    h.date_consensus_minus90d     <=     g.date_forecast_confirmation

          AND    h.date_consensus_minus90d     >=     g.date_forecast

          )
     UNION
          (SELECT i.*, j.id_group_new FROM test AS i

          INNER JOIN

          (SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100000) AS j

          ON      j.date_consensus              >     i.date_forecast         

          AND     j.date_consensus_minus90d    <=     i.date_forecast_confirmation

          AND     j.date_consensus_minus90d    >=     i.date_forecast

          )

     UNION

          (SELECT k.*, l.id_group_new FROM test AS k

          INNER JOIN

          (SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000000) AS l

          ON      l.date_consensus              >     k.date_forecast         

          AND     l.date_consensus_minus90d    <=     k.date_forecast_confirmation

          AND     l.date_consensus_minus90d    >=     k.date_forecast

          )

         ;

     QUIT;


     PROC SORT DATA=test2;

          BY id_group_new date_forecast;

     RUN;

     DATA test2;

          RETAIN   

          id id_group_new date_consensus date_forecast date_consensus_minus90d 

          date_forecast_confirmation eps_forecast;

          SET test2;

          RENAME id_group = id_group_old;

     RUN;

--> RESULT:

Test2.jpg

    

Here, the code works fine as id = 1 is NOT merged to id_group = 1000000 because it does not fulfill

                     

     l.date_consensus_minus90d <= k.date_forecast_confirmation

     --> 22/01/1998 > 16/01/1998

@Patrick: unfortunately, your code does not seem to account for that restriction?!

Regarding the thousands of id_groups in our full sample, we cannot implement the code above as it requires to manually type in all different id_group numbers.

Is there a possibility to combine the PROC SQL approach above with a loop?

Something like:

(we know this code does not work as it would overwrite test2 each time the loop runs; neither does the do command itself run; but hopefully you get the idea?!)

%MACRO loop;

PROC SQL;

     %DO i = 1 %TO 1000000;

     CREATE TABLE test2

     AS SELECT *

     FROM

          (SELECT *, id_group AS id_group_new FROM test)
     UNION

          (SELECT a.*, b.id_group_new FROM test AS a

          INNER JOIN

          (SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq &i.) AS b

          ON     b.date_consensus               >     a.date_forecast         

          AND    b.date_consensus_minus90d     <=     a.date_forecast_confirmation

          AND    b.date_consensus_minus90d     >=     a.date_forecast

          )

     %END;

;

QUIT;

%MEND;

Thank you all very much!

Regards,

Alex and Niklas

         

astolz0
Calcite | Level 5

Any ideas?

Thank you!

Mit
Calcite | Level 5 Mit
Calcite | Level 5

The logic you are using is different than what you want.

PROCSQL;    There should be a space in between     PROC SQL;
 
CREATE TABLE test2          
 
AS SELECT a.*

  FROM test a          
 
LEFT JOIN test b        
       ON       a.date_consensus                    >      b.date_forecast    
    

       AND     a.date_forecast_minus90d <= b.date_forecast_confirmation

       AND     a.date_forecast_minus90d      >=    b.date_forecast

  GROUP BY identgroup;

QUIT;

Please check the logic.

1.      resulting date_consensus ( 07/10/1997) not greater than   date_forecast ( 07/10/1997)

2.     date_forecast_minus90d (09/07/1997)   le     date_forecast_confirmation (16/01/1998)  -This is correct

3.     resulting date_forecast_minus90d (09/07/1997) not greater than date_forecast (07/10/1997)

Also there is no by group as identgroup in your data.

Please post that data in usable format so that the time is saved.

astolz0
Calcite | Level 5

Hi Mit,

that was just a typo in here. Of course, there is a space in PROC SORT and I meant id_group not ident_group.

Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1383 views
  • 0 likes
  • 6 in conversation