Hi everyone,
attached you find a slightly larger sample which better explains what we are looking for (this test sample is very similar to the one posted before; however, we added four more id_groups).
The following code by FrederikE gives us exactly what we want, but it is hard to implement with thousands of id_groups in our full sample:
PROC SQL;
CREATE TABLE test2
AS SELECT *
FROM
(SELECT *, id_group AS id_group_new FROM test)
UNION
(SELECT a.*, b.id_group_new FROM test AS a
INNER JOIN
(SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq 10) AS b
ON b.date_consensus > a.date_forecast
AND b.date_consensus_minus90d <= a.date_forecast_confirmation
AND b.date_consensus_minus90d >= a.date_forecast
)
UNION
(SELECT c.*, d.id_group_new FROM test AS c
INNER JOIN
(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100) AS d
ON d.date_consensus > c.date_forecast
AND d.date_consensus_minus90d <= c.date_forecast_confirmation
AND d.date_consensus_minus90d >= c.date_forecast
)
UNION
(SELECT e.*, f.id_group_new FROM test AS e
INNER JOIN
(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000) AS f
ON f.date_consensus > e.date_forecast
AND f.date_consensus_minus90d <= e.date_forecast_confirmation
AND f.date_consensus_minus90d >= e.date_forecast
)
UNION
(SELECT g.*, h.id_group_new FROM test AS g
INNER JOIN
(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 10000) AS h
ON h.date_consensus > g.date_forecast
AND h.date_consensus_minus90d <= g.date_forecast_confirmation
AND h.date_consensus_minus90d >= g.date_forecast
)
UNION
(SELECT i.*, j.id_group_new FROM test AS i
INNER JOIN
(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 100000) AS j
ON j.date_consensus > i.date_forecast
AND j.date_consensus_minus90d <= i.date_forecast_confirmation
AND j.date_consensus_minus90d >= i.date_forecast
)
UNION
(SELECT k.*, l.id_group_new FROM test AS k
INNER JOIN
(SELECT *, id_group AS id_group_new FROM test WHERE id_group eq 1000000) AS l
ON l.date_consensus > k.date_forecast
AND l.date_consensus_minus90d <= k.date_forecast_confirmation
AND l.date_consensus_minus90d >= k.date_forecast
)
;
QUIT;
PROC SORT DATA=test2;
BY id_group_new date_forecast;
RUN;
DATA test2;
RETAIN
id id_group_new date_consensus date_forecast date_consensus_minus90d
date_forecast_confirmation eps_forecast;
SET test2;
RENAME id_group = id_group_old;
RUN;
--> RESULT:
Here, the code works fine as id = 1 is NOT merged to id_group = 1000000 because it does not fulfill
l.date_consensus_minus90d <= k.date_forecast_confirmation
--> 22/01/1998 > 16/01/1998
@Patrick: unfortunately, your code does not seem to account for that restriction?!
Regarding the thousands of id_groups in our full sample, we cannot implement the code above as it requires to manually type in all different id_group numbers.
Is there a possibility to combine the PROC SQL approach above with a loop?
Something like:
(we know this code does not work as it would overwrite test2 each time the loop runs; neither does the do command itself run; but hopefully you get the idea?!)
%MACRO loop;
PROC SQL;
%DO i = 1 %TO 1000000;
CREATE TABLE test2
AS SELECT *
FROM
(SELECT *, id_group AS id_group_new FROM test)
UNION
(SELECT a.*, b.id_group_new FROM test AS a
INNER JOIN
(SELECT *,id AS bid, id_group AS id_group_new FROM test WHERE id_group eq &i.) AS b
ON b.date_consensus > a.date_forecast
AND b.date_consensus_minus90d <= a.date_forecast_confirmation
AND b.date_consensus_minus90d >= a.date_forecast
)
%END;
;
QUIT;
%MEND;
Thank you all very much!
Regards,
Alex and Niklas
Any ideas?
Thank you!
The logic you are using is different than what you want.
PROCSQL; There should be a space in between PROC SQL;
CREATE TABLE test2
AS SELECT a.*
FROM test a
LEFT JOIN test b
ON a.date_consensus > b.date_forecast
AND a.date_forecast_minus90d <= b.date_forecast_confirmation
AND a.date_forecast_minus90d >= b.date_forecast
GROUP BY identgroup;
QUIT;
Please check the logic.
1. resulting date_consensus ( 07/10/1997) not greater than date_forecast ( 07/10/1997)
2. date_forecast_minus90d (09/07/1997) le date_forecast_confirmation (16/01/1998) -This is correct
3. resulting date_forecast_minus90d (09/07/1997) not greater than date_forecast (07/10/1997)
Also there is no by group as identgroup in your data.
Please post that data in usable format so that the time is saved.
Hi Mit,
that was just a typo in here. Of course, there is a space in PROC SORT and I meant id_group not ident_group.
Thanks!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.