- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data have_contract ; input contract_number contract_version version_begin_date ; informat version_begin_date ddmmyy10. ; format version_begin_date ddmmyy10. ; datalines ; 123456789 1 01/09/2021 123456789 2 02/10/2021 123456789 3 03/11/2021 run ; data have_event ; input contract_number event_date ; informat event_date ddmmyy10. ; format event_date ddmmyy10. ; datalines; 123456789 01/09/2021 123456789 08/09/2021 123456789 03/10/2021 123456789 30/10/2021 123456789 03/11/2021 123456789 13/11/2021 123456789 23/11/2021 run ; Data want ; input contract_number contract_version version_begin_date event_date ; informat version_begin_date event_date ddmmyy10. ; format version_begin_date event_date ddmmyy10. ; datalines ; 123456789 1 01/09/2021 01/09/2021 123456789 1 01/09/2021 08/09/2021 123456789 2 02/10/2021 03/10/2021 123456789 2 02/10/2021 30/10/2021 123456789 3 03/11/2021 03/11/2021 123456789 3 03/11/2021 13/11/2021 123456789 3 03/11/2021 23/11/2021 run ;
Hello,
I would like to join this 2 tables "have".
But I only want to join the events wich have an event date greater or equal the begin_date of the version and less than the the begin date of the next version.
thanks in advance for your help
kind regards
Nasser
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a case of "bringing history forward". I.e. you want to bring forward ("retain" in SAS language) the contract record through all subsequent event records, until the next contract record. And you want one record per event.
For sorted data it's easy:
data want;
set have_contract (in=inc keep=contract_number version_begin_date rename=(version_begin_date=event_date))
have_event (in=ine);
by contract_number event_date;
if inc then set have_contract;
if ine;
run;
Note this program depends a little on the data being relatively "clean". That is, it assumes that no contract has an (erroneous) event date that precedes the first contract date. If such a condition exists, then that event would inherit data from the preceding contract.
The "trick" here, is that all the variables that are only read in by the "if inc then set have_contract" are retained through all subsequent event records until the next have_contract date. This in turn is why the first SET statement only keeps the two BY-variables from have_contract, thereby avoiding overwriting all the other have_contract variables.
Editted note: Yes, this can also be done via PROC SQL, but would likely require a good deal more code to satisfy an atavistic impulse. And this technique is easily expanded to apply to more than two input datasets. See History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies.
@Nasser_DRMCP asked:
is it possible to get all contracts even thoses without events ?
Yes. Drop the
if ine:
statement.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hello
I succeeded to get the result I expected . first I sort my data by descending date .and then with LAG I could determine the version_begin_date_next.
and then with the filter below
and event_date>= a.version_begin_date
and b.event_date < a.version_bvegin_date_next
Nasser
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a case of "bringing history forward". I.e. you want to bring forward ("retain" in SAS language) the contract record through all subsequent event records, until the next contract record. And you want one record per event.
For sorted data it's easy:
data want;
set have_contract (in=inc keep=contract_number version_begin_date rename=(version_begin_date=event_date))
have_event (in=ine);
by contract_number event_date;
if inc then set have_contract;
if ine;
run;
Note this program depends a little on the data being relatively "clean". That is, it assumes that no contract has an (erroneous) event date that precedes the first contract date. If such a condition exists, then that event would inherit data from the preceding contract.
The "trick" here, is that all the variables that are only read in by the "if inc then set have_contract" are retained through all subsequent event records until the next have_contract date. This in turn is why the first SET statement only keeps the two BY-variables from have_contract, thereby avoiding overwriting all the other have_contract variables.
Editted note: Yes, this can also be done via PROC SQL, but would likely require a good deal more code to satisfy an atavistic impulse. And this technique is easily expanded to apply to more than two input datasets. See History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies.
@Nasser_DRMCP asked:
is it possible to get all contracts even thoses without events ?
Yes. Drop the
if ine:
statement.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi mkeintz
many thanks for your help. I have an additional question.
I notice that if a contratc exists in the in have_contract table but not in the have_event table then this contract does not exist in the "want" table.
is it possible to get all contracts even thoses without events ?
thanks in advance
Nasser
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/*First sort the have_contract and create a lag variable:*/
proc sort data=have_Contract; by descending version_begin_date; run;
data have_contract2;
set have_contract;
version_lag = lag(version_begin_date);
format version_lag ddmmyy10.;
run;
/*Then create a Cartesian product followed by filtering for what you need:*/
proc sql;
create table all1 as select a.*, b.event_date from have_contract2 a, have_event b where a.contract_number=b.contract_number
order by version_begin_date;
quit;
data all2;
set all1;
if version_lag ne . and version_begin_date=<event_date<version_lag then inc = 1;
else if version_lag eq . and version_begin_date=<event_date then inc =1;
if inc=1;
drop version_lag inc;
run;
proc sort data=all2; by contract_number contract_version version_begin_date event_date;