Hi all,
As I'm quite new to SAS I'm still learning some basic steps.
I have the following dataset:
contract_nr start_date end_date product_code
1392358 01-02-2017 31-07-2017 A
1392358 01-08-2017 30-11-2017 A
1392358 01-12-2017 28-02-2018 B
1392358 01-03-2018 30-04-2018 B
1847204 08-12-2017 31-12-2017 A
1847204 02-02-2018 31-03-2018 B
(I tried to make a example SAS code, but failed)
I want to find out when a product code has changed, but only if there are less than, say, 20 days between the start and end date. So, my want dataset looks like this:
contract_nr start_date end_date product_code indicator
1392358 01-02-2017 31-07-2017 A 0
1392358 01-08-2017 30-11-2017 A 0
1392358 01-12-2017 28-02-2018 B 1
1392358 01-03-2018 30-04-2018 B 1
1847204 08-12-2017 31-12-2017 A 0
1847204 02-02-2018 31-03-2018 B 0
By the way, the product code can only change from A to B.
I prefer not to use loops, but I do think we can get around them. I hope someone can help me!
data have;
input contract_nr $ start_date :ddmmyy10. end_date :ddmmyy10. product_code $;
cards;
1392358 01-02-2017 31-07-2017 A
1392358 01-08-2017 30-11-2017 A
1392358 01-12-2017 28-02-2018 B
1392358 01-03-2018 30-04-2018 B
1847204 08-12-2017 31-12-2017 A
1847204 02-02-2018 31-03-2018 B
;
run;
data want;
set have;
if start_date - lag(end_date) > 20 and product_code ne lag(product_code) and contract_nr = lag(contract_nr) then indicator = 1;
else indicator = 0;
run;
You wrote "20 days between the start and end date. So, my want dataset looks like this:"
Can you mark this in the output to help me understand better?
data have;
input contract_nr $ start_date :ddmmyy10. end_date :ddmmyy10. product_code $;
cards;
1392358 01-02-2017 31-07-2017 A
1392358 01-08-2017 30-11-2017 A
1392358 01-12-2017 28-02-2018 B
1392358 01-03-2018 30-04-2018 B
1847204 08-12-2017 31-12-2017 A
1847204 02-02-2018 31-03-2018 B
;
run;
data want;
set have;
if start_date - lag(end_date) > 20 and product_code ne lag(product_code) and contract_nr = lag(contract_nr) then indicator = 1;
else indicator = 0;
run;
Thanks! Never thought it would be this easy
It seems like, on the one hand, you are comparing a record to the record immediately above it, but then you also have this 20 day restriction which is comparing two values on the same line (and not the record above it). It's not clear to me how these two restrictions work together. If I am understanding this properly (and I probably am not understanding it properly), the 20 day restriction would mean that the indicator variable would result in zeros for every row on the table.
I'm sorry for not being clear.
There has to be a 20-day restriction between records, not on the same line. So, I want to know the number of days between the last end date and the next start date. This number has to be smaller than 20.
So, if you look at my example dataset. The number of days between 30-11-2017 and 01-12-2017 is 0 (for 1392358). Therefore the change from A to B should get a 1.
For 1847204, there are 32 days between 31-12-2017 and 02-02-2018. Therefore, the indicator should be 0 for this change.
Assuming i understand what you want:
data have;
input contract_nr $ start_date :ddmmyy10. end_date :ddmmyy10. product_code $;
format start_date end_date ddmmyy10.;
cards;
1392358 01-02-2017 31-07-2017 A
1392358 01-08-2017 30-11-2017 A
1392358 01-12-2017 28-02-2018 B
1392358 01-03-2018 30-04-2018 B
1847204 08-12-2017 31-12-2017 A
1847204 02-02-2018 31-03-2018 B
;
run;
data want;
set have;
by contract_nr;
if first.contract_nr then call missing(k);
retain k;
indicator= product_code='B' and intck('day',k,start_date)<20 ;
k=end_date;
drop k;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.