Hi Team,
I need the code logic to create the base data from the existing monthly data which has all the accounts with different start and end date.
For example, below is the sample data..
acct_nbr | total_enrolled | year | Program | Start _date | End_Date | Difference - I calculated (need in logic to get these numbers) |
xxx112 | 1 | 2019 | MOR | 01May2019 | 01Jul2019 | 1 |
xxx617 | 1 | 2019 | MOR | 01Feb2019 | 01May2019 | 11 |
xxx617 | 2 | 2020 | MOR | 01Apr2020 | 01Jun2020 | 11 |
xxx674 | 1 | 2019 | MOR | 01Mar2019 | 01May2019 | 2 |
xxx674 | 2 | 2019 | MOR | 01Jul2019 | 01Aug2019 | 2 |
xxx906 | 1 | 2019 | MOR | 01Sep2019 | 01Oct2019 | 1 |
xxx906 | 2 | 2019 | MOR | 01Nov2019 | 01Feb2020 | 1 |
xxx102 | 1 | 2020 | MOR | 01Apr2020 | 01Jun2020 | 3 |
xxx110 | 1 | 2020 | MOR | 01Apr2020 | 01Jul2020 | 4 |
xxx144 | 1 | 2020 | MOR | 01Mar2020 | 01May2020 | 2 |
xxx227 | 1 | 2019 | MOR | 01Nov2019 | 01Mar2020 | 2 |
xxx227 | 2 | 2020 | MOR | 01May2020 | 01Jun2020 | 1 |
xxx227 | 3 | 2020 | MOR | 01Jun2020 | 01Oct2020 | 1 |
The concept is the customer(account) has enrolled in the MOR Program many times in a year..i have to create a base data to include the accounts where the enrolled account in the MOR program(next program) has a gap of greater 6 months from start and end date from each previous enrolled times.
if they have entered once then its fine to keep in records, the logic only for multiple times enrolled.
Below is the output data that needs to create.
acct_nbr | total_enrolled | year | Program | Start _date | End_Date |
xxx112 | 1 | 2019 | MOR | 01May2019 | 01Jul2019 |
xxx617 | 1 | 2019 | MOR | 01Feb2019 | 01May2019 |
xxx617 | 2 | 2020 | MOR | 01Apr2020 | 01Jun2020 |
xxx674 | 2 | 2019 | MOR | 01Mar2019 | 01Aug2019 |
xxx906 | 2 | 2019 | MOR | 01Sep2019 | 01Feb2020 |
xxx102 | 1 | 2020 | MOR | 01Apr2020 | 01Jun2020 |
xxx110 | 1 | 2020 | MOR | 01Apr2020 | 01Jul2020 |
xxx144 | 1 | 2020 | MOR | 01Mar2020 | 01May2020 |
xxx227 | 1 | 2019 | MOR | 01Nov2019 | 01Oct2020 |
> 6 mnts I should keep all the records even if one account has enrolled many times.
< 6 months..i should consider only 1 record with first start ( first enrolled) and last end date (last enrolled) - between each enrolled is less than 6 months.
Appreciate your help and suggestion. thanks
RS
Combine a "look-ahead" with use of lagged values:
data have;
infile datalines dlm='09'x;
input acct_nbr $ total_enrolled year program $ (start_date end_date) (:date9.);
format start_date end_date yymmdd10.;
datalines;
xxx112 1 2019 MOR 01May2019 01Jul2019
xxx617 1 2019 MOR 01Feb2019 01May2019
xxx617 2 2020 MOR 01Apr2020 01Jun2020
xxx674 1 2019 MOR 01Mar2019 01May2019
xxx674 2 2019 MOR 01Jul2019 01Aug2019
xxx906 1 2019 MOR 01Sep2019 01Oct2019
xxx906 2 2019 MOR 01Nov2019 01Feb2020
xxx102 1 2020 MOR 01Apr2020 01Jun2020
xxx110 1 2020 MOR 01Apr2020 01Jul2020
xxx144 1 2020 MOR 01Mar2020 01May2020
xxx227 1 2019 MOR 01Nov2019 01Mar2020
xxx227 2 2020 MOR 01May2020 01Jun2020
xxx227 3 2020 MOR 01Jun2020 01Oct2020
;
data want;
merge
have
have (
firstobs=2
keep=acct_nbr start_date
rename=(acct_nbr=_a start_date=_s)
)
;
format _l_start _l_end yymmdd10.;
retain _l_start;
_l_end = lag(end_date);
if acct_nbr ne _a or intck('month',end_date,_s) > 6
then do;
if _l_start ne . and intck('month',_l_end,start_date) le 6
then start_date = _l_start;
output;
_l_start = .;
end;
else if _l_start = .
then _l_start = start_date;
drop _:;
run;
The LAG function has to be used unconditionally to correctly fill its FIFO queue.
Please use data steps with datalines for example data, so we don't need to waste time doing that for you. It will get you quicker answers.
Combine a "look-ahead" with use of lagged values:
data have;
infile datalines dlm='09'x;
input acct_nbr $ total_enrolled year program $ (start_date end_date) (:date9.);
format start_date end_date yymmdd10.;
datalines;
xxx112 1 2019 MOR 01May2019 01Jul2019
xxx617 1 2019 MOR 01Feb2019 01May2019
xxx617 2 2020 MOR 01Apr2020 01Jun2020
xxx674 1 2019 MOR 01Mar2019 01May2019
xxx674 2 2019 MOR 01Jul2019 01Aug2019
xxx906 1 2019 MOR 01Sep2019 01Oct2019
xxx906 2 2019 MOR 01Nov2019 01Feb2020
xxx102 1 2020 MOR 01Apr2020 01Jun2020
xxx110 1 2020 MOR 01Apr2020 01Jul2020
xxx144 1 2020 MOR 01Mar2020 01May2020
xxx227 1 2019 MOR 01Nov2019 01Mar2020
xxx227 2 2020 MOR 01May2020 01Jun2020
xxx227 3 2020 MOR 01Jun2020 01Oct2020
;
data want;
merge
have
have (
firstobs=2
keep=acct_nbr start_date
rename=(acct_nbr=_a start_date=_s)
)
;
format _l_start _l_end yymmdd10.;
retain _l_start;
_l_end = lag(end_date);
if acct_nbr ne _a or intck('month',end_date,_s) > 6
then do;
if _l_start ne . and intck('month',_l_end,start_date) le 6
then start_date = _l_start;
output;
_l_start = .;
end;
else if _l_start = .
then _l_start = start_date;
drop _:;
run;
The LAG function has to be used unconditionally to correctly fill its FIFO queue.
Please use data steps with datalines for example data, so we don't need to waste time doing that for you. It will get you quicker answers.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.