I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start Termination_Date date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
data temp;
set have;
do date=Coverage_Start to Termination_Date ;
output;
end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
set temp;
by Member_Id;
if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
min(date) as Coverage_Start format=date11.,
max(date) as Termination_Date format=date11.
from temp
group by group;
quit;
I merged your questions. Please do not double-post.
Three questions:
1. Yes, the grouping vairable is member_id.
2. For first two observations, since there is no break between them so took the minimum of coverage_start and maximum of termination_date in the desired output.
3. Member_id in last obs should be 42968701. Sorry for the typo.
Some questions:
1. No. Dates are in date9. format.
2. I have a dataset having millions of row I want reduce the number rows against each member_id.
The above obs are sample of rows against a member.
See this:
data have;
input (Coverage_Start Termination_Date) (:date9.) Member_Id $;
format coverage_start termination_date yymmdd10.;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
proc sort data=have;
by member_id coverage_start termination_date;
run;
data want;
set have;
by member_id;
set
have (
firstobs=2
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
have (
obs=1
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
;
retain start;
if first.member_id then start = coverage_start;
if last.member_id or _start gt termination_date
then do;
coverage_start = start;
output;
start = _start;
end;
else put _all_;
format start yymmdd10.;
keep member_id coverage_start termination_date;
run;
The second SET with FIRSTOBS=2 creates a "look-ahead"; reading have again with OBS=1 is necessary to prevent a premature termination of the data step when OBS - 1 observations are read.
Please note how I presented source data in a data step with datalines; do so yourself in the future to prevent all questions about data type, formats, raw content etc.
I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage.
As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701
I want to achieve this task using SAS programming.
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
Hello,
> I want to achieve this task using SAS programming.
I have moved your question to the "SAS programming" board.
A question like this is answered in the programming board in an instant.
I'll have a look at it on Monday (but you'll most probably have an answer by then).
Also (this is a suggestion for your next post)
, always provide us with your example data by means of a data step with datalines, like here :
data work.have;
informat Coverage_Start Termination_Date date9.;
format Coverage_Start Termination_Date date11.;
input Coverage_Start Termination_Date Member_Id $;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
run;
Koen
I have merged your post, please don't post the same issue multiple times, this will only cause confusion and you won't get help any faster.
Tweaked code:
data want;
set have;
by member_id;
set
have (
firstobs=2
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
have (
obs=1
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
;
retain start term;
if first.member_id
then do;
start = coverage_start;
term = termination_date;
end;
term = max(termination_date,term);
put _all_;
if last.member_id or _start gt term
then do;
coverage_start = start;
termination_date = term;
output;
start = _start;
term = '01jan1900'd;
end;
format start term yymmdd10.;
keep member_id coverage_start termination_date;
run;
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start Termination_Date date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
data temp;
set have;
do date=Coverage_Start to Termination_Date ;
output;
end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
set temp;
by Member_Id;
if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
min(date) as Coverage_Start format=date11.,
max(date) as Termination_Date format=date11.
from temp
group by group;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.