I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start Termination_Date date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
data temp;
set have;
do date=Coverage_Start to Termination_Date ;
output;
end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
set temp;
by Member_Id;
if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
min(date) as Coverage_Start format=date11.,
max(date) as Termination_Date format=date11.
from temp
group by group;
quit;
I merged your questions. Please do not double-post.
Three questions:
1. Yes, the grouping vairable is member_id.
2. For first two observations, since there is no break between them so took the minimum of coverage_start and maximum of termination_date in the desired output.
3. Member_id in last obs should be 42968701. Sorry for the typo.
Some questions:
1. No. Dates are in date9. format.
2. I have a dataset having millions of row I want reduce the number rows against each member_id.
The above obs are sample of rows against a member.
See this:
data have;
input (Coverage_Start Termination_Date) (:date9.) Member_Id $;
format coverage_start termination_date yymmdd10.;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
proc sort data=have;
by member_id coverage_start termination_date;
run;
data want;
set have;
by member_id;
set
have (
firstobs=2
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
have (
obs=1
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
;
retain start;
if first.member_id then start = coverage_start;
if last.member_id or _start gt termination_date
then do;
coverage_start = start;
output;
start = _start;
end;
else put _all_;
format start yymmdd10.;
keep member_id coverage_start termination_date;
run;
The second SET with FIRSTOBS=2 creates a "look-ahead"; reading have again with OBS=1 is necessary to prevent a premature termination of the data step when OBS - 1 observations are read.
Please note how I presented source data in a data step with datalines; do so yourself in the future to prevent all questions about data type, formats, raw content etc.
I have a dataset i.e. -
Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage.
As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".
Reduced Dataset should be like -
Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701
I want to achieve this task using SAS programming.
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.
Hello,
> I want to achieve this task using SAS programming.
I have moved your question to the "SAS programming" board.
A question like this is answered in the programming board in an instant.
I'll have a look at it on Monday (but you'll most probably have an answer by then).
Also (this is a suggestion for your next post)
, always provide us with your example data by means of a data step with datalines, like here :
data work.have;
informat Coverage_Start Termination_Date date9.;
format Coverage_Start Termination_Date date11.;
input Coverage_Start Termination_Date Member_Id $;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
run;
Koen
I have merged your post, please don't post the same issue multiple times, this will only cause confusion and you won't get help any faster.
Tweaked code:
data want;
set have;
by member_id;
set
have (
firstobs=2
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
have (
obs=1
keep=member_id coverage_start
rename=(member_id=_mem coverage_start=_start)
)
;
retain start term;
if first.member_id
then do;
start = coverage_start;
term = termination_date;
end;
term = max(termination_date,term);
put _all_;
if last.member_id or _start gt term
then do;
coverage_start = start;
termination_date = term;
output;
start = _start;
term = '01jan1900'd;
end;
format start term yymmdd10.;
keep member_id coverage_start termination_date;
run;
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start Termination_Date date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
data temp;
set have;
do date=Coverage_Start to Termination_Date ;
output;
end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
set temp;
by Member_Id;
if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
min(date) as Coverage_Start format=date11.,
max(date) as Termination_Date format=date11.
from temp
group by group;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.