BookmarkSubscribeRSS Feed
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

 

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

 

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

 

Reduced Dataset should be like -

 

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

 

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

16 REPLIES 16
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

 

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

 

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

 

Reduced Dataset should be like -

 

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

 

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

Ksharp
Super User
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;
PeterClemmensen
Tourmaline | Level 20

Three questions:

 

  1. What is your grouping variable here? Member_Id? If so, how do we determine what obs to group in the data below? It has the same member_id for all obs.
  2. How does Coverage_Start  = 24-Jun-19 appear in the first obs of your desired result?
  3. How does Member_ID = 42968701c appear in the last obs of your desired result?
rahul11124
Fluorite | Level 6

1. Yes, the grouping vairable is member_id.

 

2. For first two observations, since there is no break between them so took the minimum of coverage_start and maximum of termination_date in the desired output.

 

3. Member_id in last obs should be 42968701. Sorry for the typo.

andreas_lds
Jade | Level 19

Some questions:

  1.  Do you have sas-dates or strings? If you have strings the first step is fixing this along with a format that shows all four digits of the year.
  2. Why are you combining the last three rows? How are groups identified?
rahul11124
Fluorite | Level 6

1. No. Dates are in date9. format. 

2. I have a dataset having millions of row I want reduce the number rows against each member_id.

    The above obs are sample of rows against a member. 

Kurt_Bremser
Super User

See this:

data have;
input (Coverage_Start Termination_Date) (:date9.) Member_Id $;
format coverage_start termination_date yymmdd10.;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

proc sort data=have;
by member_id coverage_start termination_date;
run;

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start;
if first.member_id then start = coverage_start;
if last.member_id or _start gt termination_date
then do;
  coverage_start = start;
  output;
  start = _start;
end;
else put _all_;
format start yymmdd10.;
keep member_id coverage_start termination_date;
run;

The second SET with FIRSTOBS=2 creates a "look-ahead"; reading have again with OBS=1 is necessary to prevent a premature termination of the data step when OBS - 1 observations are read.

Please note how I presented source data in a data step with datalines; do so yourself in the future to prevent all questions about data type, formats, raw content etc.

rahul11124
Fluorite | Level 6
Not getting the correct output in the last row.
Getting -
Coverage_Start Termination_Date Member_Id
24JUL2019 01MAR2020 W22442968701

Want -
Coverage_Start Termination_Date Member_Id
24JUL2019 01Jun2021 W22442968701
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage.
As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701

I want to achieve this task using SAS programming.
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

sbxkoenk
SAS Super FREQ

Hello,

 

I want to achieve this task using SAS programming.

I have moved your question to the "SAS programming" board.

 

A question like this is answered in the programming board in an instant.
I'll have a look at it on Monday (but you'll most probably have an answer by then).

 

Also (this is a suggestion for your next post)
, always provide us with your example data by means of a data step with datalines, like here :

data work.have;
informat Coverage_Start Termination_Date date9.;
format   Coverage_Start Termination_Date date11.;
input Coverage_Start Termination_Date Member_Id $;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17  1-Mar-18 42968701
1-Mar-16  1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
run;

Koen

andreas_lds
Jade | Level 19

I have merged your post, please don't post the same issue multiple times, this will only cause confusion and you won't get help any faster.

Kurt_Bremser
Super User

Tweaked code:

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start term;
if first.member_id
then do;
  start = coverage_start;
  term = termination_date;
end;
term = max(termination_date,term);
put _all_;
if last.member_id or _start gt term
then do;
  coverage_start = start;
  termination_date = term;
  output;
  start = _start;
  term = '01jan1900'd;
end;
format start term yymmdd10.;
keep member_id coverage_start termination_date;
run;
Ksharp
Super User
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3887 views
  • 4 likes
  • 6 in conversation