BookmarkSubscribeRSS Feed
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

 

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

 

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

 

Reduced Dataset should be like -

 

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

 

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

16 REPLIES 16
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

 

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

 

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

 

Reduced Dataset should be like -

 

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

 

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

Ksharp
Super User
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;
PeterClemmensen
Tourmaline | Level 20

Three questions:

 

  1. What is your grouping variable here? Member_Id? If so, how do we determine what obs to group in the data below? It has the same member_id for all obs.
  2. How does Coverage_Start  = 24-Jun-19 appear in the first obs of your desired result?
  3. How does Member_ID = 42968701c appear in the last obs of your desired result?
rahul11124
Fluorite | Level 6

1. Yes, the grouping vairable is member_id.

 

2. For first two observations, since there is no break between them so took the minimum of coverage_start and maximum of termination_date in the desired output.

 

3. Member_id in last obs should be 42968701. Sorry for the typo.

andreas_lds
Jade | Level 19

Some questions:

  1.  Do you have sas-dates or strings? If you have strings the first step is fixing this along with a format that shows all four digits of the year.
  2. Why are you combining the last three rows? How are groups identified?
rahul11124
Fluorite | Level 6

1. No. Dates are in date9. format. 

2. I have a dataset having millions of row I want reduce the number rows against each member_id.

    The above obs are sample of rows against a member. 

Kurt_Bremser
Super User

See this:

data have;
input (Coverage_Start Termination_Date) (:date9.) Member_Id $;
format coverage_start termination_date yymmdd10.;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

proc sort data=have;
by member_id coverage_start termination_date;
run;

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start;
if first.member_id then start = coverage_start;
if last.member_id or _start gt termination_date
then do;
  coverage_start = start;
  output;
  start = _start;
end;
else put _all_;
format start yymmdd10.;
keep member_id coverage_start termination_date;
run;

The second SET with FIRSTOBS=2 creates a "look-ahead"; reading have again with OBS=1 is necessary to prevent a premature termination of the data step when OBS - 1 observations are read.

Please note how I presented source data in a data step with datalines; do so yourself in the future to prevent all questions about data type, formats, raw content etc.

rahul11124
Fluorite | Level 6
Not getting the correct output in the last row.
Getting -
Coverage_Start Termination_Date Member_Id
24JUL2019 01MAR2020 W22442968701

Want -
Coverage_Start Termination_Date Member_Id
24JUL2019 01Jun2021 W22442968701
rahul11124
Fluorite | Level 6

I have a dataset i.e. -

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage.
As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701

I want to achieve this task using SAS programming.
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

sbxkoenk
SAS Super FREQ

Hello,

 

I want to achieve this task using SAS programming.

I have moved your question to the "SAS programming" board.

 

A question like this is answered in the programming board in an instant.
I'll have a look at it on Monday (but you'll most probably have an answer by then).

 

Also (this is a suggestion for your next post)
, always provide us with your example data by means of a data step with datalines, like here :

data work.have;
informat Coverage_Start Termination_Date date9.;
format   Coverage_Start Termination_Date date11.;
input Coverage_Start Termination_Date Member_Id $;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17  1-Mar-18 42968701
1-Mar-16  1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
run;

Koen

andreas_lds
Jade | Level 19

I have merged your post, please don't post the same issue multiple times, this will only cause confusion and you won't get help any faster.

Kurt_Bremser
Super User

Tweaked code:

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start term;
if first.member_id
then do;
  start = coverage_start;
  term = termination_date;
end;
term = max(termination_date,term);
put _all_;
if last.member_id or _start gt term
then do;
  coverage_start = start;
  termination_date = term;
  output;
  start = _start;
  term = '01jan1900'd;
end;
format start term yymmdd10.;
keep member_id coverage_start termination_date;
run;
Ksharp
Super User
data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3784 views
  • 4 likes
  • 6 in conversation