Reducing database to minimum entries

rahul11124 · Posted 12-23-2021 12:53 AM

I have a dataset i.e. -

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

rahul11124 · Posted 12-22-2021 08:51 AM

I have a dataset i.e. -

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage. As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701c

Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

Ksharp · Posted 12-23-2021 06:57 AM

data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;

Kurt_Bremser · Posted 12-23-2021 09:16 AM

I merged your questions. Please do not double-post.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 12-23-2021 01:43 AM

Three questions:

What is your grouping variable here? Member_Id? If so, how do we determine what obs to group in the data below? It has the same member_id for all obs.
How does Coverage_Start = 24-Jun-19 appear in the first obs of your desired result?
How does Member_ID = 42968701c appear in the last obs of your desired result?

The DATA to DATA Step Macro
Blog: SASnrd

rahul11124 · Posted 12-23-2021 03:11 AM

1. Yes, the grouping vairable is member_id.

2. For first two observations, since there is no break between them so took the minimum of coverage_start and maximum of termination_date in the desired output.

3. Member_id in last obs should be 42968701. Sorry for the typo.

andreas_lds · Posted 12-23-2021 03:43 AM

Some questions:

Do you have sas-dates or strings? If you have strings the first step is fixing this along with a format that shows all four digits of the year.
Why are you combining the last three rows? How are groups identified?

rahul11124 · Posted 12-23-2021 05:00 AM

1. No. Dates are in date9. format.

2. I have a dataset having millions of row I want reduce the number rows against each member_id.

The above obs are sample of rows against a member.

Kurt_Bremser · Posted 12-23-2021 05:57 AM

See this:

data have;
input (Coverage_Start Termination_Date) (:date9.) Member_Id $;
format coverage_start termination_date yymmdd10.;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

proc sort data=have;
by member_id coverage_start termination_date;
run;

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start;
if first.member_id then start = coverage_start;
if last.member_id or _start gt termination_date
then do;
  coverage_start = start;
  output;
  start = _start;
end;
else put _all_;
format start yymmdd10.;
keep member_id coverage_start termination_date;
run;

The second SET with FIRSTOBS=2 creates a "look-ahead"; reading have again with OBS=1 is necessary to prevent a premature termination of the data step when OBS - 1 observations are read.

Please note how I presented source data in a data step with datalines; do so yourself in the future to prevent all questions about data type, formats, raw content etc.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

rahul11124 · Posted 12-23-2021 08:33 AM

Not getting the correct output in the last row.
Getting -
Coverage_Start Termination_Date Member_Id
24JUL2019 01MAR2020 W22442968701

Want -
Coverage_Start Termination_Date Member_Id
24JUL2019 01Jun2021 W22442968701

rahul11124 · Posted 12-22-2021 08:43 AM

I have a dataset i.e. -

Coverage_Start Termination_Date Member_Id
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701

I want to reduce this dataset, suppose in last three rows minimum coverage_start- 1-Dec-15 and maximum termination_date- 1-Mar-18, so I want to combine all three bottom rows because it has continuous coverage.
As result the bottom three rows will be reduced to "1-Dec-15 1-Mar-18 42968701".

Reduced Dataset should be like -

Coverage_Start Termination_Date Member_Id
24-Jun-19 1-Jun-21 42968701
16-Feb-19 1-Mar-19 42968701
1-Dec-15 1-Mar-18 42968701

I want to achieve this task using SAS programming.
Can anyone please help me with this? I'm trying this since a very log time but couldn't achieve it.

sbxkoenk · Posted 12-24-2021 10:40 AM

Hello,

> I want to achieve this task using SAS programming.

I have moved your question to the "SAS programming" board.

A question like this is answered in the programming board in an instant.
I'll have a look at it on Monday (but you'll most probably have an answer by then).

Also (this is a suggestion for your next post)
, always provide us with your example data by means of a data step with datalines, like here :

data work.have;
informat Coverage_Start Termination_Date date9.;
format   Coverage_Start Termination_Date date11.;
input Coverage_Start Termination_Date Member_Id $;
datalines;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17  1-Mar-18 42968701
1-Mar-16  1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;
run;

Koen

andreas_lds · Posted 12-27-2021 03:22 AM

I have merged your post, please don't post the same issue multiple times, this will only cause confusion and you won't get help any faster.

Kurt_Bremser · Posted 12-23-2021 09:08 AM

Tweaked code:

data want;
set have;
by member_id;
set
  have (
    firstobs=2
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
  have (
    obs=1
    keep=member_id coverage_start
    rename=(member_id=_mem coverage_start=_start)
  )
;
retain start term;
if first.member_id
then do;
  start = coverage_start;
  term = termination_date;
end;
term = max(termination_date,term);
put _all_;
if last.member_id or _start gt term
then do;
  coverage_start = start;
  termination_date = term;
  output;
  start = _start;
  term = '01jan1900'd;
end;
format start term yymmdd10.;
keep member_id coverage_start termination_date;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 12-23-2021 07:23 AM

data have;
input Coverage_Start : date11. Termination_Date : date11. Member_Id;
format Coverage_Start  Termination_Date  date11. ;
cards;
24-Jul-19 1-Jun-21 42968701
24-Jul-19 1-Mar-21 42968701
29-Feb-20 1-Mar-20 42968701
16-Feb-19 1-Mar-19 42968701
1-Mar-17 1-Mar-18 42968701
1-Mar-16 1-Mar-17 42968701
1-Dec-15 31-Dec-16 42968701
;

data temp;
 set have;
 do date=Coverage_Start to Termination_Date ;
  output;
 end;
keep date Member_Id;
run;
proc sort data=temp nodupkey;by Member_Id date;run;
data temp ;
 set temp;
 by Member_Id;
 if first.Member_Id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,max(Member_Id) as Member_Id,
 min(date) as Coverage_Start format=date11.,
 max(date) as Termination_Date  format=date11.
 from temp
  group by group;
quit;

Reducing Dataset to minimum records

Reducing database to minimum entries

Re: Reducing database to minimum entries

Re: Reducing database to minimum entries

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Reducing database to minimum entries

Re: Reducing database to minimum entries

Re: Reducing database to minimum entries

Re: Reducing Dataset to minimum records

Re: Reducing Dataset to minimum records

Registration is open

Registration is open

SAS Training: Just a Click Away