Re: Create a count of people active across months based on start and e...

Sean_OConnor · Posted 06-30-2022 04:06 AM

Folks,

I have a dataset with 1.7million rows of data which looks as follows;

Start_Date	End_Date	ID
01/07/2016	31/12/2016	00241J
01/01/2017	30/06/2017	00241J
01/07/2017	31/12/2017	00241J
01/01/2018	30/06/2018	00241J
01/07/2018	31/12/2018	00241J
01/01/2019	30/06/2019	00241J
01/07/2019	31/12/2019	00241J
01/01/2020	30/06/2020	00241J
01/07/2020	31/12/2020	00241J
01/01/2021	30/06/2021	00241J
01/07/2016	31/08/2016	0038R
01/07/2016	31/08/2016	089Q
01/11/2016	31/12/2016	089Q
01/05/2016	31/08/2016	0099I
01/09/2016	31/12/2016	0099I
01/01/2017	30/04/2017	0099I
01/05/2017	31/08/2017	0099I

My ultimate goal is to be able to derive counts by for each year month of the number of unique IDs active in a month. Therefore, I think I need to create new variables (columns or rows) to identify what months an ID was active. So for example row 1 has dates from 1/7/16 to the 31/12/16, so I need to create something to say this ID was active in July, August, September all the way to December 16 but it also has to be done in groups.

Does anyone have any advice of the best way to do this?

Would it be easier to try and use the dates to output IDs into monthly datasets?

I advice is welcome.

All the best,

Sean

Kurt_Bremser · Posted 06-30-2022 05:11 AM

First, expand the observations over months, then use PROC FREQ:

data ym;
set have;
format ym yymmd7.;
ym = intnx("month",start_date,0,"b");
do until (ym > end_date);
  output;
  ym = intnx("month",ym,1,"b");
end;
keep id ym;
run;

proc freq data=ym order=formatted noprint;
tables ym / out=want (drop=percent);
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

andreas_lds · Posted 06-30-2022 07:10 AM

The problem could be solved by using a hash object, too:

data _null_;
   set have end=jobDone;
   length date count 8;
   format date yymmd7.;

   if _n_=1 then
      do;
         declare hash h(ordered: 'yes');
         h.defineKey('date');
         h.defineData('date', 'count');
         h.defineDone();
      end;
	  
   end_date=intnx('month', end_date, 0, 'b');
   date=start_date;

   do until(date > end_date);

      if h.find() ^=0 then
         do;
            count=1;
            h.add();
         end;
      else
         do;
            count=count + 1;
            h.replace();
         end;
      date=intnx('month', date, 1, 'b');
   end;

   if jobDone then
      do;
         h.output(dataset: 'work.want');
      end;
run;

Ksharp · Posted 06-30-2022 08:21 AM

data have;
infile cards expandtabs;
input Start_Date :ddmmyy12. End_Date :ddmmyy12. ID $;
format Start_Date End_Date ddmmyy10.;
cards;
01/07/2016 31/12/2016 00241J
01/01/2017 30/06/2017 00241J
01/07/2017 31/12/2017 00241J
01/01/2018 30/06/2018 00241J
01/07/2018 31/12/2018 00241J
01/01/2019 30/06/2019 00241J
01/07/2019 31/12/2019 00241J
01/01/2020 30/06/2020 00241J
01/07/2020 31/12/2020 00241J
01/01/2021 30/06/2021 00241J
01/07/2016 31/08/2016 0038R
01/07/2016 31/08/2016 089Q
01/11/2016 31/12/2016 089Q
01/05/2016 31/08/2016 0099I
01/09/2016 31/12/2016 0099I
01/01/2017 30/04/2017 0099I
01/05/2017 31/08/2017 0099I
;

data temp;
 set have;
 do date=intnx('month',Start_Date,0) to End_Date;
  month=month(date);
  if lag_month ne month then output;
  lag_month=month;
 end;
 keep id date;
 format date ddmmyy10.;
run;

proc sql;
create table want as
select date,count(distinct id) as count
 from temp
  group by date;
quit;

Reeza · Posted 06-30-2022 10:28 AM

What do you mean it has to be done in groups?

Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Catch up on SAS Innovate 2026

Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Re: Create a count of people active across months based on start and end date

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away