BookmarkSubscribeRSS Feed
Sean_OConnor
Obsidian | Level 7

Folks,

I have a dataset with 1.7million rows of data which looks as follows;

Start_DateEnd_DateID
01/07/201631/12/201600241J
01/01/201730/06/201700241J
01/07/201731/12/201700241J
01/01/201830/06/201800241J
01/07/201831/12/201800241J
01/01/201930/06/201900241J
01/07/201931/12/201900241J
01/01/202030/06/202000241J
01/07/202031/12/202000241J
01/01/202130/06/202100241J
01/07/201631/08/20160038R
01/07/201631/08/2016089Q
01/11/201631/12/2016089Q
01/05/201631/08/20160099I
01/09/201631/12/20160099I
01/01/201730/04/20170099I
01/05/201731/08/20170099I

 

My ultimate goal is to be able to derive counts by for each year month of the number of unique IDs active in a month. Therefore, I think I need to create new variables (columns or rows) to identify what months an ID was active. So for example row 1 has dates from 1/7/16 to the 31/12/16, so I need to create something to say this ID was active in July, August, September all the way to December 16 but it also has to be done in groups. 

 

Does anyone have any advice of the best way to do this? 

 

Would it be easier to try and use the dates to output IDs into monthly datasets?

 

I advice is welcome.

 

All the best,

 

Sean 

 

 

4 REPLIES 4
Kurt_Bremser
Super User

First, expand the observations over months, then use PROC FREQ:

data ym;
set have;
format ym yymmd7.;
ym = intnx("month",start_date,0,"b");
do until (ym > end_date);
  output;
  ym = intnx("month",ym,1,"b");
end;
keep id ym;
run;

proc freq data=ym order=formatted noprint;
tables ym / out=want (drop=percent);
run;
andreas_lds
Jade | Level 19

The problem could be solved by using a hash object, too:

 

data _null_;
   set have end=jobDone;
   length date count 8;
   format date yymmd7.;

   if _n_=1 then
      do;
         declare hash h(ordered: 'yes');
         h.defineKey('date');
         h.defineData('date', 'count');
         h.defineDone();
      end;
	  
   end_date=intnx('month', end_date, 0, 'b');
   date=start_date;

   do until(date > end_date);

      if h.find() ^=0 then
         do;
            count=1;
            h.add();
         end;
      else
         do;
            count=count + 1;
            h.replace();
         end;
      date=intnx('month', date, 1, 'b');
   end;

   if jobDone then
      do;
         h.output(dataset: 'work.want');
      end;
run;
Ksharp
Super User
data have;
infile cards expandtabs;
input Start_Date :ddmmyy12. End_Date :ddmmyy12. ID $;
format Start_Date End_Date ddmmyy10.;
cards;
01/07/2016 31/12/2016 00241J
01/01/2017 30/06/2017 00241J
01/07/2017 31/12/2017 00241J
01/01/2018 30/06/2018 00241J
01/07/2018 31/12/2018 00241J
01/01/2019 30/06/2019 00241J
01/07/2019 31/12/2019 00241J
01/01/2020 30/06/2020 00241J
01/07/2020 31/12/2020 00241J
01/01/2021 30/06/2021 00241J
01/07/2016 31/08/2016 0038R
01/07/2016 31/08/2016 089Q
01/11/2016 31/12/2016 089Q
01/05/2016 31/08/2016 0099I
01/09/2016 31/12/2016 0099I
01/01/2017 30/04/2017 0099I
01/05/2017 31/08/2017 0099I
;

data temp;
 set have;
 do date=intnx('month',Start_Date,0) to End_Date;
  month=month(date);
  if lag_month ne month then output;
  lag_month=month;
 end;
 keep id date;
 format date ddmmyy10.;
run;

proc sql;
create table want as
select date,count(distinct id) as count
 from temp
  group by date;
quit;
Reeza
Super User

What do you mean it has to be done in groups?

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1823 views
  • 1 like
  • 5 in conversation