Folks,
I have a dataset with 1.7million rows of data which looks as follows;
Start_Date | End_Date | ID |
01/07/2016 | 31/12/2016 | 00241J |
01/01/2017 | 30/06/2017 | 00241J |
01/07/2017 | 31/12/2017 | 00241J |
01/01/2018 | 30/06/2018 | 00241J |
01/07/2018 | 31/12/2018 | 00241J |
01/01/2019 | 30/06/2019 | 00241J |
01/07/2019 | 31/12/2019 | 00241J |
01/01/2020 | 30/06/2020 | 00241J |
01/07/2020 | 31/12/2020 | 00241J |
01/01/2021 | 30/06/2021 | 00241J |
01/07/2016 | 31/08/2016 | 0038R |
01/07/2016 | 31/08/2016 | 089Q |
01/11/2016 | 31/12/2016 | 089Q |
01/05/2016 | 31/08/2016 | 0099I |
01/09/2016 | 31/12/2016 | 0099I |
01/01/2017 | 30/04/2017 | 0099I |
01/05/2017 | 31/08/2017 | 0099I |
My ultimate goal is to be able to derive counts by for each year month of the number of unique IDs active in a month. Therefore, I think I need to create new variables (columns or rows) to identify what months an ID was active. So for example row 1 has dates from 1/7/16 to the 31/12/16, so I need to create something to say this ID was active in July, August, September all the way to December 16 but it also has to be done in groups.
Does anyone have any advice of the best way to do this?
Would it be easier to try and use the dates to output IDs into monthly datasets?
I advice is welcome.
All the best,
Sean
First, expand the observations over months, then use PROC FREQ:
data ym;
set have;
format ym yymmd7.;
ym = intnx("month",start_date,0,"b");
do until (ym > end_date);
output;
ym = intnx("month",ym,1,"b");
end;
keep id ym;
run;
proc freq data=ym order=formatted noprint;
tables ym / out=want (drop=percent);
run;
The problem could be solved by using a hash object, too:
data _null_;
set have end=jobDone;
length date count 8;
format date yymmd7.;
if _n_=1 then
do;
declare hash h(ordered: 'yes');
h.defineKey('date');
h.defineData('date', 'count');
h.defineDone();
end;
end_date=intnx('month', end_date, 0, 'b');
date=start_date;
do until(date > end_date);
if h.find() ^=0 then
do;
count=1;
h.add();
end;
else
do;
count=count + 1;
h.replace();
end;
date=intnx('month', date, 1, 'b');
end;
if jobDone then
do;
h.output(dataset: 'work.want');
end;
run;
data have;
infile cards expandtabs;
input Start_Date :ddmmyy12. End_Date :ddmmyy12. ID $;
format Start_Date End_Date ddmmyy10.;
cards;
01/07/2016 31/12/2016 00241J
01/01/2017 30/06/2017 00241J
01/07/2017 31/12/2017 00241J
01/01/2018 30/06/2018 00241J
01/07/2018 31/12/2018 00241J
01/01/2019 30/06/2019 00241J
01/07/2019 31/12/2019 00241J
01/01/2020 30/06/2020 00241J
01/07/2020 31/12/2020 00241J
01/01/2021 30/06/2021 00241J
01/07/2016 31/08/2016 0038R
01/07/2016 31/08/2016 089Q
01/11/2016 31/12/2016 089Q
01/05/2016 31/08/2016 0099I
01/09/2016 31/12/2016 0099I
01/01/2017 30/04/2017 0099I
01/05/2017 31/08/2017 0099I
;
data temp;
set have;
do date=intnx('month',Start_Date,0) to End_Date;
month=month(date);
if lag_month ne month then output;
lag_month=month;
end;
keep id date;
format date ddmmyy10.;
run;
proc sql;
create table want as
select date,count(distinct id) as count
from temp
group by date;
quit;
What do you mean it has to be done in groups?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.