I have a dataset called record with patients who activated an online account.
I have a column of dates as follows and I want to count the frequencies that each date occurs such that I can graph the number of activations (as a line over time) that occur every 2 weeks.
How should I go about doing this?
Activation_Date |
2020/10/29:08:15:09 AM |
2020/10/23:12:31:33 PM |
2020/10/04:02:45:36 PM |
2020/10/23:12:46:17 PM |
2020/10/04:02:24:47 PM |
2020/10/23:01:41:20 PM |
2020/10/04:08:20:47 AM |
2020/10/23:11:24:18 AM |
2020/09/13:10:11:59 AM |
2020/09/04:09:15:59 AM |
2020/09/23:07:07:17 AM |
2020/09/04:02:28:07 PM |
2020/10/23:12:19:19 PM |
So for example, in Excel, it would look like this:
You can turn your date time values into the half months via this data step command. From there, you can do whatever counting and plotting you want.
halfmonth=intnx('semimonth',datepart(activation_date),0,'b');
Okay, thanks. So I used the data step and created the halfmonth variable. Then I created a new table that counts the number of times each date (frequency) appeared in the halfmonth column. Then I created a line chart in proc sgplot.
data record_2;
set record;
halfmonth=intnx('semimonth',datepart(activation_date),0,'b');
format halfmonth MMDDYY10.;
run;
/* count the number of time a certain date appears in the halfmonth column */
proc sql;
create table record_3 as
select halfmonth,count(*) as count
FROM record_2
group by halfmonth;
quit;
proc sgplot data=record_3;
vline halfmonth / response=count;
yaxis grid;
run;
Obs Activation_Date halfmonth count12345678910111213
04SEP20:09:15:59 | 09/01/2020 | 3 |
04SEP20:14:28:07 | 09/01/2020 | 3 |
13SEP20:10:11:59 | 09/01/2020 | 3 |
23SEP20:07:07:17 | 09/16/2020 | 1 |
04OCT20:08:20:47 | 10/01/2020 | 3 |
04OCT20:14:24:47 | 10/01/2020 | 3 |
04OCT20:14:45:36 | 10/01/2020 | 3 |
23OCT20:13:41:20 | 10/16/2020 | 6 |
23OCT20:12:19:19 | 10/16/2020 | 6 |
29OCT20:08:15:09 | 10/16/2020 | 6 |
23OCT20:12:31:33 | 10/16/2020 | 6 |
23OCT20:12:46:17 | 10/16/2020 | 6 |
23OCT20:11:24:18 | 10/16/2020 | 6 |
Obs halfmonth count1234
09/01/2020 | 3 |
09/16/2020 | 1 |
10/01/2020 | 3 |
10/16/2020 | 6 |
In this examples, the periods and the counts are show.
data test (keep=bdate Activation_date period);
length period $ 21;
informat Activation_Date anydtdtm30. ;
format Activation_date datetime23.0 bdate date9.;
input Activation_Date;
bdate=intnx('semimonth',datepart(activation_date),0,'b');
period_b=put(intnx('semimonth',datepart(activation_date),0,'b'),date9.);
period_e=put(intnx('semimonth',datepart(activation_date),0,'e'),date9.);
period=period_b||" - "||period_e;
datalines;
2020/10/29:08:15:09 AM
2020/10/23:12:31:33 PM
2020/10/04:02:45:36 PM
2020/10/23:12:46:17 PM
2020/10/04:02:24:47 PM
2020/10/23:01:41:20 PM
2020/10/04:08:20:47 AM
2020/10/23:11:24:18 AM
2020/09/13:10:11:59 AM
2020/09/04:09:15:59 AM
2020/09/23:07:07:17 AM
2020/09/04:02:28:07 PM
2020/10/23:12:19:19 PM
;
run;
proc sql;
create table test2 (keep=period count) as
select distinct period, count(period) as count
from test
group by period
order by bdate;
quit;
proc sgplot data=test2;
series x=period y=count /markers ;
run;
The plot will have period on the x-axis and count on the x-axis
The output will be like this
The plot will be like this
By making the horizontal axis of your plot be a variable that is a text string, if a half-month has zero count in the original data set, then there is no spot on the horizontal axis for that half-month (your tick marks could be non-consecutive half-months). If you use the numeric variable BDATE as the horizontal axis variable, then all half-months between the earliest and latest will appear on the horizontal axis. In either case, you might want to handle the situation where a half-month has zero observations and arrange the plot so that a zero will appear for that half-month.
In general, for almost all problems, I recommend that if a variable represents a date, it should be numeric and not character.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.