I have a dataset called record with patients who activated an online account.
I have a column of dates as follows and I want to count the frequencies that each date occurs such that I can graph the number of activations (as a line over time) that occur every 2 weeks.
How should I go about doing this?
So for example, in Excel, it would look like this:
Okay, thanks. So I used the data step and created the halfmonth variable. Then I created a new table that counts the number of times each date (frequency) appeared in the halfmonth column. Then I created a line chart in proc sgplot.
data record_2; set record; halfmonth=intnx('semimonth',datepart(activation_date),0,'b'); format halfmonth MMDDYY10.; run; /* count the number of time a certain date appears in the halfmonth column */ proc sql; create table record_3 as select halfmonth,count(*) as count FROM record_2 group by halfmonth; quit; proc sgplot data=record_3; vline halfmonth / response=count; yaxis grid; run;
Obs Activation_Date halfmonth count12345678910111213
Obs halfmonth count1234
In this examples, the periods and the counts are show.
data test (keep=bdate Activation_date period); length period $ 21; informat Activation_Date anydtdtm30. ; format Activation_date datetime23.0 bdate date9.; input Activation_Date; bdate=intnx('semimonth',datepart(activation_date),0,'b'); period_b=put(intnx('semimonth',datepart(activation_date),0,'b'),date9.); period_e=put(intnx('semimonth',datepart(activation_date),0,'e'),date9.); period=period_b||" - "||period_e; datalines; 2020/10/29:08:15:09 AM 2020/10/23:12:31:33 PM 2020/10/04:02:45:36 PM 2020/10/23:12:46:17 PM 2020/10/04:02:24:47 PM 2020/10/23:01:41:20 PM 2020/10/04:08:20:47 AM 2020/10/23:11:24:18 AM 2020/09/13:10:11:59 AM 2020/09/04:09:15:59 AM 2020/09/23:07:07:17 AM 2020/09/04:02:28:07 PM 2020/10/23:12:19:19 PM ; run; proc sql; create table test2 (keep=period count) as select distinct period, count(period) as count from test group by period order by bdate; quit; proc sgplot data=test2; series x=period y=count /markers ; run;
The plot will have period on the x-axis and count on the x-axis
The output will be like this
The plot will be like this
By making the horizontal axis of your plot be a variable that is a text string, if a half-month has zero count in the original data set, then there is no spot on the horizontal axis for that half-month (your tick marks could be non-consecutive half-months). If you use the numeric variable BDATE as the horizontal axis variable, then all half-months between the earliest and latest will appear on the horizontal axis. In either case, you might want to handle the situation where a half-month has zero observations and arrange the plot so that a zero will appear for that half-month.
In general, for almost all problems, I recommend that if a variable represents a date, it should be numeric and not character.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.