I am trying to find a more simplistic way to code the following data to calculate the mean average of days per week each hospital submitted a file for the following data (this is for a 6 month period). I know I can write a bunch of if then statements but that would be lines and lines of code. My questions are:
- what type of code would be best to calculate the number of days (which is the field date) for each hospital per week in order to get an average (a week would be 7 days starting from January 3rd) for the number of days a file was submitted.
- would I have to transpose this data in order to write the code? Or can a more simplistic code be written with the data in its original format (sample posted below)
HospitalName | FileDate |
Hospital 1 | 01-Jan-22 |
Hospital 1 | 02-Jan-22 |
Hospital 1 | 03-Jan-22 |
Hospital 1 | 04-Jan-22 |
Hospital 1 | 05-Jan-22 |
Hospital 1 | 06-Jan-22 |
Hospital 1 | 07-Jan-22 |
Hospital 1 | 08-Jan-22 |
Hospital 1 | 09-Jan-22 |
Hospital 1 | 10-Jan-22 |
Hospital 1 | 11-Jan-22 |
Hospital 1 | 12-Jan-22 |
Hospital 1 | 13-Jan-22 |
Hospital 2 | 03-Jan-22 |
Hospital 2 | 04-Jan-22 |
Hospital 2 | 05-Jan-22 |
Hospital 2 | 06-Jan-22 |
Hospital 2 | 07-Jan-22 |
Hospital 2 | 10-Jan-22 |
Hospital 2 | 11-Jan-22 |
Hospital 2 | 14-Jan-22 |
Hospital 2 | 17-Jan-22 |
Hospital 2 | 18-Jan-22 |
We recommend providing data in the form of a data step. First we can then test logic. Second it tells us exactly what type of values your data set has.
If this were my data I would do something like this:
proc summary data=have nway; class hospitalname filedate; format filedate weekv5. ; output out=daycount (drop=_type_); run; proc summary data=daycount nway; class hospitalname; var _freq_; output out=weekmeans (drop=_type_) mean=weeklymean; run;
If your Filedate is actually a SAS date value with a SAS date format attached the above code uses a different format that will will use the V algorithm for determining how to group values of the file date with Monday as the first day of the week. The first proc summary will count how many times each hospital occurs with a value of filedate in a week, which if there are never more than one value for any given date would be a count of days. The second the gets the mean of the days per week.
If you do not want to use the values you show of Jan 1 and 2, which would get treated as a two-day week, then add:
Where filedate ge '03Jan2022'd;
One reason I say "if it were my data" is because I will make sure that the dates are SAS date values. If not, the first thing you need to do is create a date value and then use that variable.
Which would look like (assumes the year is 2022 most likely for your examples)data want;
set have; datevar = input(filedate,date9.); format datevar date9.; ;
Note: 2-digit years are asking for a number of possibly misunderstandings depending on layout of values displayed.
We recommend providing data in the form of a data step. First we can then test logic. Second it tells us exactly what type of values your data set has.
If this were my data I would do something like this:
proc summary data=have nway; class hospitalname filedate; format filedate weekv5. ; output out=daycount (drop=_type_); run; proc summary data=daycount nway; class hospitalname; var _freq_; output out=weekmeans (drop=_type_) mean=weeklymean; run;
If your Filedate is actually a SAS date value with a SAS date format attached the above code uses a different format that will will use the V algorithm for determining how to group values of the file date with Monday as the first day of the week. The first proc summary will count how many times each hospital occurs with a value of filedate in a week, which if there are never more than one value for any given date would be a count of days. The second the gets the mean of the days per week.
If you do not want to use the values you show of Jan 1 and 2, which would get treated as a two-day week, then add:
Where filedate ge '03Jan2022'd;
One reason I say "if it were my data" is because I will make sure that the dates are SAS date values. If not, the first thing you need to do is create a date value and then use that variable.
Which would look like (assumes the year is 2022 most likely for your examples)data want;
set have; datevar = input(filedate,date9.); format datevar date9.; ;
Note: 2-digit years are asking for a number of possibly misunderstandings depending on layout of values displayed.
Here is the format of my data from proc contents;
FileDate | Num | 8 | DATE9. | DATE9. | FileDate |
HospitalName | Char | 62 | $62. | $62. | HospitalName |
The format and type of the date variable is what's important.
@carmong wrote:
Here is the format of my data from proc contents;
Alphabetic List of Variables and Attributes# Variable Type Len Format Informat Label21
FileDate Num 8 DATE9. DATE9. FileDate HospitalName Char 62 $62. $62. HospitalName
Unfortunately, this doesn't help. We can see HospitalName is character. You have not informed us about the other variable in your data set. And anyway, the request was to provide the data as working SAS data step code, this request still awaits your completion. Please do not ignore this request. We are trying to help you but you have to help us.
This code gave me exactly what I was looking for. Thank you so much for your help. Where can I find out more about the proc summary code and its functionality ?
@carmong wrote:
This code gave me exactly what I was looking for. Thank you so much for your help. Where can I find out more about the proc summary code and its functionality ?
I would recommend looking for the documentation.
And then reading it.
@carmong wrote:
This code gave me exactly what I was looking for. Thank you so much for your help. Where can I find out more about the proc summary code and its functionality ?
Proc Summary and Proc Means are basically the same procedure with the main immediate difference is that Means sends output to the results window by default and Summary expects to write to a data set. In ancient days, SAS 5 for me, the two procedures were needed one to write to a text file/printer in Means and Summary to write to disk (or tape but that was slow). The two have basically the same syntax but some differences for output layout.
The first stage counting could also have been done in Proc Freq, Proc SQL or a data step (with more work). The means could be done from the count data set with Proc Report, Tabulate, data step or even Proc Univariate. Just depends on what you want.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.