Hi there,
I have a quick question about manipulating a dataset in SAS. I have included a sample dataset below.
Date Country Weekly Cases
2021-08-09 United States 500
2021-08-10 United States 0
2021-08-11 United States 0
2021-08-12 United States 0
2021-08-13 United States 0
2021-08-14 United States 0
2021-08-15 United States 325
2021-08-16 United States 0
2021-08-17 United States 0
2021-08-18 United States 0
2021-08-19 United States 0
2021-08-20 United States 0
2021-08-21 United States 0
2021-08-10 United Kingdom 100
2021-08-11 United Kingdom 0
2021-08-12 United Kingdom 0
2021-08-13 United Kingdom 0
2021-08-14 United Kingdom 0
2021-08-15 United Kingdom 0
2021-08-16 United Kingdom 400
2021-08-17 United Kingdom 0
2021-08-18 United Kingdom 0
2021-08-19 United Kingdom 0
2021-08-20 United Kingdom 0
2021-08-21 United Kingdom 0
What I am trying to do is report the number of cases by week for each country in my dataset. However, what I noticed is that countries did not always report on the same day of the week. I was wondering if there was a way for me to manipulate the data so that I could divide the weekly count into daily counts (important to note that a weekly count might not always represent a 7 day reporting week, in my example below the first weekly count for US represent 6 days and the 2nd weekly count represents 7 days). Ideally every time a weekly count is presented for each country, I would like that number to be divided equally until the next weekly count is present for that country. I have included what I would like my data to look like instead.
Date Country Weekly Cases Daily Cases
2021-08-09 United States 500 83.3
2021-08-10 United States 0 83.3
2021-08-11 United States 0 83.3
2021-08-12 United States 0 83.3
2021-08-13 United States 0 83.3
2021-08-14 United States 0 83.3
2021-08-15 United States 325 46.4
2021-08-16 United States 0 46.4
2021-08-17 United States 0 46.4
2021-08-18 United States 0 46.4
2021-08-19 United States 0 46.4
2021-08-20 United States 0 46.4
2021-08-21 United States 0 46.4
2021-08-10 United Kingdom 100 16.7
2021-08-11 United Kingdom 0 16.7
2021-08-12 United Kingdom 0 16.7
2021-08-13 United Kingdom 0 16.7
2021-08-14 United Kingdom 0 16.7
2021-08-15 United Kingdom 0 16.7
2021-08-16 United Kingdom 400 66.7
2021-08-17 United Kingdom 0 66.7
2021-08-18 United Kingdom 0 66.7
2021-08-19 United Kingdom 0 66.7
2021-08-20 United Kingdom 0 66.7
2021-08-21 United Kingdom 0 66.7
I have a pretty large dataset (400k observations) with 230 unique countries that span a few years. Again the countries did not always report their weekly count on the same day, and the weekly counts don't always represent a 7 day count. If there is a way I can easily manipulate the dataset to do the above daily count, I would really appreciate it!
Thank you so much for any advice that you can provide!
Assuming your data has been sorted by country and date.
data have;
input Date :yymmdd10. Country & $40. WeeklyCases ;
format date yymmdd10.;
cards;
2021-08-09 United States 500
2021-08-10 United States 0
2021-08-11 United States 0
2021-08-12 United States 0
2021-08-13 United States 0
2021-08-14 United States 0
2021-08-15 United States 325
2021-08-16 United States 0
2021-08-17 United States 0
2021-08-18 United States 0
2021-08-19 United States 0
2021-08-20 United States 0
2021-08-21 United States 0
2021-08-10 United Kingdom 100
2021-08-11 United Kingdom 0
2021-08-12 United Kingdom 0
2021-08-13 United Kingdom 0
2021-08-14 United Kingdom 0
2021-08-15 United Kingdom 0
2021-08-16 United Kingdom 400
2021-08-17 United Kingdom 0
2021-08-18 United Kingdom 0
2021-08-19 United Kingdom 0
2021-08-20 United Kingdom 0
2021-08-21 United Kingdom 0
;
proc sort data=have out=temp;by country date;run;
data temp;
set temp;
by country ;
if first.country or WeeklyCases>0 then group+1;
run;
proc sql;
create table want as
select *,sum(WeeklyCases)/count(*) as DailyCases
from temp
group by group
order by country,date;
quit;
I would think that the easiest thing to do is to turn the date into a week number, and then go ahead and analyze by week number. The format weekv5. is one of many ways to turn dates into week numbers.
Since the data you provided is not in a usable form (examples and instructions), this code is just a general idea of how to do it.
proc summary data=have;
class date;
var weekly_cases;
format date weekv5.;
output out=want sum=;
run;
Hi @PaigeMiller ,
Thank you for the quick reply! I forgot to mention that I had already converted the dates to a week format. This issue is that not all countries reported on the same day of the week, so in some cases we can have two weekly counts in my week variable for a particular country--- in that case i would be overestimating for that particular week and underestimating for the prior week. I have updated the data I had previously provided so that it is a more usable form. The one presented below is what I have :
Date | Week | Country | Weekly_Cases |
2021-08-09 | 1 | United States | 500 |
2021-08-10 | 1 | United States | 0 |
2021-08-11 | 1 | United States | 0 |
2021-08-12 | 1 | United States | 0 |
2021-08-13 | 1 | United States | 0 |
2021-08-14 | 1 | United States | 0 |
2021-08-15 | 2 | United States | 325 |
2021-08-16 | 2 | United States | 0 |
2021-08-17 | 2 | United States | 0 |
2021-08-18 | 2 | United States | 0 |
2021-08-19 | 2 | United States | 0 |
2021-08-20 | 2 | United States | 0 |
2021-08-21 | 2 | United States | 0 |
2021-08-10 | 1 | United Kingdom | 100 |
2021-08-11 | 1 | United Kingdom | 0 |
2021-08-12 | 1 | United Kingdom | 0 |
2021-08-13 | 1 | United Kingdom | 0 |
2021-08-14 | 1 | United Kingdom | 0 |
2021-08-15 | 2 | United Kingdom | 0 |
2021-08-16 | 2 | United Kingdom | 400 |
2021-08-17 | 2 | United Kingdom | 0 |
2021-08-18 | 2 | United Kingdom | 0 |
2021-08-19 | 2 | United Kingdom | 0 |
2021-08-20 | 2 | United Kingdom | 0 |
2021-08-21 | 2 | United Kingdom | 0 |
This is the dataset that I would like instead:
Date | Week | Country | Weekly_Cases | Daily_Cases |
2021-08-09 | 1 | United States | 500 | 83.3 |
2021-08-10 | 1 | United States | 0 | 83.3 |
2021-08-11 | 1 | United States | 0 | 83.3 |
2021-08-12 | 1 | United States | 0 | 83.3 |
2021-08-13 | 1 | United States | 0 | 83.3 |
2021-08-14 | 1 | United States | 0 | 83.3 |
2021-08-15 | 2 | United States | 325 | 46.4 |
2021-08-16 | 2 | United States | 0 | 46.4 |
2021-08-17 | 2 | United States | 0 | 46.4 |
2021-08-18 | 2 | United States | 0 | 46.4 |
2021-08-19 | 2 | United States | 0 | 46.4 |
2021-08-20 | 2 | United States | 0 | 46.4 |
2021-08-21 | 2 | United States | 0 | 46.4 |
2021-08-10 | 1 | United Kingdom | 100 | 16.7 |
2021-08-11 | 1 | United Kingdom | 0 | 16.7 |
2021-08-12 | 1 | United Kingdom | 0 | 16.7 |
2021-08-13 | 1 | United Kingdom | 0 | 16.7 |
2021-08-14 | 1 | United Kingdom | 0 | 16.7 |
2021-08-15 | 2 | United Kingdom | 0 | 16.7 |
2021-08-16 | 2 | United Kingdom | 400 | 66.7 |
2021-08-17 | 2 | United Kingdom | 0 | 66.7 |
2021-08-18 | 2 | United Kingdom | 0 | 66.7 |
2021-08-19 | 2 | United Kingdom | 0 | 66.7 |
2021-08-20 | 2 | United Kingdom | 0 | 66.7 |
2021-08-21 | 2 | United Kingdom | 0 | 66.7 |
Any advice would be greatly appreciated!!
Okay, well I think my approach works fine, and it is certainly less programming that your approach.
Hi @PaigeMiller ,
I attempted your method, but it did not do what I was hoping for. Thank you for the help though!
@jnivi wrote:
Hi @PaigeMiller ,
I attempted your method, but it did not do what I was hoping for. Thank you for the help though!
Saying "it did not do what I was hoping for" does not tell us what we are hoping for.
Please point out instances comparing what you expected vs what was produced by @PaigeMiller's suggestion.
the problem is. Please point out instances in which the result
When I used the code that was suggested with my actual data, the following was the output I received.
My problem is not getting a weekly count for each country. My issue is getting an accurate weekly count for each country. In my original data, countries did not report on the same day of the week. I have a weekly variable that I created in order to report based on certain periods. For example, in the sample data I provided, Week 2 for US is 2021-08-15 to 2021-08-21, and the weekly count is reported on August 15. Week 2 for United Kingdom is 2021-08-15 to 2021-08-21, but the weekly count is reported on August 16 and not the 15 like the US. If I wanted to sum the data for both US and UK for week 2, the weekly total for UK would be an underestimate as it would be missing data from August 15th. Ideally what I want for UK for week 2 would be 400 + (100/6), where 100 was the the weekly count reported for UK in week 1.
Thank you!
The output I was intending to populate was as follows
Assuming your data has been sorted by country and date.
data have;
input Date :yymmdd10. Country & $40. WeeklyCases ;
format date yymmdd10.;
cards;
2021-08-09 United States 500
2021-08-10 United States 0
2021-08-11 United States 0
2021-08-12 United States 0
2021-08-13 United States 0
2021-08-14 United States 0
2021-08-15 United States 325
2021-08-16 United States 0
2021-08-17 United States 0
2021-08-18 United States 0
2021-08-19 United States 0
2021-08-20 United States 0
2021-08-21 United States 0
2021-08-10 United Kingdom 100
2021-08-11 United Kingdom 0
2021-08-12 United Kingdom 0
2021-08-13 United Kingdom 0
2021-08-14 United Kingdom 0
2021-08-15 United Kingdom 0
2021-08-16 United Kingdom 400
2021-08-17 United Kingdom 0
2021-08-18 United Kingdom 0
2021-08-19 United Kingdom 0
2021-08-20 United Kingdom 0
2021-08-21 United Kingdom 0
;
proc sort data=have out=temp;by country date;run;
data temp;
set temp;
by country ;
if first.country or WeeklyCases>0 then group+1;
run;
proc sql;
create table want as
select *,sum(WeeklyCases)/count(*) as DailyCases
from temp
group by group
order by country,date;
quit;
Thank you so much @Ksharp !!! This is exactly what I was trying to do!! Thank you once again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.