BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jnivi
Calcite | Level 5

Hi there, 

 

I have a quick question about manipulating a dataset in SAS.  I have included a sample dataset below. 

 

Date                               Country                           Weekly Cases                        

2021-08-09                 United States                    500

2021-08-10                 United States                       0

2021-08-11                  United States                       0

2021-08-12                  United States                      0

2021-08-13                  United States                      0

2021-08-14                 United States                       0

2021-08-15                 United States                    325

2021-08-16                United States                       0

2021-08-17                  United States                      0

2021-08-18                  United States                      0

2021-08-19                  United States                      0

2021-08-20                 United States                      0

2021-08-21                 United States                       0

2021-08-10                United Kingdom                   100

2021-08-11                  United Kingdom                     0

2021-08-12                 United Kingdom                      0

2021-08-13                  United Kingdom                     0

2021-08-14                 United Kingdom                      0

2021-08-15                 United Kingdom                      0

2021-08-16                 United Kingdom                     400

2021-08-17                 United Kingdom                        0

2021-08-18                  United Kingdom                      0

2021-08-19                  United Kingdom                      0

2021-08-20                 United Kingdom                      0

2021-08-21                 United Kingdom                       0

 

What I am trying to do is report the number of cases by week for each country in my dataset. However, what I noticed is that countries did not always report on the same day of the week. I was wondering if there was a way for me to manipulate the data so that I could divide the weekly count into daily counts (important to note that a weekly count might not always represent a 7 day reporting week, in my example below the first weekly count for US represent 6 days and the 2nd weekly count represents 7 days). Ideally every time a weekly count is presented for each country, I would like that number to be divided equally until the next weekly count is present for that country. I have included what I would like my data to look like instead. 

 

Date                               Country                           Weekly Cases                    Daily Cases     

2021-08-09                 United States                    500                                          83.3

2021-08-10                 United States                       0                                              83.3

2021-08-11                  United States                       0                                              83.3

2021-08-12                  United States                      0                                              83.3

2021-08-13                  United States                      0                                             83.3

2021-08-14                 United States                       0                                             83.3

2021-08-15                 United States                    325                                            46.4

2021-08-16                United States                       0                                              46.4

2021-08-17                  United States                      0                                             46.4

2021-08-18                  United States                      0                                              46.4

2021-08-19                  United States                      0                                             46.4

2021-08-20                 United States                      0                                             46.4

2021-08-21                 United States                       0                                             46.4

2021-08-10                United Kingdom                   100                                          16.7

2021-08-11                  United Kingdom                     0                                            16.7

2021-08-12                 United Kingdom                      0                                            16.7

2021-08-13                  United Kingdom                     0                                             16.7

2021-08-14                 United Kingdom                      0                                             16.7

2021-08-15                 United Kingdom                      0                                            16.7

2021-08-16                 United Kingdom                     400                                         66.7

2021-08-17                 United Kingdom                        0                                           66.7

2021-08-18                  United Kingdom                      0                                            66.7

2021-08-19                  United Kingdom                      0                                           66.7

2021-08-20                 United Kingdom                      0                                            66.7

2021-08-21                 United Kingdom                       0                                            66.7

 

I have a pretty large dataset (400k observations) with 230 unique countries that span a few years. Again the countries did not always report their weekly count on the same day, and the weekly counts don't always represent a 7 day count. If there is a way I can easily manipulate the dataset to do the above daily count, I would really appreciate it!

 

Thank you so much for any advice that you can provide!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Assuming your data has been sorted by country and date.

data have;
input Date    :yymmdd10.                           Country      & $40.                     WeeklyCases     ;
format date yymmdd10.;
cards; 
2021-08-09                 United States                    500
2021-08-10                 United States                       0
2021-08-11                  United States                       0
2021-08-12                  United States                      0
2021-08-13                  United States                      0
2021-08-14                 United States                       0
2021-08-15                 United States                    325
2021-08-16                United States                       0
2021-08-17                  United States                      0
2021-08-18                  United States                      0
2021-08-19                  United States                      0
2021-08-20                 United States                      0
2021-08-21                 United States                       0
2021-08-10                United Kingdom                   100
2021-08-11                  United Kingdom                     0
2021-08-12                 United Kingdom                      0
2021-08-13                  United Kingdom                     0
2021-08-14                 United Kingdom                      0
2021-08-15                 United Kingdom                      0
2021-08-16                 United Kingdom                     400
2021-08-17                 United Kingdom                        0
2021-08-18                  United Kingdom                      0
2021-08-19                  United Kingdom                      0
2021-08-20                 United Kingdom                      0
2021-08-21                 United Kingdom                       0
;
proc sort data=have out=temp;by country date;run;
data temp;
 set temp;
 by country ;
 if first.country or WeeklyCases>0 then group+1;
 run;
proc sql;
create table want as
select *,sum(WeeklyCases)/count(*) as DailyCases  
 from temp
  group by group
   order by country,date;
quit;

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

I would think that the easiest thing to do is to turn the date into a week number, and then go ahead and analyze by week number. The format weekv5. is one of many ways to turn dates into week numbers.

 

Since the data you provided is not in a usable form (examples and instructions), this code is just a general idea of how to do it.

 

proc summary data=have;
    class date;
    var weekly_cases;
    format date weekv5.;
    output out=want sum=;
run;

 

--
Paige Miller
jnivi
Calcite | Level 5

Hi @PaigeMiller , 

 

Thank you for the quick reply! I forgot to mention that I had already converted the dates to a week format. This issue is that not all countries reported on the same day of the week, so in some cases we can have two weekly counts in my week variable for a particular country--- in that case i would be overestimating for that particular week and underestimating for the prior week.  I have updated the data I had previously provided so that it is a more usable form.  The one presented below is what I have : 

DateWeekCountryWeekly_Cases
2021-08-091United States500
2021-08-101United States0
2021-08-111United States0
2021-08-121United States0
2021-08-131United States0
2021-08-141United States0
2021-08-152United States325
2021-08-162United States0
2021-08-172United States0
2021-08-182United States0
2021-08-192United States0
2021-08-202United States0
2021-08-212United States0
2021-08-101United Kingdom100
2021-08-111United Kingdom0
2021-08-121United Kingdom0
2021-08-131United Kingdom0
2021-08-141United Kingdom0
2021-08-152United Kingdom0
2021-08-162United Kingdom400
2021-08-172United Kingdom0
2021-08-182United Kingdom0
2021-08-192United Kingdom0
2021-08-202United Kingdom0
2021-08-212United Kingdom0


 

This is the dataset that I would like instead: 

DateWeekCountryWeekly_CasesDaily_Cases
2021-08-091United States50083.3
2021-08-101United States083.3
2021-08-111United States083.3
2021-08-121United States083.3
2021-08-131United States083.3
2021-08-141United States083.3
2021-08-152United States32546.4
2021-08-162United States046.4
2021-08-172United States046.4
2021-08-182United States046.4
2021-08-192United States046.4
2021-08-202United States046.4
2021-08-212United States046.4
2021-08-101United Kingdom10016.7
2021-08-111United Kingdom016.7
2021-08-121United Kingdom016.7
2021-08-131United Kingdom016.7
2021-08-141United Kingdom016.7
2021-08-152United Kingdom016.7
2021-08-162United Kingdom40066.7
2021-08-172United Kingdom066.7
2021-08-182United Kingdom066.7
2021-08-192United Kingdom066.7
2021-08-202United Kingdom066.7
2021-08-212United Kingdom066.7

 

Any advice would be greatly appreciated!!

PaigeMiller
Diamond | Level 26

Okay, well I think my approach works fine, and it is certainly less programming that your approach. 

--
Paige Miller
jnivi
Calcite | Level 5

Hi @PaigeMiller , 

 

I attempted your method, but it did not do what I was hoping for. Thank you for the help though!

mkeintz
PROC Star

@jnivi wrote:

Hi @PaigeMiller , 

 

I attempted your method, but it did not do what I was hoping for. Thank you for the help though!


Saying "it did not do what I was hoping for" does not tell us what we are hoping for.

 

Please point out instances comparing what you expected vs what was produced by @PaigeMiller's suggestion. 

 

 

the problem is.  Please point out instances in which the result 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jnivi
Calcite | Level 5

When I used the code that was suggested with my actual data, the following was the output I received.  

 

jnivi_0-1716853251427.png

 

My problem is not getting a weekly count for each country. My issue is getting an accurate weekly count for each country. In my original data, countries did not report on the same day of the week. I have a weekly variable that I  created in order to report based on certain periods. For example, in the sample data I provided, Week 2 for US is 2021-08-15  to 2021-08-21, and the weekly count is reported on August 15. Week 2 for United Kingdom is  2021-08-15  to 2021-08-21, but the weekly count is reported on August 16 and not the 15 like the US. If I wanted to sum the data for both US and UK for week 2, the weekly total for UK would be an underestimate as it would be missing data from August 15th. Ideally what I want for UK for week 2 would be 400 + (100/6), where 100 was the the weekly count reported for UK in week 1. 

 

Thank you!

 

The output I was intending to populate was as follows 

Ksharp
Super User

Assuming your data has been sorted by country and date.

data have;
input Date    :yymmdd10.                           Country      & $40.                     WeeklyCases     ;
format date yymmdd10.;
cards; 
2021-08-09                 United States                    500
2021-08-10                 United States                       0
2021-08-11                  United States                       0
2021-08-12                  United States                      0
2021-08-13                  United States                      0
2021-08-14                 United States                       0
2021-08-15                 United States                    325
2021-08-16                United States                       0
2021-08-17                  United States                      0
2021-08-18                  United States                      0
2021-08-19                  United States                      0
2021-08-20                 United States                      0
2021-08-21                 United States                       0
2021-08-10                United Kingdom                   100
2021-08-11                  United Kingdom                     0
2021-08-12                 United Kingdom                      0
2021-08-13                  United Kingdom                     0
2021-08-14                 United Kingdom                      0
2021-08-15                 United Kingdom                      0
2021-08-16                 United Kingdom                     400
2021-08-17                 United Kingdom                        0
2021-08-18                  United Kingdom                      0
2021-08-19                  United Kingdom                      0
2021-08-20                 United Kingdom                      0
2021-08-21                 United Kingdom                       0
;
proc sort data=have out=temp;by country date;run;
data temp;
 set temp;
 by country ;
 if first.country or WeeklyCases>0 then group+1;
 run;
proc sql;
create table want as
select *,sum(WeeklyCases)/count(*) as DailyCases  
 from temp
  group by group
   order by country,date;
quit;
jnivi
Calcite | Level 5

Thank you so much @Ksharp !!! This is exactly what I was trying to do!! Thank you once again!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 582 views
  • 0 likes
  • 4 in conversation