BookmarkSubscribeRSS Feed
Chris_1
Calcite | Level 5
Hello,
I have a dataset which is updated weekly in which there are rows missing depending on the week’s data. I’m trying to get each row to complete and produce a list of which states haven’t reported any data that week. There should be a line for each state for each week. Once I have a row for each of the 52 weeks for each year, I want to fill in the missing rainfall value with a 0. I’ve tried a number of array variations and haven’t been able to get this to work as the values aren’t exactly missing, it’s the whole row that doesn’t exist.
It would be a fairly simple complete step in R but I can’t get it to translate to SAS. The person I inherited this code from has been filling in the missing rows each week manually and I refuse to keep doing this for all 50ish ‘states’ (obviously this is a made up set).

For example right now I have,

State Year Week Rainfall
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
13 REPLIES 13
mkeintz
PROC Star

Your problem description is not sufficient to give a robust answer.   In particular, are you creating a new dataset each week from two data sources (the year-to-date data and the new single-week data)?  This would make it easy to determine what the maximum week number should be for each state.  On the other hand, if the data are already in a single data set, per your sample, then a bit more work is require to make a program that determination of the miximum valid week number.   

 

I'm going to assume you have two datasets:  YTD (the historical data up to now) and SINGLE_WEEK (the most recent week of rainfall data)

 

Then this should work.  It's untested, because we need sample data in the form of working data steps:

 

data default_rainfall (drop=_:) / view=default_rainfall ;
  if _n_=1 then set single_week (keep=week rename=(week=_most_recent_week));

  /* Now read one obs per state */
  set ytd (keep=state);
  by state;

  retain rainfall 0;
  if first.state then do week=1 to _most_recent_week;
    output;
  end;
run;

data want;
  merge default_rainfall ytd single_week;
  by state week;
run;

This program first creates a complete dummy history called default_rainfall, including the most recent week, but with all rainfall values set to zero.

 

Then a simple SAS merge statement does everything you need.  But note that default_rainfall must be the first names dataset in the merge statement.  That's because merge will overwrite the rainfall values from default_rainfall by values from the other two datasets, whenever there is a match of state and week.  It doesn't matter what order you put the other two datasets in, assuming they current history and the new single week never have the same week present.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Chris_1
Calcite | Level 5

Thanks, that's helpful. Unfortunately it's a single dataset that's updated and imported each week. There's a macro to set the current and previous week numbers. 

%let previous_wk=9; 
%let current_wk=10; 

 

Does this work for the data steps to reproduce?

 

data WORK.RAINFALL;
infile datalines dsd truncover;
input State:$9. Year:32. Week:32. Rainfall:32.;
label State="State" Year="Year" Week="Week" Rainfall="Rainfall";
datalines;
Illinois 2020 9 10
Illinois 2020 10 14
Illinois 2020 25 5
Illinois 2020 34 8
Illinois 2021 3 3
Illinois 2021 5 14
Illinois 2021 8 7
Illinois 2022 1 4
Missouri 2021 5 18
Missouri 2021 20 3
Missouri 2021 28 29
Missouri 2021 32 6
Missouri 2022 1 3
Missouri 2022 3 17
Missouri 2022 5 5
Wisconsin 2020 28 3
Wisconsin 2021 12 18
Wisconsin 2021 20 29
Wisconsin 2021 25 2
Wisconsin 2022 1 18
;;;;

 

 

 

Patrick
Opal | Level 21

Here a SQL based approach. For this code to return your desired result you must have every single week for at least one state data.

 

data have;
  infile datalines truncover;
  input State $ Year Week Rainfall;
  datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;

proc sql;
  create table distinct_state_week as
  select 
    l.state, 
    r.week,
    0 as rainfall
  from
    (select distinct state from have) l,
    (select distinct week from have) r
  ;
quit;

proc sql;
  create table want as
  select 
    l.state,
    l.week,
    coalesce(r.rainfall, l.rainfall) as rainfall
  from 
    distinct_state_week l
    left join
    have r
    on l.state=r.state and l.week=r.week
  ;
quit;

 

 

Amir
PROC Star

Hi,

 

Just working with the one input data set, the following single data step should help.

 

data want;
   set have;
   by state year;
   
   if first.year then
      week_counter = 1;

   if week_counter lt week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      rainfall          = 0;
      
      do while (week_counter lt original_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
      
      week     = original_week;
      rainfall = original_rainfall;
   end;

   output;
   week_counter + 1;
   
   if last.year and week_counter lt 52 then
   do;
      rainfall = 0;
      
      do while (week_counter le 52);
         week = week_counter;
         output;
         week_counter + 1;
      end;
   end;
run;

 

 

 

Kind regards,

Amir.

 

Edited to remove diagnostic statement from code.

Chris_1
Calcite | Level 5
Thanks! I've mostly resolved this issue based on this solution but would you be able to assist with a line for ending the weeks at the current week and also limiting the earliest week?
Amir
PROC Star

Hi,

 

If I have understood your newer requirements, then the following should help.

 

I have updated and revised the code to make use of two macro variables, for earliest and latest week, which I have set to 2 & 5 respectively. This does mean that year 2020 for Illinois is not in the results with the solution presented, but It is not clear to me what you want the results to look like. Please provide the exact results you want for the data in your question, for an earliest and latest week of 2 & 5, plus any other examples you think are pertinent.

 

Lastly I have also commented out the assignment of rainfall to 0 and replaced it with a missing assignment via the use of call missing(), as this will help distinguish between rainfall data that was not received versus a rainfall value of 0 that was received.

 

If you need something different then please supply examples of output data based on the data in your question along with values for earliest and latest week.

 

HTH 

 

data have;
   input
      state    : $char20.
      year     :       8.
      week     :       8.
      rainfall :       8.
   ;

   datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;


%let earliest_week = 2;
%let current_week  = 5;
   

data want;
   set have(where = (week between &earliest_week and &current_week));
   by state year;
   
   if first.year then
      week_counter = &earliest_week;

   if week_counter lt week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      /* rainfall          = 0; */
      call missing(rainfall);
      
      do while (week_counter lt original_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
      
      week     = original_week;
      rainfall = original_rainfall;
   end;

   output;
   week_counter + 1;
   
   if last.year and week lt &current_week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      /* rainfall          = 0; */
      call missing(rainfall);
      
      do while (week_counter le &current_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
   end;
run;

 

 

Kind regards,

Amir.

 

Edit: Remove diagnostic statement from code.

Chris_1
Calcite | Level 5
Thanks! For some reason this isn't picking up 2022 as last.year
The example data I gave isn't great, sorry. I want it to start mid-2020 and extend until the current week.
So earliest.week would be like 20 and current.week is 10
There should be weeks 20-52 for 2020 for all states, all 52 weeks for 2021, and up until the current week for 2022. The dataset will be quite large for 52 weeks for each state so its hard to share the full example.
Kurt_Bremser
Super User

Create a dynamic template for all states and weeks:

data states;
input state $30.;
datalines;
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Jersey
New Hampshire
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
;

%let begin=%sysevalf("17may2020"d);

data weeks;
date = &begin.;
do while (date lt today());
  year = year(date);
  week = week(date);
  output;
  date = date + 7;
end;
drop date;
run;

proc sql;
create table template as
  select
    state,
    year,
    week
  from states, weeks
  order by state, year, week
;
quit;

And then check your dataset against that:

proc sort data=have;
by state year week;
run;

data
  want
  miss
;
merge
  have (in=h)
  template (in=t)
;
by state year week;
if not h
then output miss;
output want; /* creates an observation with missing rainfall if not present in have */
run;
Chris_1
Calcite | Level 5
This is replacing some of my existing values with missing. Ie. there's values in my 'have' dataset for weeks 35-40 and its overwriting them to missing.
Kurt_Bremser
Super User

I created an example dataset:

data have;
infile datalines dlm="," dsd;
input state :$30. year week rainfall;
datalines;
Alabama,2021,35,10
Alabama,2021,36,11
Alabama,2021,37,5
Alabama,2021,38,9
Alabama,2021,39,8
Alabama,2021,40,7
;

Then I ran my code against it, and printed a subset of want to show "Alabama" and the relevant weeks, which yields this result:

Beob.	state	year	week	rainfall
64	Alabama	2021	31	.
65	Alabama	2021	32	.
66	Alabama	2021	33	.
67	Alabama	2021	34	.
68	Alabama	2021	35	10
69	Alabama	2021	36	11
70	Alabama	2021	37	5
71	Alabama	2021	38	9
72	Alabama	2021	39	8
73	Alabama	2021	40	7
74	Alabama	2021	41	.
75	Alabama	2021	42	.
76	Alabama	2021	43	.
77	Alabama	2021	44	.
78	Alabama	2021	45	.
79	Alabama	2021	46	.
80	Alabama	2021	47	.
81	Alabama	2021	48	.
82	Alabama	2021	49	.
83	Alabama	2021	50	.
84	Alabama	2021	51	.
85	Alabama	2021	52	.

As you can see, none of the values from have have been changed in any way.

 

So it is ESSENTIAL to see your data as is. We need to know exactly what the types of the variables are, what the raw values are, and which formats (if present) are used to display them.

Tom
Super User Tom
Super User

When using this coding pattern place TEMPLATE before the real dataset in the MERGE statement.

 

That will prevent any extra variables that might have accidentally been added to the TEMPLATE dataset from overwriting the values read from the real dataset.

Ksharp
Super User
data have;
  infile datalines truncover;
  input State $ Year Week Rainfall;
  datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;

proc freq data=have noprint;
table State*Year*Week/out=want sparse;
weight Rainfall;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2540 views
  • 0 likes
  • 7 in conversation