Your problem description is not sufficient to give a robust answer. In particular, are you creating a new dataset each week from two data sources (the year-to-date data and the new single-week data)? This would make it easy to determine what the maximum week number should be for each state. On the other hand, if the data are already in a single data set, per your sample, then a bit more work is require to make a program that determination of the miximum valid week number.
I'm going to assume you have two datasets: YTD (the historical data up to now) and SINGLE_WEEK (the most recent week of rainfall data)
Then this should work. It's untested, because we need sample data in the form of working data steps:
data default_rainfall (drop=_:) / view=default_rainfall ;
  if _n_=1 then set single_week (keep=week rename=(week=_most_recent_week));
  /* Now read one obs per state */
  set ytd (keep=state);
  by state;
  retain rainfall 0;
  if first.state then do week=1 to _most_recent_week;
    output;
  end;
run;
data want;
  merge default_rainfall ytd single_week;
  by state week;
run;
This program first creates a complete dummy history called default_rainfall, including the most recent week, but with all rainfall values set to zero.
Then a simple SAS merge statement does everything you need. But note that default_rainfall must be the first names dataset in the merge statement. That's because merge will overwrite the rainfall values from default_rainfall by values from the other two datasets, whenever there is a match of state and week. It doesn't matter what order you put the other two datasets in, assuming they current history and the new single week never have the same week present.
Thanks, that's helpful. Unfortunately it's a single dataset that's updated and imported each week. There's a macro to set the current and previous week numbers.
%let previous_wk=9; 
%let current_wk=10; 
Does this work for the data steps to reproduce?
data WORK.RAINFALL;
infile datalines dsd truncover;
input State:$9. Year:32. Week:32. Rainfall:32.;
label State="State" Year="Year" Week="Week" Rainfall="Rainfall";
datalines;
Illinois 2020 9 10
Illinois 2020 10 14
Illinois 2020 25 5
Illinois 2020 34 8
Illinois 2021 3 3
Illinois 2021 5 14
Illinois 2021 8 7
Illinois 2022 1 4
Missouri 2021 5 18
Missouri 2021 20 3
Missouri 2021 28 29
Missouri 2021 32 6
Missouri 2022 1 3
Missouri 2022 3 17
Missouri 2022 5 5
Wisconsin 2020 28 3
Wisconsin 2021 12 18
Wisconsin 2021 20 29
Wisconsin 2021 25 2
Wisconsin 2022 1 18
;;;;
Here a SQL based approach. For this code to return your desired result you must have every single week for at least one state data.
data have;
  infile datalines truncover;
  input State $ Year Week Rainfall;
  datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;
proc sql;
  create table distinct_state_week as
  select 
    l.state, 
    r.week,
    0 as rainfall
  from
    (select distinct state from have) l,
    (select distinct week from have) r
  ;
quit;
proc sql;
  create table want as
  select 
    l.state,
    l.week,
    coalesce(r.rainfall, l.rainfall) as rainfall
  from 
    distinct_state_week l
    left join
    have r
    on l.state=r.state and l.week=r.week
  ;
quit;
Hi,
Just working with the one input data set, the following single data step should help.
data want;
   set have;
   by state year;
   
   if first.year then
      week_counter = 1;
   if week_counter lt week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      rainfall          = 0;
      
      do while (week_counter lt original_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
      
      week     = original_week;
      rainfall = original_rainfall;
   end;
   output;
   week_counter + 1;
   
   if last.year and week_counter lt 52 then
   do;
      rainfall = 0;
      
      do while (week_counter le 52);
         week = week_counter;
         output;
         week_counter + 1;
      end;
   end;
run;
Kind regards,
Amir.
Edited to remove diagnostic statement from code.
Hi,
If I have understood your newer requirements, then the following should help.
I have updated and revised the code to make use of two macro variables, for earliest and latest week, which I have set to 2 & 5 respectively. This does mean that year 2020 for Illinois is not in the results with the solution presented, but It is not clear to me what you want the results to look like. Please provide the exact results you want for the data in your question, for an earliest and latest week of 2 & 5, plus any other examples you think are pertinent.
Lastly I have also commented out the assignment of rainfall to 0 and replaced it with a missing assignment via the use of call missing(), as this will help distinguish between rainfall data that was not received versus a rainfall value of 0 that was received.
If you need something different then please supply examples of output data based on the data in your question along with values for earliest and latest week.
HTH
data have;
   input
      state    : $char20.
      year     :       8.
      week     :       8.
      rainfall :       8.
   ;
   datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;
%let earliest_week = 2;
%let current_week  = 5;
   
data want;
   set have(where = (week between &earliest_week and ¤t_week));
   by state year;
   
   if first.year then
      week_counter = &earliest_week;
   if week_counter lt week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      /* rainfall          = 0; */
      call missing(rainfall);
      
      do while (week_counter lt original_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
      
      week     = original_week;
      rainfall = original_rainfall;
   end;
   output;
   week_counter + 1;
   
   if last.year and week lt ¤t_week then
   do;
      original_week     = week;
      original_rainfall = rainfall;      
      /* rainfall          = 0; */
      call missing(rainfall);
      
      do while (week_counter le ¤t_week);
         week = week_counter;
         output;
         week_counter + 1;
      end;
   end;
run;
Kind regards,
Amir.
Edit: Remove diagnostic statement from code.
Create a dynamic template for all states and weeks:
data states;
input state $30.;
datalines;
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Jersey
New Hampshire
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
;
%let begin=%sysevalf("17may2020"d);
data weeks;
date = &begin.;
do while (date lt today());
  year = year(date);
  week = week(date);
  output;
  date = date + 7;
end;
drop date;
run;
proc sql;
create table template as
  select
    state,
    year,
    week
  from states, weeks
  order by state, year, week
;
quit;And then check your dataset against that:
proc sort data=have;
by state year week;
run;
data
  want
  miss
;
merge
  have (in=h)
  template (in=t)
;
by state year week;
if not h
then output miss;
output want; /* creates an observation with missing rainfall if not present in have */
run;Please supply usable example data (see @Ksharp 's code) that illustrates the issue.
I created an example dataset:
data have;
infile datalines dlm="," dsd;
input state :$30. year week rainfall;
datalines;
Alabama,2021,35,10
Alabama,2021,36,11
Alabama,2021,37,5
Alabama,2021,38,9
Alabama,2021,39,8
Alabama,2021,40,7
;
Then I ran my code against it, and printed a subset of want to show "Alabama" and the relevant weeks, which yields this result:
Beob. state year week rainfall 64 Alabama 2021 31 . 65 Alabama 2021 32 . 66 Alabama 2021 33 . 67 Alabama 2021 34 . 68 Alabama 2021 35 10 69 Alabama 2021 36 11 70 Alabama 2021 37 5 71 Alabama 2021 38 9 72 Alabama 2021 39 8 73 Alabama 2021 40 7 74 Alabama 2021 41 . 75 Alabama 2021 42 . 76 Alabama 2021 43 . 77 Alabama 2021 44 . 78 Alabama 2021 45 . 79 Alabama 2021 46 . 80 Alabama 2021 47 . 81 Alabama 2021 48 . 82 Alabama 2021 49 . 83 Alabama 2021 50 . 84 Alabama 2021 51 . 85 Alabama 2021 52 .
As you can see, none of the values from have have been changed in any way.
So it is ESSENTIAL to see your data as is. We need to know exactly what the types of the variables are, what the raw values are, and which formats (if present) are used to display them.
When using this coding pattern place TEMPLATE before the real dataset in the MERGE statement.
That will prevent any extra variables that might have accidentally been added to the TEMPLATE dataset from overwriting the values read from the real dataset.
data have;
  infile datalines truncover;
  input State $ Year Week Rainfall;
  datalines;
Illinois 2020 1 2
Illinois 2021 2 5
Illinois 2021 4 6
Missouri 2021 3 8
Missouri 2021 5 1
Missouri 2022 6 3
Missouri 2022 7 2
Wisconsin 2021 1 1
Wisconsin 2021 4 3
Wisconsin 2022 6 2
;
proc freq data=have noprint;
table State*Year*Week/out=want sparse;
weight Rainfall;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
