Hello all,
I tried to find the sum of continous non-missing value in a dataset by group (stn and year), but still have some trouble to do so. I hope someone in the SAS community groups can help. Thanks a lot!
I have a dataset looks like this:
stn | year | month | day | var |
1 | 2000 | 12 | 28 | 0.2 |
1 | 2000 | 12 | 29 | 0.5 |
1 | 2000 | 12 | 30 | . |
1 | 2000 | 12 | 31 | 2 |
1 | 2001 | 1 | 1 | 0.2 |
1 | 2001 | 1 | 2 | 1 |
1 | 2001 | 1 | 3 | . |
2 | 2000 | 12 | 28 | 0.6 |
2 | 2000 | 12 | 29 | . |
2 | 2000 | 12 | 30 | . |
2 | 2000 | 12 | 31 | . |
2 | 2001 | 1 | 1 | 0 |
2 | 2001 | 1 | 2 | 1.5 |
2 | 2001 | 1 | 3 |
3 |
And I want to get the following:
stn | year | month | day | var | sum |
1 | 2000 | 12 | 28 | 0.2 | 0.7 |
1 | 2000 | 12 | 29 | 0.5 | 0.7 |
1 | 2000 | 12 | 30 | . | |
1 | 2000 | 12 | 31 | 2 | 2 |
1 | 2001 | 1 | 1 | 0.2 | 1.2 |
1 | 2001 | 1 | 2 | 1 | 1.2 |
1 | 2001 | 1 | 3 | . | |
2 | 2000 | 12 | 28 | 0.6 | 0.6 |
2 | 2000 | 12 | 29 | . | |
2 | 2000 | 12 | 30 | . | |
2 | 2000 | 12 | 31 | . | |
2 | 2001 | 1 | 1 | 0 | 4.5 |
2 | 2001 | 1 | 2 | 1.5 | 4.5 |
2 | 2001 | 1 | 3 | 3 | 4.5 |
If any one can assist in this problem I would greatly appreciate it!
Thank you and hope for your reply!
Best,
Hua
Here is one way:
data need; set have; _check=missing(var); run; data want (drop=_:); do until(last._check); set need; by stn year _check notsorted; if first._check then call missing(sum); sum+var; end; do until(last._check); set need; by stn year _check notsorted; output; end; run;
Art, CEO, AnalystFinder.com
You can expend your group (stn year) by adding a new variable to it:
data temp;
set have;
by stn year;
retain new_var;
if first.year then new_var=0;
if var=. then new_var+1;
run;
Now use temp dataset to sum by stn year new_var and adjust the sum to each of the new group observation.
If you preffer you can drop the new_var from the final dataset.
Here is one way:
data need; set have; _check=missing(var); run; data want (drop=_:); do until(last._check); set need; by stn year _check notsorted; if first._check then call missing(sum); sum+var; end; do until(last._check); set need; by stn year _check notsorted; output; end; run;
Art, CEO, AnalystFinder.com
Hi, another idea ...
data have;
set have;
new + missing(var) + missing(lag(var));
run;
proc sql;
create table want (drop=new) as select *, sum(var) as sum from have
group by stn, year, month, new
order by stn, year, month, day;
quit;
@MikeZdeb's code would have produced the same result as my code but, contrary to your specs, he included month as a group variable.
The following would produce the same result as my code:
data have; set have; new + missing(var) + missing(lag(var)); run; proc sql; create table want (drop=new) as select *, sum(var) as sum from have group by stn, year, new order by stn, year, month, day; quit;
He used new to create a separate group variable. missing(var) would increase new by 1 each time there was a missing value for var.
missing(lag(var)) checked the value of the previous var, thus initially had a value of 1. That was also essential to increase the group when the previous value of var was missing.
The correct summing was accomplish with the group statement.
Art, CEO, AnalystFinder.com
Hi ... MONTH was NOT used as a GROUP variable, just as an ORDER variable to control the look of the output since SQL goes its own merry way when doing the various calculations ...
proc sql;
create table want (drop=new) as select *, sum(var) as sum from have
group by stn, year, new
order by stn, year, month, day;
quit;
The statement ...
new + missing(var) + missing(lag(var));
creates a new variable named NEW (how prosaic) that is incremented by 1 each time ...
#1 a missing value of VAR is encountered in the current observation being processed in the data step
#2 a missing value of VAR in the previous observation that was processed
The expression MISSING(VAR) is TRUE and evaluates as 1 if the value is missing and evaluates as 1 (otherwise as zero). The same is true for the MISSING(LAG(VAR)) portion (add a 1 or a zero). Stements like that are a bit cryptic if you have not been introduced to using logical expressions (TRUE/FALSE that evaluates as 1/0) in equations ... just saves a lot of typing longer ways to accomplish the same task but I agree it might be hard to follow.
Not sure why it did not work for everything. Send me an example (msz03@albany.edu) and I can help the DEBUG process.
new +
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.