Hi everyone,
I'm facing a problem that I can't resolve in one step.
So I would like to output the sum of volume for each year with using by statement. However I can't do that because the variable year doesn't exist in the input set.
I know I can do it in 2 step but I don't want to because with the real database I'm using it will be too heavy.
Data have;
format date ddmmyy8.;
Date="01mar2015"d;
duration=24;
Volume=10;
run;
Data want;
set have;
by year;
retain Volume_cumul 0;
Volume_cumul=Volume;
do i=0 to Duration;
Year=year(intnx('month',date,i,'s'));
Month=month(intnx('month',date,i,'s'));
Volume_cumul=Volume_cumul+1;
if last.year=1 then output;
end;
run;
Thx for your help
The simple answer is NO. You can simulate the FIRST. flag by using LAG.
x=<some function>;
first_x = x ne lag(x);
But to calculate LAST. flag you have to know what the next value of X, that you have not calculated yet, is going to be.
Why not generate the derived dataset as a view and then use the BY statement when processing the VIEW?
But if your real use case is as simple as your example why not just test if MONTH=12?
do i=0 to Duration;
Year=year(intnx('month',date,i,'s'));
Month=month(intnx('month',date,i,'s'));
Volume_cumul=Volume_cumul+1;
if Month=12 or i=duration then output;
end;
Don't write your own DATA step code to compute sums over groups. SAS has already done this work for you and it is called either PROC MEANS or PROC SUMMARY.
proc summary data=have nway;
class date;
format date year4.;
var volume;
output out=_sums_ sum=;
run;
Hi thank u for u answer but your solution doesn't meet my needs.
Actually I'm performing a lot of calculation and I need to get the cumulative sum at the end of year this way.
"cumulative sum at the end of the year" and sum from the PROC SUMMARY code are identical and equal and have no difference. Or you haven't explained what you want clearly enough.
Can you please post what you ultimately want? I run your code but can't really tell what we're trying to get at.
Hi,
If u run the code I posted earlier , u'll have an error message saying that the By variable Year does not exist in the input data set.
I'd like to get the dataset want2 in one step.
Data have;
format date ddmmyy8.;
Date="01mar2015"d;
duration=24;
Volume=10;
run;
Data want;
set have;
*by year;
retain Volume_cumul 0;
Volume_cumul=Volume;
do i=0 to Duration;
Year=year(intnx('month',date,i,'s'));
Month=month(intnx('month',date,i,'s'));
Volume_cumul=Volume_cumul+1;
output;*if last.year=1 then output;
end;
run;
proc sort data=want;
by year;
run;
Data want2;
set want;
by year;
if last.year=1 then output;
run;
Yes, I understand that, and I did see that error when I ran it. It makes sense that it didn't run.
What I am asking for, however, is your final data set -- what will it look like?
Providing that ensures that we are able to get exactly what you want.
I'd like to get the dataset Want2
Data have;
format date ddmmyy8.;
Date="01mar2015"d;
duration=24;
Volume=10;
run;
Data want;
set have;
*by year;
retain Volume_cumul 0;
Volume_cumul=Volume;
do i=0 to Duration;
Year=year(intnx('month',date,i,'s'));
Month=month(intnx('month',date,i,'s'));
Volume_cumul=Volume_cumul+1;
output;*if last.year=1 then output;
end;
run;
proc sort data=want;
by year;
run;
Data want2;
set want;
by year;
if last.year=1 then output;
run;
Yeah technically it does the same job but I'll need to output the entire database in first step before taking the sum which I don't.
Finally I just want to use the statement last.variable with a variable that doesn't exist in the input dataset.
@adil256 wrote:
Yeah technically it does the same job but I'll need to output the entire database in first step before taking the sum which I don't.
I just don't understand how "output the entire database" relates to the rest of the problem you have described.
You were previously asked to SHOW US what you want (or a small portion of the output), and that hasn't been done either.
The simple answer is NO. You can simulate the FIRST. flag by using LAG.
x=<some function>;
first_x = x ne lag(x);
But to calculate LAST. flag you have to know what the next value of X, that you have not calculated yet, is going to be.
Why not generate the derived dataset as a view and then use the BY statement when processing the VIEW?
But if your real use case is as simple as your example why not just test if MONTH=12?
do i=0 to Duration;
Year=year(intnx('month',date,i,'s'));
Month=month(intnx('month',date,i,'s'));
Volume_cumul=Volume_cumul+1;
if Month=12 or i=duration then output;
end;
😁 I absolutely wanted to use the last.variable statement to output the data that I didn't think at all at this solution.
Thx for ur help
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.