Hello Experts,
My data is :
data donnees;
input ID X2 X3;
cards;
1 10 20
2 38 50
3 45 60
4 70 80
;
run;
I would like to calculate my values with lag from the line _N_-1, but this lags is also calculated. My code is :
data donnees1;
set donnees;
/**Calculation of X4 and X5***/
if ID=1 then
do;
X4=.;
X5=X3;
end;
X5_avant=lag(X5);
if ID=2 then
do;
X4=X5_avant+X2;
X5=X5_avant+X3;
end;
X5_avant=lag(X5);
if ID=3 then
do;
X4=X5_avant+X2;
X5=X5_avant+X3;
end;
X5_avant=lag(X5);
if ID=4 then
do;
X4=X5_avant+X2;
X5=X5_avant+X3;
end;
run;
I wrote a macro where I recreate the data:
/****Solution***/
%macro values;
%do i=1 %to 3;
data donnees;
set donnees;
/**Calculation of X4 and X5***/
if ID=1 then
do;
X4=.;
X5=X3;
end;
X5_avant=lag(X5);
if ID^=1 then
do;
X4=X5_avant+X2;
X5=X5_avant+X3;
end;
X5_avant=lag(X5);
run;
%end;
%mend;
%values;
Do you know please another more efficient algorithme that takes less time ?
Thank you very much !
With your example data, this code creates your expected result:
data want;
set donnees;
retain x4 x5 x5_avant; /* x4 and x5 only to keep order of variables */
x4 = x5_avant + x2; /* creates a missing value in 1st observation */
x5 = sum(x5_avant,x3);
output;
x5_avant = sum(x5_avant,x3);
run;
If you wanted x4 to be equal to x2 in the first observation, use a SUM() function to calculate it.
Is the result of that macro what you expect?
With your example data, this code creates your expected result:
data want;
set donnees;
retain x4 x5 x5_avant; /* x4 and x5 only to keep order of variables */
x4 = x5_avant + x2; /* creates a missing value in 1st observation */
x5 = sum(x5_avant,x3);
output;
x5_avant = sum(x5_avant,x3);
run;
If you wanted x4 to be equal to x2 in the first observation, use a SUM() function to calculate it.
SUM() ignores missing values; it will only result in a missing value (and write the corresponding NOTE to the log) when all its arguments are missing.
That happens because of the RETAIN statement. It prevents that x5_avant is set to missing at the start of a data step iteration.
And the explicit OUTPUT statement keeps the previous value of x5_avant in the dataset, as it happens before the calculation.
Using x5_avant in a simple calculation in observation 1 makes no sense, as it is not set at that time.
But you can expand my code to create non-missing values:
data want;
set donnees;
retain x4 x5 x5_avant;
x4 = sum(x5_avant,ifn(_n_ = 1,x2,x2/10));
x5 = sum(x5_avant,ifn(_n_ = 1,x3,x3/10);
output;
x5_avant = sum(x5_avant,x3);
run;
Hello Kurt,
Sorry, for insisting 🙂 I adopted the algorithm to my data, but I don't have the right results, I'm wondering if I understand your proposition of code. My code is :
I attached the data Variation3 and file with the right results.
Thank you very much for your help !
data Variation4;
set Variation3;
by NO_POLICE;
retain DeltaEAs EAs EAs_avant;
if _n_=1 then do;
DeltaEAs=MT_EVT + 5;
EAs =max(MT_EA,MT_EVT);
end;
if _n_^=1 then do;
if MT_EVT=0 then DeltaEAs=0;
if MT_EVT>0 then DeltaEAs=MT_EVT;
if MT_EVT<0 then DeltaEAs=EAs_avant*(MT_EVT/MT_EA_AVMVT);
EAs=sum(EAs_avant,DeltaEAs);
end;
output;
EAs_avant = sum(EAs_avant,DeltaEAs);
run;
MT_EVT is missing in the first observation, so DeltaEAs will also be missing, causing EAs_avant to be missing also (all arguments of the SUM function are missing).
Since all subsequent MT_EVT are negative, this calculation is done
DeltaEAs=EAs_avant*(MT_EVT/MT_EA_AVMVT)
and will always result in a missing value for DeltaEAs (as long as EAs_avant is missing), which then causes EAs_avant to stay missing.
All this is clearly reflected in the log (Maxim 2!):
73 data Variation4;
74 set Variation3;
75 by NO_POLICE;
76 retain DeltaEAs EAs EAs_avant;
77
78 if _n_=1 then do;
79 DeltaEAs=MT_EVT + 5;
80 EAs =max(MT_EA,MT_EVT);
81 end;
82
83 if _n_^=1 then do;
84 if MT_EVT=0 then DeltaEAs=0;
85 if MT_EVT>0 then DeltaEAs=MT_EVT;
86 if MT_EVT<0 then DeltaEAs=EAs_avant*(MT_EVT/MT_EA_AVMVT);
87 EAs=sum(EAs_avant,DeltaEAs);
88 end;
89
90 output;
91 EAs_avant = sum(EAs_avant,DeltaEAs);
92 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 bei 79:19 12 bei 86:38 12 bei 87:9 13 bei 91:15
NOTE: There were 14 observations read from the data set WORK.VARIATION3.
NOTE: The data set WORK.VARIATION4 has 14 observations and 9 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.02 seconds
cpu time 0.01 seconds
Thank you, Kurt,
When I change just for _N_=1 the code to initialize the missing values:
if _n_=1 then do;
MT_EVT=0;
MT_EA_AVMVT=0;
DeltaEAs=0;
EAs =max(MT_EA,MT_EVT);
end;
I have the 0 in columns, I can't understand why the value EAs=329577,34745 is not retened in EAs_avant.
Thank you for your help !
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.