Hi,
I am trying to create a new row from an existing row that is effectively 1 point in time divided by the next point in time.
my table looks like this
FieldA | FieldB | FieldC | T0 | PQ1 | PQ2 | PQ3 | .... | PQ32 |
Value | Value2 | Balance | 100 | 98 | 97 | 95 | ... | 2 |
what I would like to do is create a new row where FieldA and FieldB are unchanged, FieldC will now be 'Decay' and we start dividing PQ1/T0 PQ2/PQ1 ...
FieldA | FieldB | FieldC | T0 | PQ1 | PQ2 | PQ3 | .... | PQ32 |
Value | Value2 | Decay | PQ1/T0 | PQ2/PQ1 | PQ3/PQ2 | ... | PQ32/PQ31 |
aka
FieldA | FieldB | FieldC | T0 | PQ1 | PQ2 | PQ3 | .... | PQ32 |
Value | Value2 | Decay | 98/100 | 97/98 | 95/97 | ... | 2/somethign |
and the final output would look like
FieldA | FieldB | FieldC | T0 | PQ1 | PQ2 | PQ3 | .... | PQ32 |
Value | Value2 | Balance | 100 | 98 | 97 | 95 | ... | 2 |
Value | Value2 | Decay | . | 0.98 | 0.989795918 | 0.979381443 | something |
could you please help me figure out how to iterate through this process.
Best,
Chris w.
Hi @cwhit,
@cwhit wrote:
I created a new column PQ0 = T0 so the code would work
I think @Reeza meant T0--PQ32 (double dash), i.e. the list of variables from T0 through PQ32 in the program data vector (PDV). To be on the safe side (not assuming a certain variable order in the PDV), I would prefer T0 PQ1-PQ32. Either way, a new column is not necessary.
@cwhit wrote:
I just realized something strange is happening with the units ...
the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 PQ4 PQ5 PQ6 PQ0 a b balance 100 98 95 93 92 85 70 100 a b decay 100 0.98 96.93878 0.959368 95.89642 0.886373 78.97352 100
It's always good to investigate strange results. In this case, I'm afraid, it turns out that most of the results are incorrect (and not only some off by a factor of 100 as it may seem at first glance), e.g. PQ3=(93/95)*(98/100)=0.959368... and not 93/95=0.978947... what it should be.
Reason: The formula _rvars(i) = _rvars(i) / _rvars(i-1) is recursive. Currently, it overwrites the value which is needed as the denominator in the next iteration of the DO loop. To avoid this, you can perform the iteration in descending order (PQ32=..., PQ31=..., ...) or save the value (which would be lost otherwise) in a temporary variable, say h, and use the LAG function to return the previous value of h in each iteration. In the latter case I'd let the DO loop start at i=1 in order to get a nonmissing value from the LAG function for i=2 already. Variable T0 will then be set to missing because of the missing value from the LAG function in the first iteration, but this is exactly what you specified in your initial post. By using the DIVIDE function you can avoid the note "Missing values were generated ..." in the log.
Option 1: iteration in descending order:
data want(drop=i);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;
do i=dim(_rvars) to 2 by -1;
_rvars(i) = _rvars(i) / _rvars(i-1);
end;
_rvars(1)=.;
output;
run;
Option 2: LAG function with temporary variable:
data want(drop=i h);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;
do i=1 to dim(_rvars);
h=_rvars(i); /* save existing value for the next iteration */
_rvars(i) = divide(_rvars(i),lag(h));
end;
output;
run;
data want;
set have;
FieldC = 'Balance';
output;
FieldC = 'Decay';
array _rvars(*) T0-PQ32;
do i=2 to dim(_rvars);
_rvars(i) = _rvars(i) / _rvars(i-1);
end;
output;
run;
I would highly recommend you reconsider this format and create a long format with one column for balance and a second for decay.
SAS loops automatically and its worth taking advantage of that.
Tutorial on Arrays if you have question about the code above:
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
@cwhit wrote:
Hi,
I am trying to create a new row from an existing row that is effectively 1 point in time divided by the next point in time.
my table looks like this
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32 Value Value2 Balance 100 98 97 95 ... 2
what I would like to do is create a new row where FieldA and FieldB are unchanged, FieldC will now be 'Decay' and we start dividing PQ1/T0 PQ2/PQ1 ...
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32 Value Value2 Decay PQ1/T0 PQ2/PQ1 PQ3/PQ2 ... PQ32/PQ31
aka
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32 Value Value2 Decay 98/100 97/98 95/97 ... 2/somethign
and the final output would look like
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32 Value Value2 Balance 100 98 97 95 ... 2 Value Value2 Decay . 0.98 0.989795918 0.979381443 something
could you please help me figure out how to iterate through this process.
Best,
Chris w.
Hi,
This was almost the solution - I just realized something strange is happening with the units (note, I created a new column PQ0 = T0 so the code would work)
the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?
FieldA | FieldB | FieldC | T0 | PQ1 | PQ2 | PQ3 | PQ4 | PQ5 | PQ6 | PQ0 |
a | b | balance | 100 | 98 | 95 | 93 | 92 | 85 | 70 | 100 |
a | b | decay | 100 | 0.98 | 96.93878 | 0.959368 | 95.89642 | 0.886373 | 78.97352 | 100 |
Hi @cwhit,
@cwhit wrote:
I created a new column PQ0 = T0 so the code would work
I think @Reeza meant T0--PQ32 (double dash), i.e. the list of variables from T0 through PQ32 in the program data vector (PDV). To be on the safe side (not assuming a certain variable order in the PDV), I would prefer T0 PQ1-PQ32. Either way, a new column is not necessary.
@cwhit wrote:
I just realized something strange is happening with the units ...
the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?
FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 PQ4 PQ5 PQ6 PQ0 a b balance 100 98 95 93 92 85 70 100 a b decay 100 0.98 96.93878 0.959368 95.89642 0.886373 78.97352 100
It's always good to investigate strange results. In this case, I'm afraid, it turns out that most of the results are incorrect (and not only some off by a factor of 100 as it may seem at first glance), e.g. PQ3=(93/95)*(98/100)=0.959368... and not 93/95=0.978947... what it should be.
Reason: The formula _rvars(i) = _rvars(i) / _rvars(i-1) is recursive. Currently, it overwrites the value which is needed as the denominator in the next iteration of the DO loop. To avoid this, you can perform the iteration in descending order (PQ32=..., PQ31=..., ...) or save the value (which would be lost otherwise) in a temporary variable, say h, and use the LAG function to return the previous value of h in each iteration. In the latter case I'd let the DO loop start at i=1 in order to get a nonmissing value from the LAG function for i=2 already. Variable T0 will then be set to missing because of the missing value from the LAG function in the first iteration, but this is exactly what you specified in your initial post. By using the DIVIDE function you can avoid the note "Missing values were generated ..." in the log.
Option 1: iteration in descending order:
data want(drop=i);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;
do i=dim(_rvars) to 2 by -1;
_rvars(i) = _rvars(i) / _rvars(i-1);
end;
_rvars(1)=.;
output;
run;
Option 2: LAG function with temporary variable:
data want(drop=i h);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;
do i=1 to dim(_rvars);
h=_rvars(i); /* save existing value for the next iteration */
_rvars(i) = divide(_rvars(i),lag(h));
end;
output;
run;
Thanks @FreelanceReinh, good catch on my mistakes there 🙂
@cwhit please change the correct answer to @FreelanceReinh
Thank you both very much. This was very helpful and has saved me a lot of unnecessary data steps.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.