Solved: divide one column by another over time

cwhit · Posted 10-04-2019 07:25 PM

Hi,

I am trying to create a new row from an existing row that is effectively 1 point in time divided by the next point in time.

my table looks like this

FieldA	FieldB	FieldC	T0	PQ1	PQ2	PQ3	....	PQ32
Value	Value2	Balance	100	98	97	95	...	2

what I would like to do is create a new row where FieldA and FieldB are unchanged, FieldC will now be 'Decay' and we start dividing PQ1/T0 PQ2/PQ1 ...

FieldA	FieldB	FieldC	T0	PQ1	PQ2	PQ3	....	PQ32
Value	Value2	Decay		PQ1/T0	PQ2/PQ1	PQ3/PQ2	...	PQ32/PQ31

aka

FieldA	FieldB	FieldC	T0	PQ1	PQ2	PQ3	....	PQ32
Value	Value2	Decay		98/100	97/98	95/97	...	2/somethign

and the final output would look like

FieldA	FieldB	FieldC	T0	PQ1	PQ2	PQ3	....	PQ32
Value	Value2	Balance	100	98	97	95	...	2
Value	Value2	Decay	.	0.98	0.989795918	0.979381443		something

could you please help me figure out how to iterate through this process.

Best,

Chris w.

FreelanceReinh · Posted 10-06-2019 06:56 PM

Hi @cwhit,

@cwhit wrote:

I created a new column PQ0 = T0 so the code would work

I think @Reeza meant T0--PQ32 (double dash), i.e. the list of variables from T0 through PQ32 in the program data vector (PDV). To be on the safe side (not assuming a certain variable order in the PDV), I would prefer T0 PQ1-PQ32. Either way, a new column is not necessary.

@cwhit wrote:

I just realized something strange is happening with the units ...

the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 PQ4 PQ5 PQ6 PQ0

a b balance 100 98 95 93 92 85 70 100

a b decay 100 0.98 96.93878 0.959368 95.89642 0.886373 78.97352 100

It's always good to investigate strange results. In this case, I'm afraid, it turns out that most of the results are incorrect (and not only some off by a factor of 100 as it may seem at first glance), e.g. PQ3=(93/95)*(98/100)=0.959368... and not 93/95=0.978947... what it should be.

Reason: The formula _rvars(i) = _rvars(i) / _rvars(i-1) is recursive. Currently, it overwrites the value which is needed as the denominator in the next iteration of the DO loop. To avoid this, you can perform the iteration in descending order (PQ32=..., PQ31=..., ...) or save the value (which would be lost otherwise) in a temporary variable, say h, and use the LAG function to return the previous value of h in each iteration. In the latter case I'd let the DO loop start at i=1 in order to get a nonmissing value from the LAG function for i=2 already. Variable T0 will then be set to missing because of the missing value from the LAG function in the first iteration, but this is exactly what you specified in your initial post. By using the DIVIDE function you can avoid the note "Missing values were generated ..." in the log.

Option 1: iteration in descending order:

data want(drop=i);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;

do i=dim(_rvars) to 2 by -1;
  _rvars(i) = _rvars(i) / _rvars(i-1);
end;
_rvars(1)=.;

output;
run;

Option 2: LAG function with temporary variable:

data want(drop=i h);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;

do i=1 to dim(_rvars);
  h=_rvars(i); /* save existing value for the next iteration */
  _rvars(i) = divide(_rvars(i),lag(h));
end;

output;
run;

View solution in original post

Reeza · Posted 10-04-2019 10:23 PM

data want;
set have;

FieldC = 'Balance';
output;

FieldC = 'Decay';
array _rvars(*) T0-PQ32;


do i=2 to dim(_rvars);
_rvars(i) = _rvars(i) / _rvars(i-1);
end;

output;

run;

I would highly recommend you reconsider this format and create a long format with one column for balance and a second for decay.

SAS loops automatically and its worth taking advantage of that.

Tutorial on Arrays if you have question about the code above:

https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

@cwhit wrote:

Hi,

I am trying to create a new row from an existing row that is effectively 1 point in time divided by the next point in time.

my table looks like this

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32

Value Value2 Balance 100 98 97 95 ... 2

what I would like to do is create a new row where FieldA and FieldB are unchanged, FieldC will now be 'Decay' and we start dividing PQ1/T0 PQ2/PQ1 ...

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32

Value Value2 Decay PQ1/T0 PQ2/PQ1 PQ3/PQ2 ... PQ32/PQ31

aka

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32

Value Value2 Decay 98/100 97/98 95/97 ... 2/somethign

and the final output would look like

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 .... PQ32

Value Value2 Balance 100 98 97 95 ... 2

Value Value2 Decay . 0.98 0.989795918 0.979381443 something

could you please help me figure out how to iterate through this process.

Best,

Chris w.

cwhit · Posted 10-06-2019 08:46 AM

Hi,

This was almost the solution - I just realized something strange is happening with the units (note, I created a new column PQ0 = T0 so the code would work)

the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?

FieldA	FieldB	FieldC	T0	PQ1	PQ2	PQ3	PQ4	PQ5	PQ6	PQ0
a	b	balance	100	98	95	93	92	85	70	100
a	b	decay	100	0.98	96.93878	0.959368	95.89642	0.886373	78.97352	100

FreelanceReinh · Posted 10-06-2019 06:56 PM

Hi @cwhit,

@cwhit wrote:

I created a new column PQ0 = T0 so the code would work

I think @Reeza meant T0--PQ32 (double dash), i.e. the list of variables from T0 through PQ32 in the program data vector (PDV). To be on the safe side (not assuming a certain variable order in the PDV), I would prefer T0 PQ1-PQ32. Either way, a new column is not necessary.

@cwhit wrote:

I just realized something strange is happening with the units ...

the output seems to be correct but the units are changing every other column. for example. Decay at PQ2 is 96.938 which is the right number but is somehow is not in the same units as Decay at PQ1 or PQ3 or PQ5.. very strange Any ideas as to why this is the case?

FieldA FieldB FieldC T0 PQ1 PQ2 PQ3 PQ4 PQ5 PQ6 PQ0

a b balance 100 98 95 93 92 85 70 100

a b decay 100 0.98 96.93878 0.959368 95.89642 0.886373 78.97352 100

It's always good to investigate strange results. In this case, I'm afraid, it turns out that most of the results are incorrect (and not only some off by a factor of 100 as it may seem at first glance), e.g. PQ3=(93/95)*(98/100)=0.959368... and not 93/95=0.978947... what it should be.

Reason: The formula _rvars(i) = _rvars(i) / _rvars(i-1) is recursive. Currently, it overwrites the value which is needed as the denominator in the next iteration of the DO loop. To avoid this, you can perform the iteration in descending order (PQ32=..., PQ31=..., ...) or save the value (which would be lost otherwise) in a temporary variable, say h, and use the LAG function to return the previous value of h in each iteration. In the latter case I'd let the DO loop start at i=1 in order to get a nonmissing value from the LAG function for i=2 already. Variable T0 will then be set to missing because of the missing value from the LAG function in the first iteration, but this is exactly what you specified in your initial post. By using the DIVIDE function you can avoid the note "Missing values were generated ..." in the log.

Option 1: iteration in descending order:

data want(drop=i);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;

do i=dim(_rvars) to 2 by -1;
  _rvars(i) = _rvars(i) / _rvars(i-1);
end;
_rvars(1)=.;

output;
run;

Option 2: LAG function with temporary variable:

data want(drop=i h);
set have;
output;
FieldC = 'Decay';
array _rvars(*) T0 PQ1-PQ32;

do i=1 to dim(_rvars);
  h=_rvars(i); /* save existing value for the next iteration */
  _rvars(i) = divide(_rvars(i),lag(h));
end;

output;
run;

Reeza · Posted 10-06-2019 07:02 PM

Thanks @FreelanceReinh, good catch on my mistakes there 🙂

@cwhit please change the correct answer to @FreelanceReinh

cwhit · Posted 10-06-2019 08:38 PM

Thank you both very much. This was very helpful and has saved me a lot of unnecessary data steps.

divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

Re: divide one column by another over time

SAS Innovate 2025: Register Now