Fluorite | Level 6

## Subtracting columns to get difference

I am new to programming and I don't really understand Arrays. I am trying to calculate the day on day variance of the below dataset

 Country Segment Mar_30 Apr_2 Apr_3 Apr_4 Singapore Corp 126,196 126,196 126,196 126,196 malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299

so the expected output is to have the same number of records but have day on day variance i.e for malaysia the Apr2 balance would be Apr2 balance minus Mar 30 balance , something like this,

 Country Segment Apr_2 Apr_3 Apr_4 Singapore Corp - - - malaysia Ret 41,145 (107,342) 101,294

This is the code I am tusing but not getting expected output.I am using base SAS 9.3

data check_op;
set check_op_e;
array nums[*] _numeric_;
do i=2 to dim(nums);
end;
run;

I hope someone can help me on this problem. Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Subtracting columns to get difference

Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.

Once again, an example why wide datasets are a PITA to work with:

``````data check_op_e;
input Country :\$15. Segment \$ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;

data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;

proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;``````

See how simple the subtraction step is?

Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.

8 REPLIES 8
Diamond | Level 26

## Re: Subtracting columns to get difference

The problem is you have fallen into the common "Excel way of thinking" trap.  Having columns with dates, or any actual "data" is really not a good idea in any sense.  Your array is failing likely because the system does not recognise that Mar_30 occurs before Apr_02 - these date concepts are not logical and the computer only understands logic.

Now let me point one thing out before you say you "can't change anything" in that what you do internally to program this, does need to be what is output!

So, I would normalise the data, so you would have something like:
COUNTRY  SEGMENT   PERIOD        RESULT

Singapore   Corp             30MAR2018  126,196

Singapore   Corp             02APR2018  126,196

Singapore   Corp             03APR2018  126,196

...

With this data structure it is very simple to get differences, there is even a function created for it:

```data want;
set have;
by country;
if first.segment then deff_res=.;
else diff_res=dif(result);
run;```

http://support.sas.com/documentation/cdl/en/etsug/66100/HTML/default/viewer.htm#etsug_tsdata_sect057...

There are many ways to do this with the structure I give, and if you need the output transposed, do this at the end when you output the data with a proc transpose, best of both worlds.

Fluorite | Level 6

## Re: Subtracting columns to get difference

Thanks for your input. Yes, I was trapped into the excel way of thinking but now I know I need to ignore my instincts to follow excel methods.
Super User

## Re: Subtracting columns to get difference

Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.

Once again, an example why wide datasets are a PITA to work with:

``````data check_op_e;
input Country :\$15. Segment \$ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;

data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;

proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;``````

See how simple the subtraction step is?

Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.

Fluorite | Level 6

## Re: Subtracting columns to get difference

Thanks a lot , Kurt. This is exactly what I was looking for and now I see what I was doing wrong. Will try to avoid having wide data as much as possible.

Tourmaline | Level 20

## Re: Subtracting columns to get difference

@Din4 Your array required a minor adjustment with a self imposed interleave, should you wanna have fun playing with it

``````data check_op_e;
input Country :\$15. Segment \$ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

data check_op;
set check_op_e check_op_e;
by country Segment;
array nums[*] _numeric_;
do i=2 to dim(nums);
_k=lag(nums(i-1));
if last.country then nums(i)=nums(i)-_k;
end;
if last.country;
drop _k i;
run;``````

Fluorite | Level 6

## Re: Subtracting columns to get difference

Thanks @novinosrin for the solution. Now I see what was incorrect in my initial code. The dataset I am working on has about a million records, do you think it is a good idea to impose interleave on this dataset? I tried your solution on a subset of the data and it works well, would adding the entire dataset make the whole process really slow?
Tourmaline | Level 20

## Re: Subtracting columns to get difference

@Din4 Thank you for your feedback. It's difficult to tell whether or not performance will take a hit until comprehensive testing is done. Also, there are couple of things to consider.

1. Would the code be developed and maintained by you?

2. Or would the code be developed by you, moved to production and outsourced to a third party vendor for maintenance and support.

If it's the former, I see no harm in testing extensively with larger samples go with a one pass solution i.e array solution of yours(only corrected by me). You deserve the credit for your initial approach indeed.

If it's the latter, I would be wary of the skills of the third party vendor unless I know them well. In that case, I would rather stick to Kurt's solution that is easy to flex, reading ease plus a CR(change request can be done fairly easily) and the last thing you want is going back and forth ding dong. I have had this experience before. Trust me!

Fluorite | Level 6

## Re: Subtracting columns to get difference

@novinosrin I will be the one maintaining the code, so as you suggested I will do some extensive testing before I put into production. Thanks again for your help. Really appreciate it.
Discussion stats
• 8 replies
• 7485 views
• 2 likes
• 4 in conversation