- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am new to programming and I don't really understand Arrays. I am trying to calculate the day on day variance of the below dataset
Country | Segment | Mar_30 | Apr_2 | Apr_3 | Apr_4 |
Singapore | Corp | 126,196 | 126,196 | 126,196 | 126,196 |
malaysia | Ret | 70,521,201 | 70,562,347 | 70,455,005 | 70,556,299 |
so the expected output is to have the same number of records but have day on day variance i.e for malaysia the Apr2 balance would be Apr2 balance minus Mar 30 balance , something like this,
Country | Segment | Apr_2 | Apr_3 | Apr_4 |
Singapore | Corp | - | - | - |
malaysia | Ret | 41,145 | (107,342) | 101,294 |
This is the code I am tusing but not getting expected output.I am using base SAS 9.3
data check_op;
set check_op_e;
array nums[*] _numeric_;
array aday[*] _numeric_;
do i=2 to dim(nums);
aday(i)=nums(i)-nums(i-1);
end;
run;
I hope someone can help me on this problem. Thanks in advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.
Once again, an example why wide datasets are a PITA to work with:
data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;
proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;
data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;
proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;
See how simple the subtraction step is?
Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The problem is you have fallen into the common "Excel way of thinking" trap. Having columns with dates, or any actual "data" is really not a good idea in any sense. Your array is failing likely because the system does not recognise that Mar_30 occurs before Apr_02 - these date concepts are not logical and the computer only understands logic.
Now let me point one thing out before you say you "can't change anything" in that what you do internally to program this, does need to be what is output!
So, I would normalise the data, so you would have something like:
COUNTRY SEGMENT PERIOD RESULT
Singapore Corp 30MAR2018 126,196
Singapore Corp 02APR2018 126,196
Singapore Corp 03APR2018 126,196
...
With this data structure it is very simple to get differences, there is even a function created for it:
data want; set have; by country; if first.segment then deff_res=.; else diff_res=dif(result); run;
There are many ways to do this with the structure I give, and if you need the output transposed, do this at the end when you output the data with a proc transpose, best of both worlds.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.
Once again, an example why wide datasets are a PITA to work with:
data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;
proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;
data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;
proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;
See how simple the subtraction step is?
Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot , Kurt. This is exactly what I was looking for and now I see what I was doing wrong. Will try to avoid having wide data as much as possible.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Din4 Your array required a minor adjustment with a self imposed interleave, should you wanna have fun playing with it
data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;
data check_op;
set check_op_e check_op_e;
by country Segment;
array nums[*] _numeric_;
do i=2 to dim(nums);
_k=lag(nums(i-1));
if last.country then nums(i)=nums(i)-_k;
end;
if last.country;
drop _k i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Din4 Thank you for your feedback. It's difficult to tell whether or not performance will take a hit until comprehensive testing is done. Also, there are couple of things to consider.
1. Would the code be developed and maintained by you?
2. Or would the code be developed by you, moved to production and outsourced to a third party vendor for maintenance and support.
If it's the former, I see no harm in testing extensively with larger samples go with a one pass solution i.e array solution of yours(only corrected by me). You deserve the credit for your initial approach indeed.
If it's the latter, I would be wary of the skills of the third party vendor unless I know them well. In that case, I would rather stick to Kurt's solution that is easy to flex, reading ease plus a CR(change request can be done fairly easily) and the last thing you want is going back and forth ding dong. I have had this experience before. Trust me!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content