BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Din4
Fluorite | Level 6

I am new to programming and I don't really understand Arrays. I am trying to calculate the day on day variance of the below dataset

 

CountrySegmentMar_30Apr_2Apr_3Apr_4
SingaporeCorp                 126,196                 126,196                 126,196                 126,196
malaysiaRet             70,521,201             70,562,347             70,455,005             70,556,299

 

so the expected output is to have the same number of records but have day on day variance i.e for malaysia the Apr2 balance would be Apr2 balance minus Mar 30 balance , something like this,

 

CountrySegmentApr_2Apr_3Apr_4
SingaporeCorp                          -                            -                            -  
malaysiaRet                   41,145                (107,342)                 101,294

 

This is the code I am tusing but not getting expected output.I am using base SAS 9.3

 

data check_op;
set check_op_e;
array nums[*] _numeric_;
array aday[*] _numeric_;
do i=2 to dim(nums);
aday(i)=nums(i)-nums(i-1);
end;
run;

 

I hope someone can help me on this problem. Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.

Once again, an example why wide datasets are a PITA to work with:

data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;

data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;

proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;

See how simple the subtraction step is?

 

Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.

 

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

The problem is you have fallen into the common "Excel way of thinking" trap.  Having columns with dates, or any actual "data" is really not a good idea in any sense.  Your array is failing likely because the system does not recognise that Mar_30 occurs before Apr_02 - these date concepts are not logical and the computer only understands logic.  

Now let me point one thing out before you say you "can't change anything" in that what you do internally to program this, does need to be what is output!

So, I would normalise the data, so you would have something like:
COUNTRY  SEGMENT   PERIOD        RESULT

Singapore   Corp             30MAR2018  126,196

Singapore   Corp             02APR2018  126,196

Singapore   Corp             03APR2018  126,196

...

 

With this data structure it is very simple to get differences, there is even a function created for it:

data want;
  set have;
  by country;
  if first.segment then deff_res=.;
  else diff_res=dif(result);
run;

http://support.sas.com/documentation/cdl/en/etsug/66100/HTML/default/viewer.htm#etsug_tsdata_sect057...

 

There are many ways to do this with the structure I give, and if you need the output transposed, do this at the end when you output the data with a proc transpose, best of both worlds.

Din4
Fluorite | Level 6
Thanks for your input. Yes, I was trapped into the excel way of thinking but now I know I need to ignore my instincts to follow excel methods.
Kurt_Bremser
Super User

Since you are overwriting member 2 in the first iteration, your calculation in the second iteration will be done with the new value, causing unexpected results.

Once again, an example why wide datasets are a PITA to work with:

data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

proc transpose data=check_op_e out=check;
by country segment;
var _numeric_;
run;

data want;
set check;
by country notsorted segment notsorted;
diff = col1 - lag(col1);
if first.segment then diff = .;
run;

proc transpose data=want out=want_wide (drop=_name_);
by country segment;
var diff;
id _name_;
run;

See how simple the subtraction step is?

 

Also note how I presented your example data in a form that makes it easy to use (datastep with datalines). Also see my footnotes.

 

Din4
Fluorite | Level 6

Thanks a lot , Kurt. This is exactly what I was looking for and now I see what I was doing wrong. Will try to avoid having wide data as much as possible. 

novinosrin
Tourmaline | Level 20

@Din4 Your array required a minor adjustment with a self imposed interleave, should you wanna have fun playing with it

 

data check_op_e;
input Country :$15. Segment $ (Mar_30 Apr_2 Apr_3 Apr_4) (:comma12.);
format Mar_30 Apr_2 Apr_3 Apr_4 comma12.;
cards;
Singapore Corp 126,196 126,196 126,196 126,196
malaysia Ret 70,521,201 70,562,347 70,455,005 70,556,299
;
run;

data check_op;
set check_op_e check_op_e;
by country Segment;
array nums[*] _numeric_;
do i=2 to dim(nums);
_k=lag(nums(i-1));
if last.country then nums(i)=nums(i)-_k;
end;
if last.country;
drop _k i;
run;

 

Din4
Fluorite | Level 6
Thanks @novinosrin for the solution. Now I see what was incorrect in my initial code. The dataset I am working on has about a million records, do you think it is a good idea to impose interleave on this dataset? I tried your solution on a subset of the data and it works well, would adding the entire dataset make the whole process really slow? 
novinosrin
Tourmaline | Level 20

@Din4 Thank you for your feedback. It's difficult to tell whether or not performance will take a hit until comprehensive testing is done. Also, there are couple of things to consider.

 

1. Would the code be developed and maintained by you?

2. Or would the code be developed by you, moved to production and outsourced to a third party vendor for maintenance and support.

 

If it's the former, I see no harm in testing extensively with larger samples go with a one pass solution i.e array solution of yours(only corrected by me). You deserve the credit for your initial approach indeed.

 

If it's the latter, I would be wary of the skills of the third party vendor unless I know them well. In that case, I would rather stick to Kurt's solution that is easy to flex, reading ease plus a CR(change request can be done fairly easily) and the last thing you want is going back and forth ding dong. I have had this experience before. Trust me!

Din4
Fluorite | Level 6
@novinosrin I will be the one maintaining the code, so as you suggested I will do some extensive testing before I put into production. Thanks again for your help. Really appreciate it.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 6917 views
  • 2 likes
  • 4 in conversation