11-09-2015 06:00 PM
data rfm162.fy16_array2;
retain
booked_date1-booked_date32
dmd1-dmd32
mbtwn1-mbtwn32
;
set rfm162.fy16_array;
by curr_wg_buid booked_date;
format booked_date1-booked_date32 datetime20.;
array bd(32) booked_date1-booked_date32;
array money(32) dmd1-dmd32;
array mbtwn(32) mbtwn1-mbtwn32;
do i=1 to 32;
if i>count then continue; else
if i=count then do;
bd(i)=booked_date;
money(i)=dmd;
end;
if i>=2 then do;
m=i-1;
mbtwn(i)=((bd(i)-bd(m))/2592000);
end;
end;
dmd_total=sum(of dmd1-dmd32);
drop dmd1-dmd32;
if last.curr_wg_buid;
run;
Hey all,
I'm still relatively new to SAS. I have been generally successful at troubleshooting my code, but this has me stumped. I've got "long" dataset sorted by business unit identifier (curr_wg_buid), and booked date (date of purchase). What I want is a "wide" dataset, that takes all of the booked dates and puts them on one row for each business unit identifier. I also want to calculate the number of months between the dates (mbtwn) and the total value of the revenue generated in the orders (dmd_total).
This code is getting awful close, but is simply returning the wrong info. I've got a count variable (count) on the original data file (fy16_array) that counts by 1 the number of times a BUID appears along with booked_date and dmd (revenue) variables for that transaction. When looking at the output file, there are many BUIDs with count values of 1, with several reported purchase dates, so I know that something is wrong.
Any hints? Thank you,
11-10-2015 09:27 AM - edited 11-10-2015 09:40 AM
So the basic problem with your posted code is that you are retaining the values in the array, but not resetting them when starting a new group. Placing the DO loop around the SET statement eliminates the need for the RETAIN statement and will eliminate this problem. You data already has a COUNT variable that could be used as the array index. Since you don't want to keep the variables used in the MONEY() array then perhaps you can make it a temporary array? However TEMPORARY arrays are retained so you will need to reset those variables.
data rfm162.fy16_array2;
do until (last.curr_wg_buid) ;
set rfm162.fy16_array;
by curr_wg_buid booked_date;
array bd(32) booked_date1-booked_date32;
array money(32) _temporary_ ;
array mbtwn(32) mbtwn1-mbtwn32;
format booked_date1-booked_date32 datetime20.;
bd(count)=booked_date;
money(count)=dmd;
if count > 1 then
mbtwn(count)=((bd(count)-bd(count-1))/2592000)
;
end;
dmd_total=sum(of money(*));
call missing(of money(*));
run;
11-09-2015 06:06 PM
11-09-2015 06:08 PM
With something like this it would really help to have some example data. It need not have as many rows to transpose but should show the same behavior. Also some result should be provided.
Matthew wrote:
This code is getting awful close, but is simply returning the wrong info. I've got a count variable (count) on the original data file (fy16_array) that counts by 1 the number of times a BUID appears along with booked_date and dmd (revenue) variables for that transaction. When looking at the output file, there are many BUIDs with count values of 1, with several reported purchase dates, so I know that something is wrong.
Any hints? Thank you,
I can think of a couple of different ways to interpret "by 1 the number of times a BUID appears along with booked_date and dmd (revenue) variables for that transaction", so you might want to post how that count was done. How you used either Booked_date or BUID in that counting are likley places to look.
11-09-2015 06:12 PM
11-09-2015 07:03 PM
Guessing that this is what you are hoping to achieve:
data want;
i=0;
dmd_total=0;
array bd (32) booked_date1 - booked_date32;
array mbtwn (32);
do until (last.curr_wg_buid);
set rfm162.fy16_array;
by curr_wg_buid;
i + 1;
dmd_total + dmd;
bd(i) = booked_date;
if i >= 2 then mbtwn(i) = (bd(i)-bd(i-1))/2592000;
end;
run;
This will run into trouble (fixable trouble) if you ever have more than 32 observations for the same CURR_WG_BUID.
11-09-2015 07:53 PM
Wow, quick responses!
My rationale behind not terminating at "while i<=count" is because I'd done a proc freq on count values before this step just to see what the max number of transactions by a single customer was. I will try some of these solutions when I'm back in the office in the morning.
11-10-2015 06:46 AM
You need to give us some sample data. It's not that hard to solve your problem but the solution will depend on how your source data looks like - and it's not conclusive based on the code you've posted.
11-10-2015 09:00 AM
I've attached my source dataset here. As you can see, every time a BUID makes a purchase, I have a row for that transaction, sorted by BUID and booked_date.
11-10-2015 09:27 AM - edited 11-10-2015 09:40 AM
So the basic problem with your posted code is that you are retaining the values in the array, but not resetting them when starting a new group. Placing the DO loop around the SET statement eliminates the need for the RETAIN statement and will eliminate this problem. You data already has a COUNT variable that could be used as the array index. Since you don't want to keep the variables used in the MONEY() array then perhaps you can make it a temporary array? However TEMPORARY arrays are retained so you will need to reset those variables.
data rfm162.fy16_array2;
do until (last.curr_wg_buid) ;
set rfm162.fy16_array;
by curr_wg_buid booked_date;
array bd(32) booked_date1-booked_date32;
array money(32) _temporary_ ;
array mbtwn(32) mbtwn1-mbtwn32;
format booked_date1-booked_date32 datetime20.;
bd(count)=booked_date;
money(count)=dmd;
if count > 1 then
mbtwn(count)=((bd(count)-bd(count-1))/2592000)
;
end;
dmd_total=sum(of money(*));
call missing(of money(*));
run;
11-10-2015 09:53 AM
Tom,
With your advice I've been able to make my code work correctly by clearing the arrays, but running the code you provided does not yield the same result. At this point I'm just curious as to why it doesn't, because it seems that it should. My code, essentially the same as before, but with commands to clear the arrays if i>count:
data rfm162.fy16_array2;
retain
booked_date1-booked_date32
dmd1-dmd32
mbtwn1-mbtwn32
;
set rfm162.fy16_array;
by curr_wg_buid booked_date;
format booked_date1-booked_date32 datetime20.;
array bd(32) booked_date1-booked_date32;
array money(32) dmd1-dmd32;
array mbtwn(32) mbtwn1-mbtwn32;
do i=1 to 32;
if i>count then do;
bd(i)=.;
money(i)=.;
mbtwn(i)=.;
continue;
end;
if i=count then do;
bd(i)=booked_date;
money(i)=dmd;
end;
if i>=2 then do;
m=i-1;
mbtwn(i)=((bd(i)-bd(m))/2592000);
end;
end;
dmd_total=sum(of dmd1-dmd32);
drop dmd1-dmd32;
if last.curr_wg_buid;
run;
--------------------
Comparing this to your code, it seems to me that the array variables must be retained from one iteration to the next. Your code returns for me only values for the first booked_date of each BUID, even though the count values for these rows suggest that it has dropped all but the last.curr_wg_buid, as it should. This only half makes sense to me. If it was going to only return one of the booked_dates, I would expect it to be that of the last booked_date for that BUID on record, which I would also expect to be placed in the booked_date(n) column associated with that count value. It is currently only populating booked_date1.
11-10-2015 10:34 AM
11-10-2015 11:23 AM - edited 11-10-2015 11:45 AM
Why do you have the DO loop at all if you are using FIRST./LAST. processing? If you just want to find the total of DMD you do not need an array.
data rfm162.fy16_array2;
retain
booked_date1-booked_date32
mbtwn1-mbtwn32
;
set rfm162.fy16_array;
by curr_wg_buid booked_date;
format booked_date1-booked_date32 datetime20.;
array bd booked_date1-booked_date32;
array mbtwn mbtwn1-mbtwn32;
if first.curr_wg_buid then call missing(of bd(*) mbtwn(*));
if first.curr_wg_buid then i=0;
i+1;
if i ne count then put 'NOTE: I and COUNT are not equal.' (_n_ i count) (=);
bd(i)=booked_date;
if i>=2 then do;
mbtwn(i)=((bd(i)-bd(i-1))/2592000);
end;
retain dmd_total ;
if first.curr_wg_buid then dmd_total=0;
dmd_total=sum(dmd_total,dmd);
if last.curr_wg_buid then output ;
run;
Need further help from the community? Please ask a new question.