Hi,
I have a dataset, work.PIC, which contains transactions ($ amounts) per Address and Date. This dataset is sorted by Address and Date.
Sample Dataset:
Address Date Amount
Property 1 31/01/2014 -770
Property 1 31/01/2014 -550
Property 1 31/01/2014 -20000
Property 1 31/01/2014 -52329.6
Property 1 31/07/2014 -660
Property 1 31/10/2014 -1738
Property 1 30/06/2015 -845.32
Property 1 30/06/2015 845.32
Property 1 30/06/2015 -845.32
Property 1 31/12/2015 845.32
Property 1 31/12/2015 -845.32
Property 1 31/12/2015 845.32
Property 2 31/01/2014 -1320
Property 2 31/01/2014 -2895.99
Property 2 31/01/2014 -653.32
Property 2 31/01/2014 -40000
Property 2 31/01/2014 -121954.8
Property 2 30/09/2015 -3500
Property 3 31/01/2014 -1200
Property 3 31/01/2014 -28080
Property 3 31/01/2014 -130483
Property 3 30/11/2015 -885
Property 3 31/12/2015 885
For each address, I would like to create a running total per Address, with output at each Date. I have accomplished this through the code below:
Current code:
data work.PIC_agg (keep = Address Date Value); set work.PIC; by Address Date; if First.Address then Value = 0; value + Amount; if Last.Date then output; run;
Problem:
This issue with the above is that it then only contains information for each property on a date where transactions occurred e.g. Property 1 will only have entries for Jan14, Jul14, Oct14, Jun15, and Dec15.
I would like to have a value for each date i.e. from Jan14 through to Dec15. If there is no transaction in a given month, then the value equals the previous month's total.
Proposed methodology:
Store each unique element of the Date variable in an array (DateArray)
Create a new variable for each element in DateArray
For each element in DataArray, sum over all transactions for a particular Address, where the Date<= DateArray
Store the resultant number in the correct variable for each Address
Any ideas on where the above methodology is sound, and, if so, how to implement. If not, could someone perhaps suggest a better method?
Please let me know if the above is unclear.
Many thanks!
Hi,
This newly modified code, has a running total (value), as well as, original monthy total
data have;
length address $10 date 8 amount 8;
format date ddmmyy10.;
INPUT @1 address $10. @12 date :ddmmyy10. amount :best.;
datalines;
Property 1 31/01/2014 -770
Property 1 31/01/2014 -550
Property 1 31/01/2014 -20000
Property 1 31/01/2014 -52329.6
Property 1 31/07/2014 -660
Property 1 31/10/2014 -1738
Property 1 30/06/2015 -845.32
Property 1 30/06/2015 845.32
Property 1 30/06/2015 -845.32
Property 1 31/12/2015 845.32
Property 1 31/12/2015 -845.32
Property 1 31/12/2015 845.32
Property 2 31/01/2014 -1320
Property 2 31/01/2014 -2895.99
Property 2 31/01/2014 -653.32
Property 2 31/01/2014 -40000
Property 2 31/01/2014 -121954.8
Property 2 30/09/2015 -3500
Property 3 31/01/2014 -1200
Property 3 31/01/2014 -28080
Property 3 31/01/2014 -130483
Property 3 30/11/2015 -885
Property 3 31/12/2015 885
;
run;
/* Create aggregate by address,date */
proc summary data=have nway;
class address date;
var amount;
output out=have_sum(drop=_:) sum=;
run;
/* For every address, ensure there are no monthly gaps */
data want(KEEP=address date amount value);
length curr_dt curr_amt curr_val prev_dt prev_amt value prev_val 8;
retain curr_dt curr_amt curr_val prev_dt prev_amt value prev_val;
format curr_dt prev_dt ddmmyy10.;
do until (last.address);
SET have_sum;
by address date;
if (first.address) then
do;
value = amount;
output; /* ensure it doesn't get left out */
end;
else if (first.date) then
value + amount;
prev_dt= lag(date);
prev_val= lag(value);
prev_amt= lag(amount);
put prev_dt= prev_val= prev_amt=;
/* Find the monthly gap */
diff= INTCK('month',prev_dt,date);
*put diff=;
if (diff > 0) then
do; /* Fill-in missing months */
/* Preserve current record values */
curr_dt = date;
curr_val = value;
curr_amt = amount;
/* Output missing months */
do i=1 to (diff-1) by 1;
date=intnx('month',prev_dt,i,'E');
amount=prev_amt;
value=prev_val;
OUTPUT;
end;
/* Don't forget current record */
date=curr_dt;
amount=curr_amt;
value=curr_val;
OUTPUT;
end; /* Fill-in missing months */
end;
run;
Proc print data=want; run;
Hope if gives you what you are looking for,
Ahmed
You say you have a variable PERIOD, but you haven't shown us anything about that variable. You'll likely get more responses if you show the forum your example have and want datasets .. each in the form of datasteps and posted using the {i} icon.
Art, CEO, AnalystFinder.com
Hi, this uses an array but not in the way (I think) that you proposed. There probably is a more efficient INTNX-related metod, but this fills in the missing months ...
* aggregate amounts by address and date;
proc summary data=x nway;
class address date;
var amount;
output out=xx (drop=_:) sum=;
run;
* fill in the missing months;
data y;
array _a(0:1000);
do j=1 by 1 until (last.address);
set xx;
by address;
if first.address then first = date;
_a(date - first) = amount;
end;
do j=0 to (date - first);
value = sum(value,_a(j));
date = first+j;
if day(date+1) eq 1 then output;
end;
keep address date value;
run;
Portion of output ...
DATA SET: y
address date value
Property1 31/01/2014 -73649.60
Property1 28/02/2014 -73649.60
Property1 31/03/2014 -73649.60
Property1 30/04/2014 -73649.60
Property1 31/05/2014 -73649.60
Property1 30/06/2014 -73649.60
Property1 31/07/2014 -74309.60
Property1 31/08/2014 -74309.60
Property1 30/09/2014 -74309.60
Property1 31/10/2014 -76047.60
Property1 30/11/2014 -76047.60
Property1 31/12/2014 -76047.60
Property1 31/01/2015 -76047.60
Property1 28/02/2015 -76047.60
Property1 31/03/2015 -76047.60
Property1 30/04/2015 -76047.60
Property1 31/05/2015 -76047.60
Property1 30/06/2015 -76892.92
Property1 31/07/2015 -76892.92
Property1 31/08/2015 -76892.92
Property1 30/09/2015 -76892.92
Property1 31/10/2015 -76892.92
Property1 30/11/2015 -76892.92
Property1 31/12/2015 -76047.60
For maximum date in the data set for all PROPERTIES ...
data y;
retain max_date;
array _a(0:1000);
do j=1 by 1 until (last.address);
set xx;
by address;
if first.address then first = date;
_a(date - first) = amount;
max_date = max(max_date, date);
end;
do j=0 to max_date-first;
value = sum(value,_a(j));
date = first+j;
if day(date+1) eq 1 then output;
end;
keep address date value;
run;
Send me an email ... msz03albany.edu ... and I'll do the walk through for you.
Hi, still needs a couple fixes, only worked since the max date is in the first group of addresses.
Will fix later and send to you. My email is msz03@albany.edu
Hi,
I'm looking at this again today, and just noticed I'm getting some strange results.
If I use the dataset below:
Address Date Amount
1000 Sumac Drive 31/05/2014 250 1000 Sumac Drive 30/06/2014 184252 1000 Sumac Drive 31/07/2014 4100 1000 Sumac Drive 31/08/2014 2572 1000 Sumac Drive 28/02/2015 402.98 1000 Sumac Drive 31/08/2015 -933.87 1000 Sumac Drive 30/09/2015 6520.72 1000 Sumac Drive 31/12/2015 -2632.98
I'm getting the following output:
1000 Sumac Drive 31/05/2014 -2382.98 1000 Sumac Drive 30/06/2014 181869.02 1000 Sumac Drive 31/07/2014 185969.02 1000 Sumac Drive 31/08/2014 188541.02 1000 Sumac Drive 30/09/2014 188541.02 1000 Sumac Drive 31/10/2014 188541.02 1000 Sumac Drive 30/11/2014 188541.02 1000 Sumac Drive 31/12/2014 188541.02 1000 Sumac Drive 31/01/2015 188541.02 1000 Sumac Drive 28/02/2015 188944 1000 Sumac Drive 31/03/2015 188944 1000 Sumac Drive 30/04/2015 188944 1000 Sumac Drive 31/05/2015 188944 1000 Sumac Drive 30/06/2015 188944 1000 Sumac Drive 31/07/2015 188944 1000 Sumac Drive 31/08/2015 188010.13 1000 Sumac Drive 30/09/2015 194530.85 1000 Sumac Drive 31/10/2015 194530.85 1000 Sumac Drive 30/11/2015 194530.85 1000 Sumac Drive 31/12/2015 191897.87 1000 Sumac Drive 31/01/2016 191897.87 1000 Sumac Drive 29/02/2016 191897.87 1000 Sumac Drive 31/03/2016 191897.87 1000 Sumac Drive 30/04/2016 191897.87 1000 Sumac Drive 31/05/2016 191897.87 1000 Sumac Drive 30/06/2016 191897.87 1000 Sumac Drive 31/07/2016 191897.87 1000 Sumac Drive 31/08/2016 191897.87 1000 Sumac Drive 30/09/2016 191897.87 1000 Sumac Drive 31/10/2016 191897.87 1000 Sumac Drive 30/11/2016 191897.87 1000 Sumac Drive 31/12/2016 191897.87 1000 Sumac Drive 31/01/2017 191897.87 1000 Sumac Drive 28/02/2017 191897.87 1000 Sumac Drive 31/03/2017 191897.87 1000 Sumac Drive 30/04/2017 191897.87 1000 Sumac Drive 31/05/2017 191897.87 1000 Sumac Drive 30/06/2017 191897.87 1000 Sumac Drive 31/07/2017 191897.87
I can't work out why it is misbehaving?
Any insight would be amazing!
Thanks!
Hi Mike,
Would the following work for you?
data have; length address $10 date 8 amount 8; format date ddmmyy10.; INPUT @1 address $10. @12 date :ddmmyy10. amount :best.; datalines; Property 1 31/01/2014 -770 Property 1 31/01/2014 -550 Property 1 31/01/2014 -20000 Property 1 31/01/2014 -52329.6 Property 1 31/07/2014 -660 Property 1 31/10/2014 -1738 Property 1 30/06/2015 -845.32 Property 1 30/06/2015 845.32 Property 1 30/06/2015 -845.32 Property 1 31/12/2015 845.32 Property 1 31/12/2015 -845.32 Property 1 31/12/2015 845.32 Property 2 31/01/2014 -1320 Property 2 31/01/2014 -2895.99 Property 2 31/01/2014 -653.32 Property 2 31/01/2014 -40000 Property 2 31/01/2014 -121954.8 Property 2 30/09/2015 -3500 Property 3 31/01/2014 -1200 Property 3 31/01/2014 -28080 Property 3 31/01/2014 -130483 Property 3 30/11/2015 -885 Property 3 31/12/2015 885 ; run; /* Using the /SPARSE option to fill-in the missing dates for the properties */ proc freq data=have; table address*date / SPARSE out=all list noprint; run; /* Combine the original data with only the filler records (count=0) */ data final_v; SET have all(where=(count=0)); run; /* Produce the final aggregation table */ proc summary data=final_v nway; class address date; var amount; output out=want(drop=_:) sum=; run;
Hope this helps,
Ahmed
@MikeFranz wrote:
Hey, thanks for the response. It says "Proc freq not found". Is it part of Base SAS?
Yes it is. Check your code for other errors.
Hi,
My code is:
proc freq data=work.pic; table address*Date / SPARSE out=all list noprint; run;
which is copy paste from the solution, with only the dataset name changed.
I'm getting the following errors:
ERROR: The FREQ procedure cannot be run. ERROR: Procedure FREQ not found.
I have been not been able to run PROC FREQ in the past. I had just assumed it was because it wasn't part of Base SAS.
If it helps, running PROC SETINIT yields the following:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 55 56 PROC SETINIT; 57 58 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; Original site validation data Current version: 9.04.01M3P062415 System birthday: 19JAN2017. Operating System: WX64_SV . Product expiration dates: ---Base SAS Software 14JAN2018 ---SAS/GRAPH 14JAN2018 ---SAS Integration Technologies 14JAN2018 ---SAS/Secure 168-bit 14JAN2018 ---SAS/Secure Windows 14JAN2018 ---SAS/ACCESS Interface to PC Files 14JAN2018 ---SAS/ACCESS Interface to ODBC 14JAN2018 ---SAS Workspace Server for Local Access 14JAN2018 ---High Performance Suite 14JAN2018 ---SAS Search and Indexing Server 14JAN2018 ---SAS Web Crawler Server 14JAN2018 ---SAS LASR Analytic Server 14JAN2018 ---SAS Visual Analytics Hub 14JAN2018 ---SAS Visual Analytics Services 14JAN2018 ---SAS Visual Analytics Base 13JAN2018 ---Advanced Programming for LASR Analytic Server 14JAN2018 ---SAS Visual Analytics Server Components 14JAN2018 ---Visual Analytics Explorer 14JAN2018
Thanks!
Contact SAS tech support. They'll probably just walk you through re-installing SAS, but you definitey should have access to proc freq. I have to wonder what else you might be missing.
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.