Hi,
Would anyone know how to loop through dates in an array with formats such as 201501 for January 2015.
I am trying to loop through 6 years and get a total balance for all accounts for each year, please see the following code.
%let from=200701; %let to=201508;
data newdata (keep= _:);
set olddata end=Lastobs;
array month{*} balance_&from.-balance_&to.;
array out{*} _&from.-_&to.;
do i=1 to 104;
out{i}+sum(of month{i});
end;
if Lastobs;
run;
Hi @PriyaL,
The statement var &names; in my solution restricts the summations to variables with the prefix 'account_balance' and a year/month suffix within the specified date range. If there was no need to have the restriction to the date range (because all 'account_balance' variables in the input dataset are within that range), you could write var account_balance:; instead.
Simple answer here is don't (i.e. don't put data in column headers). The YYMM is data, you don;t want to be using that as variable names. You will just set yourself up for a lot of coding to handle this. Show an example (in the form of a datastep) some test data, and what you want out. I would personally suggest you use a normalised data structure, i.e. long rather than wide, and if you need to transpose later on, e.g.
... MONTH BALANCE
... 2015-01 2100
... 2015-02 2300
... 2015-03 4300
...
Using the above you can then simple use aggregates on the data to sum things up. Obviously without seeing the data its hard to say.
Here is an example of working with normalised data:
data have; informat timepoint yymmn6.; format timepoint yymmn6.; input timepoint balance; datalines; 201501 2300 201504 4600 201505 5100 201506 100 201601 9300 201602 800 ; run; proc sql; create table WANT as select distinct year(TIMEPOINT) as YEAR, sum(BALANCE) as SUM_BALANCE from HAVE group by year(TIMEPOINT); quit;
The colomn headers simply depict the data that is in the month, so for coloumn "balance_201305" that will be a coloumn of all customer balances in May 2013.
So I guess you are suggesting I change the layout of the inital data and sum through the row and create a Sum Total field at the end.
Can this be achieved with the Proc Transpose function in SAS?
Thanks
The key here is the step before this. I.e. you don't want to end up with a dataste with data in column headings. What does your data look like before summing up, post a test example as a datastep with some made up bits if needed.
The reason being is that Balance just references a column of balance data, the 201305 is data itself that you need to use. If you look at transposed dataset, you will see they generally have the form of <prefix><incrementor>, so COL1, where in this example COL is the prefix and 1 is in the incrementor and could be 2,3 etc. Then when dealing with columns you can select the whole range by using COL:, or COL1-COLx. You cannot do this directly with data in thee top as it may not be sequential or logical.
Here is an idea of the data
Account_Number | account_balance201404 | account_balance201405 | account_balance201406 | account_balance201407 | account_balance201408 | account_balance201409 |
134897578 | 70077.71 | 69721.07 | 69363.77 | 69005.82 | 68647.21 | 68287.94 |
198994889 | 200682.6 | 200682.5 | 200682.4 | 200682.3 | 200682.19 | 200682.09 |
388499933 | 126449.9 | 125801.3 | 125151.6 | 124500.6 | 123848.46 | 123195.11 |
384902999 | 106735.5 | 106735.5 | 106735.5 | 106735.5 | 106735.49 | 106735.49 |
238499399 | 35464.27 | 35465.45 | 35466.64 | 35467.83 | 35469.02 | 35470.21 |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
There are several rows, and i am just looking to make 1 table with 2 rows so the first row is the Overall sums per month and the second row is the final sum.
So for example:
Jan-07 | Feb-07 | Mar-07 | Apr-07 | May-07 | Jun-07 | Jul-07 |
389488577 | 389497072 | 3.9E+08 | 3.9E+08 | 3.9E+08 | 389531052 | 389539547 |
You're already summing all accounts per month so I'm not sure how your 2nd row should look like (overall sum). Should this be the sum per year (=additional columns not additional row).
If you transpose your data from a wide to a long format then there are procedures like Proc Tabulate and Proc Report which provide a wide variety of options to give you the report you want.
data olddata;
input Account_Number:$16. account_balance200612 account_balance200701 account_balance201406 account_balance201407
account_balance201508 account_balance201509;
cards;
134897578 70077.71 69721.07 69363.77 69005.82 68647.21 68287.94
198994889 200682.6 200682.5 200682.4 200682.3 200682.19 200682.09
388499933 126449.9 125801.3 125151.6 124500.6 123848.46 123195.11
384902999 106735.5 106735.5 106735.5 106735.5 106735.49 106735.49
238499399 35464.27 35465.45 35466.64 35467.83 35469.02 35470.21
;
run;
data long(drop= _: account_balance:);
set olddata;
format balance_dt date9. acc_amt best16.;
array acc_bal {*} account_balance:;
do _i=1 to dim(acc_bal);
acc_amt=acc_bal[_i];
balance_dt=input(compress(vname(acc_bal[_i]),,'kd'),yymmn6.);
format balance_dt date9.;
output;
end;
run;
Hi, @PriyaL,
I fully agree with @RW9 and @Reeza that you should really transpose your "OLDDATA" into a vertical structure in the future.
That said, if you want to summarize the existing OLDDATA, why don't you just apply PROC SUMMARY? If the only problem is the selection of columns within a date range, you could try this:
/* Create test data (please note that the first and last account balance are outside your date range) */
data olddata;
input Account_Number account_balance200612 account_balance200701 account_balance201406 account_balance201407 account_balance201508 account_balance201509;
cards;
134897578 70077.71 69721.07 69363.77 69005.82 68647.21 68287.94
198994889 200682.6 200682.5 200682.4 200682.3 200682.19 200682.09
388499933 126449.9 125801.3 125151.6 124500.6 123848.46 123195.11
384902999 106735.5 106735.5 106735.5 106735.5 106735.49 106735.49
238499399 35464.27 35465.45 35466.64 35467.83 35469.02 35470.21
;
/* Specify the date range (in YYYYMM format) */
%let from=200701; %let to=201508;
/* Create the list of variable names "within" the date range (macro var. NAMES) */
proc sql noprint;
select name
into :names separated by ' '
from dictionary.columns
where libname='WORK' & memname='OLDDATA' & upcase(name) eqt 'ACCOUNT_BALANCE'
& input("&from",yymmn6.) <= input(substr(name,16),yymmn6.) <= input("&to",yymmn6.);
quit;
/* Create a dataset with monthly totals */
proc summary data=olddata;
var &names;
output out=totals(drop=_:) sum=;
quit;
proc print data=totals;
run;
Your original data step code suggests that you would like to have simplified column names in the TOTALS dataset. But I think you will be better off if you don't rename but just label your variables accordingly. This can be achieved as follows:
/* Create label assignments (macro var. LABELS) */
proc sql noprint;
select catx('=',name,quote(put(input(substr(name,16),yymmn6.),monyy5.)))
into :labels separated by ' '
from dictionary.columns
where libname='WORK' & memname='TOTALS' & upcase(name) eqt 'ACCOUNT_BALANCE';
quit;
/* Apply label assignments to dataset TOTALS */
proc datasets lib=work nolist;
modify totals;
label &labels;
quit;
proc print data=totals label;
run;
It's not quite clear to me if this is all you need.
(Edit: Replaced tabs by spaces in the first data step to avoid issues depending on Enhanced Editor options.)
I think you are right as a proc summary is the easiest way as my data is already in the columns by month. So i would just need to sum through the columns I want to make a final dataset with 2 rows which contains the sum total of all account balances for all acounts in that month.
Is there a way to filter on proc summary just to keep the variables with the prefix 'account_balance' as my datasets contains several other variables because it is a master dataset.
Thanks.
Hi @PriyaL,
The statement var &names; in my solution restricts the summations to variables with the prefix 'account_balance' and a year/month suffix within the specified date range. If there was no need to have the restriction to the date range (because all 'account_balance' variables in the input dataset are within that range), you could write var account_balance:; instead.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.