Here's a sample table:
acct Transcode Amount
1234 400 5000
1234 110 2000
1234 410 400
1234 1160 5
desired output:
acct Calc
1234 5000
The goal is to create a new calculated field based not only on information within the row, but also on info in other rows within a class variable (here, Acct). There are many accounts, "Acct" being a unique ID for one. This table houses all transactions taking place within an account. I need to sum up the amounts based on information in other rows of that same acct. For example, subtract amount of TransCode 110 from amount of TransCode 400 only when amount of TransCode 410 >= amount of TransCode 400. I.e. Take 5000-2000 if (400>=5000) else take 5000.
There are millions of unique accounts and about ten other similar conditions. Another example would be to add in the amount for 1160 only if no TransCode 400 is found. All of these additions and subtractions will come together in one field. That one number is then linked to the unique acct number and joined to another table where the acct number is not duplicated. So, this one acct 1234 would be reduced to one row with one value for the new calculated field.
Any help is appreciated. I don't have beginning code because I can't think of a reasonable way to begin. The concept is very doable on paper, but I cannot think of a way to apply this on a grand scale. Well, I can, but it involves many separate table pulls/merges and greatly increasing the number of fields.
Using SAS Enterprise Guide 7.13
Below code transposes your data to a single row per account. It stores the transaction amounts in an array of variables with a naming pattern of amt_<transcode>. This will allow you to directly use transaction code specific variable names in your conditions and formulas.
Having everything in a single row will also give you direct access to all the relevant data for your calculation at once.
I haven't fully understood how you're defining your conditions and if this is some sort of a decision tree or just a set of mutually exclusive rules. But having all the data on a single line might already be a step forward for you.
data have;
infile datalines truncover;
input acct $ Transcode $ Amount;
datalines;
1234 400 5000
1234 110 2000
1234 410 400
1234 1160 5
5678 400 200
5678 110 100
5678 110 99
;
run;
/* 1 row per account and transaction */
proc sql;
create view v_sumByTcode as
select
acct,
transcode,
sum(amount) as amount
from have
group by acct, transcode
;
quit;
/* transpose to 1 row per account; variable with amount has transaction code in the name */
proc transpose data=v_sumByTcode out=inter(drop=_:) prefix=amt_;
by acct;
id Transcode;
var Amount;
run;
data want;
set inter;
/* only apply rule if account got transactions for all codes involved in rule */
if nmiss(amt_100, amt_400, amt_410) = 0 then
do;
/***
For example, subtract amount of TransCode 110 from amount of TransCode 400
only when amount of TransCode 410 >= amount of TransCode 400.
I.e. Take 5000-2000 if (400>=5000) else take 5000.
**/
end;
run;
People are hesitant to open unknown attachements these days and your attachment won't open for me in any event.
Perhaps you could post some sample data and the desired result from that data?
I do understand the attachment concern. Edited.
Below code transposes your data to a single row per account. It stores the transaction amounts in an array of variables with a naming pattern of amt_<transcode>. This will allow you to directly use transaction code specific variable names in your conditions and formulas.
Having everything in a single row will also give you direct access to all the relevant data for your calculation at once.
I haven't fully understood how you're defining your conditions and if this is some sort of a decision tree or just a set of mutually exclusive rules. But having all the data on a single line might already be a step forward for you.
data have;
infile datalines truncover;
input acct $ Transcode $ Amount;
datalines;
1234 400 5000
1234 110 2000
1234 410 400
1234 1160 5
5678 400 200
5678 110 100
5678 110 99
;
run;
/* 1 row per account and transaction */
proc sql;
create view v_sumByTcode as
select
acct,
transcode,
sum(amount) as amount
from have
group by acct, transcode
;
quit;
/* transpose to 1 row per account; variable with amount has transaction code in the name */
proc transpose data=v_sumByTcode out=inter(drop=_:) prefix=amt_;
by acct;
id Transcode;
var Amount;
run;
data want;
set inter;
/* only apply rule if account got transactions for all codes involved in rule */
if nmiss(amt_100, amt_400, amt_410) = 0 then
do;
/***
For example, subtract amount of TransCode 110 from amount of TransCode 400
only when amount of TransCode 410 >= amount of TransCode 400.
I.e. Take 5000-2000 if (400>=5000) else take 5000.
**/
end;
run;
Thanks! This is where my thoughts were going - to get to one row for each account by creating all of those new columns. I was afraid of that necessity because of how many accounts I'll be working with...and many more transcodes/corresponding amounts than I mentioned here conceptually. Processing time is a concern because this will eventually become a daily load.
Also yes, this will be based on a bunch of mutually exclusive rules. Anyway, thanks; this definitely helps get my mind wrapped around what I'll have to do.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.