Solved: Re: Calculated field based on row values within class variable

jess11111111 · Posted 09-15-2017 03:08 PM

Here's a sample table:

acct Transcode Amount

1234 400 5000

1234 110 2000

1234 410 400

1234 1160 5

desired output:

acct Calc

1234 5000

The goal is to create a new calculated field based not only on information within the row, but also on info in other rows within a class variable (here, Acct). There are many accounts, "Acct" being a unique ID for one. This table houses all transactions taking place within an account. I need to sum up the amounts based on information in other rows of that same acct. For example, subtract amount of TransCode 110 from amount of TransCode 400 only when amount of TransCode 410 >= amount of TransCode 400. I.e. Take 5000-2000 if (400>=5000) else take 5000.

There are millions of unique accounts and about ten other similar conditions. Another example would be to add in the amount for 1160 only if no TransCode 400 is found. All of these additions and subtractions will come together in one field. That one number is then linked to the unique acct number and joined to another table where the acct number is not duplicated. So, this one acct 1234 would be reduced to one row with one value for the new calculated field.

Any help is appreciated. I don't have beginning code because I can't think of a reasonable way to begin. The concept is very doable on paper, but I cannot think of a way to apply this on a grand scale. Well, I can, but it involves many separate table pulls/merges and greatly increasing the number of fields.

Using SAS Enterprise Guide 7.13

Patrick · Posted 09-15-2017 10:40 PM

@jess11111111

Below code transposes your data to a single row per account. It stores the transaction amounts in an array of variables with a naming pattern of amt_<transcode>. This will allow you to directly use transaction code specific variable names in your conditions and formulas.

Having everything in a single row will also give you direct access to all the relevant data for your calculation at once.

I haven't fully understood how you're defining your conditions and if this is some sort of a decision tree or just a set of mutually exclusive rules. But having all the data on a single line might already be a step forward for you.

data have;
 infile datalines truncover;
 input acct $ Transcode $ Amount;
 datalines;
1234 400 5000
1234 110 2000
1234 410 400
1234 1160 5
5678 400 200
5678 110 100
5678 110 99
;
run;

/* 1 row per account and transaction */
proc sql;
  create view v_sumByTcode as
    select 
      acct,
      transcode,
      sum(amount) as amount
    from have
    group by acct, transcode
  ;
quit;

/* transpose to 1 row per account; variable with amount has transaction code in the name */
proc transpose data=v_sumByTcode out=inter(drop=_:) prefix=amt_;
  by acct;
  id Transcode;
  var Amount;
run;


data want;
  set inter;

  /* only apply rule if account got transactions for all codes involved in rule */
  if nmiss(amt_100, amt_400, amt_410) = 0 then
    do;
      /***
      For example, subtract amount of TransCode 110 from amount of TransCode 400 
      only when amount of TransCode 410 >= amount of TransCode 400. 
      I.e. Take 5000-2000 if (400>=5000) else take 5000. 
      **/
    end;

run;

View solution in original post

HB · Posted 09-15-2017 04:11 PM

People are hesitant to open unknown attachements these days and your attachment won't open for me in any event.

Perhaps you could post some sample data and the desired result from that data?

jess11111111 · Posted 09-15-2017 05:14 PM

I do understand the attachment concern. Edited.

Patrick · Posted 09-15-2017 10:40 PM

@jess11111111

Below code transposes your data to a single row per account. It stores the transaction amounts in an array of variables with a naming pattern of amt_<transcode>. This will allow you to directly use transaction code specific variable names in your conditions and formulas.

Having everything in a single row will also give you direct access to all the relevant data for your calculation at once.

I haven't fully understood how you're defining your conditions and if this is some sort of a decision tree or just a set of mutually exclusive rules. But having all the data on a single line might already be a step forward for you.

data have;
 infile datalines truncover;
 input acct $ Transcode $ Amount;
 datalines;
1234 400 5000
1234 110 2000
1234 410 400
1234 1160 5
5678 400 200
5678 110 100
5678 110 99
;
run;

/* 1 row per account and transaction */
proc sql;
  create view v_sumByTcode as
    select 
      acct,
      transcode,
      sum(amount) as amount
    from have
    group by acct, transcode
  ;
quit;

/* transpose to 1 row per account; variable with amount has transaction code in the name */
proc transpose data=v_sumByTcode out=inter(drop=_:) prefix=amt_;
  by acct;
  id Transcode;
  var Amount;
run;


data want;
  set inter;

  /* only apply rule if account got transactions for all codes involved in rule */
  if nmiss(amt_100, amt_400, amt_410) = 0 then
    do;
      /***
      For example, subtract amount of TransCode 110 from amount of TransCode 400 
      only when amount of TransCode 410 >= amount of TransCode 400. 
      I.e. Take 5000-2000 if (400>=5000) else take 5000. 
      **/
    end;

run;

jess11111111 · Posted 09-17-2017 03:18 PM

Thanks! This is where my thoughts were going - to get to one row for each account by creating all of those new columns. I was afraid of that necessity because of how many accounts I'll be working with...and many more transcodes/corresponding amounts than I mentioned here conceptually. Processing time is a concern because this will eventually become a daily load.

Also yes, this will be based on a bunch of mutually exclusive rules. Anyway, thanks; this definitely helps get my mind wrapped around what I'll have to do.

Calculated field based on row values within class variable

Re: Calculated field based on row values within class variable

Re: Calculated field based on row values within class variable

Re: Calculated field based on row values within class variable

Re: Calculated field based on row values within class variable

Re: Calculated field based on row values within class variable

Click image to register for webinar

Classroom Training Available!