Hello,
I was asked to calculate the sum of var1 through var6 and that per category. But now they want the sum of var5 and the sum of var6 and they want that sum reported in CAT2 and CBT1 respectively.
Is there a more efficient way to carry out that task
see the code below.
data test1;
infile datalines delimiter=',';
input category $ TransType $ var1 var2 var3 var4 var5 var6 ;
datalines;
CAT1, DIR, 100.00, 8.49, 0.08, 1.00, 0.00, 0.00
CAT2, DIR, 200.00, 2.12, 0.08, 1.00, 0.20, 0.02
CAT3, DIR, 800.00, 67.95, 0.08, 1.00, 0.20, 0.02
CAT4, DIR, 100.00, 10.19, 0.08, 1.00, 0.00, 0.00
CAT5, DIR, 700.00, 63.70, 0.08, 1.00, 0.20, 0.02
CBT1, DIR, 50.00, 4.25, 0.08, 1.00, 0.00, 0.00
CBT2, DIR, 300.00, 25.48, 0.08, 1.00, 0.25, 0.02
CBT3, DIR, 900.00, 76.44, 0.08, 1.00, 0.25, 0.02
CBT4, DIR, 100.00, 8.49, 0.08, 1.00, 0.00, 0.00
;
run;
data test1;
format category $8. TransType $8. mjcategory $8. var1 var2 var3 var4 var5 var6 f7.2 ;
set test1;
if substr(category,1,2) eq 'CA' then mjcategory = 'CAT2';
if substr(category,1,2) eq 'CB' then mjcategory = 'CBT1';
run;
proc sql;
create table test2 as
select a.*
,sum(var5) as var7
,sum(var6) as var8
from test1 as a
group by mjcategory;
quit;
Data test2 (drop= var7 var8 mjcategory);
set test2;
if category eq mjcategory then
do;
var5=var7;
var6=var8;
end;
Else
do;
var5=0;
var6=0;
end;
run;
I do not understand what you are trying to do. Is there some business logic behind this?
If you want to treat CAT1,CAT2,... as one category and CBT1,CBT2,.. as other then simply using a format would be simpler.
data have;
input category $ TransType $ var1 var2 var3 var4 var5 var6 ;
datalines;
CAT1 DIR 100.00 8.49 0.08 1.00 0.00 0.00
CAT2 DIR 200.00 2.12 0.08 1.00 0.20 0.02
CAT3 DIR 800.00 67.95 0.08 1.00 0.20 0.02
CAT4 DIR 100.00 10.19 0.08 1.00 0.00 0.00
CAT5 DIR 700.00 63.70 0.08 1.00 0.20 0.02
CBT1 DIR 50.00 4.25 0.08 1.00 0.00 0.00
CBT2 DIR 300.00 25.48 0.08 1.00 0.25 0.02
CBT3 DIR 900.00 76.44 0.08 1.00 0.25 0.02
CBT4 DIR 100.00 8.49 0.08 1.00 0.00 0.00
;
proc summary data=have nway;
class category / mlf ;
format category $3. ;
var var1-var6 ;
output out=two_categories sum=;
run;
Results
Obs category _TYPE_ _FREQ_ var1 var2 var3 var4 var5 var6 1 CAT 1 5 1900 152.45 0.40 5 0.6 0.06 2 CBT 1 4 1350 114.66 0.32 4 0.5 0.04
Did you leave out PROC FORMAT from your code?
I agree, a custom format is the best way to do this.
For now I am just use the built in $3 format.
I could not figure out what the heck the original posted actually wanted.
Is there some business logic behind this? I am affraid not. I was asked to calculate the sum of var1 - var6 and that, per category. Then because they don't like fraction in var5 and var6, their requirement is now to rollup the sum for var5 and var6 and to report it as cat2 and cbt1. (see the obtained dataset below)
category | TransType | var1 | var2 | var3 | var4 | var5 | var6 |
CAT3 | DIR | 800.00 | 67.95 | 0.08 | 1.00 | 0.00 | 0.00 |
CAT5 | DIR | 700.00 | 63.70 | 0.08 | 1.00 | 0.00 | 0.00 |
CAT2 | DIR | 200.00 | 2.12 | 0.08 | 1.00 | 0.60 | 0.06 |
CAT1 | DIR | 100.00 | 8.49 | 0.08 | 1.00 | 0.00 | 0.00 |
CAT4 | DIR | 100.00 | 10.19 | 0.08 | 1.00 | 0.00 | 0.00 |
CBT2 | DIR | 300.00 | 25.48 | 0.08 | 1.00 | 0.00 | 0.00 |
CBT1 | DIR | 50.00 | 4.25 | 0.08 | 1.00 | 0.50 | 0.04 |
CBT4 | DIR | 100.00 | 8.49 | 0.08 | 1.00 | 0.00 | 0.00 |
CBT3 | DIR | 900.00 | 76.44 | 0.08 | 1.00 | 0.00 | 0.00 |
So you have the detailed summary already:
data summary;
input category $ TransType $ var1 var2 var3 var4 var5 var6 ;
datalines;
CAT1 DIR 100.00 8.49 0.08 1.00 0.00 0.00
CAT2 DIR 200.00 2.12 0.08 1.00 0.20 0.02
CAT3 DIR 800.00 67.95 0.08 1.00 0.20 0.02
CAT4 DIR 100.00 10.19 0.08 1.00 0.00 0.00
CAT5 DIR 700.00 63.70 0.08 1.00 0.20 0.02
CBT1 DIR 50.00 4.25 0.08 1.00 0.00 0.00
CBT2 DIR 300.00 25.48 0.08 1.00 0.25 0.02
CBT3 DIR 900.00 76.44 0.08 1.00 0.25 0.02
CBT4 DIR 100.00 8.49 0.08 1.00 0.00 0.00
;
And you want to collapse VAR5 and VAR6 to have only one non-zero value that is the sum of all the values in that group.
You can get there directly with PROC SQL code like this:
proc sql ;
create table want as
select category, transtype
, var1 ,var2 ,var3 ,var4
, case when (category in ('CAT2' 'CBT1')) then sum(var5) else 0 end as var5
, case when (category in ('CAT2' 'CBT1')) then sum(var6) else 0 end as var6
from summary
group by substr(category,1,3)
order by category
;
quit;
Which takes advantage of the fact that PROC SQL will re-merge the aggregate values onto all of the observations for you. As noted in the log:
NOTE: The query requires remerging summary statistics back with the original data.
Results
Trans Obs category Type var1 var2 var3 var4 var5 var6 1 CAT1 DIR 100 8.49 0.08 1 0.0 0.00 2 CAT2 DIR 200 2.12 0.08 1 0.6 0.06 3 CAT3 DIR 800 67.95 0.08 1 0.0 0.00 4 CAT4 DIR 100 10.19 0.08 1 0.0 0.00 5 CAT5 DIR 700 63.70 0.08 1 0.0 0.00 6 CBT1 DIR 50 4.25 0.08 1 0.5 0.04 7 CBT2 DIR 300 25.48 0.08 1 0.0 0.00 8 CBT3 DIR 900 76.44 0.08 1 0.0 0.00 9 CBT4 DIR 100 8.49 0.08 1 0.0 0.00
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.