Beginning SAS programmer.
Given data like (region populated after a merge):
identifer | payout | region |
---|---|---|
1 | 10 | 1 |
2 | 20 | 2 |
3 | 10 | 1 |
4 | 20 | 3 |
5 | 10 | 1 |
6 | 20 | 2 |
7 | 10 | 3 |
8 | 20 | 3 |
9 | 10 | 2 |
10 | 20 | 4 |
11 | 10 | 4 |
12 | 20 | 2 |
I want output like:
region | count identifier | sum payout |
---|---|---|
1 | 3 | 30 |
2 | 4 | 70 |
3 | 3 | 50 |
4 | 2 | 30 |
I know SQL and I can do this easily with Proc SQL:
PROC SQL;
SELECT region, COUNT(identifier), SUM(payout)
FROM my_data
GROUP BY region
ORDER BY region;
QUIT;
But I want to do it the proper SAS way and use Proc Means or Proc Tabulate or Proc something.
Can someone please give me the lesson?
And, in the interest of completeness, PROC REPORT and PROC TABULATE will also give you equivalent results. If you want a SAS dataset instead of just a report, then you can use the OUT= option with those two procedures, as well (and the OUTPUT statement with PROC MEANS/SUMMARY, as shown by Art's example). The difference between Art's example, and these is that my procedures create reports in the HTML file. These procedures could create output datasets, as well.
cynthia
data mydata;
infile datalines;
input identifer payout region ;
return;
datalines;
1 10 1
2 20 2
3 10 1
4 20 3
5 10 1
6 20 2
7 10 3
8 20 3
9 10 2
10 20 4
11 10 4
12 20 2
;
run;
ods html file='c:\temp\mytables.html';
proc means data=mydata sum;
class region;
var payout;
run;
proc tabulate data=mydata f=6. out=work.tabout;
class region;
var payout;
tables region, payout*(N sum);
run;
proc report data=mydata nowd out=work.repout;
column region payout payout=paysum;
define region / group style(column)=Header;
define payout / n 'Count Indicator';
define paysum / sum 'Payout Sum';
run;
ods html close;
proc sql is proper sas. However, to get proc summary to produce the same file you could use:
data my_data;
input identifer payout region;
cards;
1 10 1
2 20 2
3 10 1
4 20 3
5 10 1
6 20 2
7 10 3
8 20 3
9 10 2
10 20 4
11 10 4
12 20 2
;
proc summary data=my_data nway;
var payout;
class region;
output out=want (drop=_:)
n=count_identifier
sum=sum_payout;
run;
And, in the interest of completeness, PROC REPORT and PROC TABULATE will also give you equivalent results. If you want a SAS dataset instead of just a report, then you can use the OUT= option with those two procedures, as well (and the OUTPUT statement with PROC MEANS/SUMMARY, as shown by Art's example). The difference between Art's example, and these is that my procedures create reports in the HTML file. These procedures could create output datasets, as well.
cynthia
data mydata;
infile datalines;
input identifer payout region ;
return;
datalines;
1 10 1
2 20 2
3 10 1
4 20 3
5 10 1
6 20 2
7 10 3
8 20 3
9 10 2
10 20 4
11 10 4
12 20 2
;
run;
ods html file='c:\temp\mytables.html';
proc means data=mydata sum;
class region;
var payout;
run;
proc tabulate data=mydata f=6. out=work.tabout;
class region;
var payout;
tables region, payout*(N sum);
run;
proc report data=mydata nowd out=work.repout;
column region payout payout=paysum;
define region / group style(column)=Header;
define payout / n 'Count Indicator';
define paysum / sum 'Payout Sum';
run;
ods html close;
Hi Art,
We came up with almost identicle code.
data have;
input identifer payout region;
cards;
1 10 1
2 20 2
3 10 1
4 20 3
5 10 1
6 20 2
7 10 3
8 20 3
9 10 2
10 20 4
11 10 4
12 20 2
;
proc summary nway;
class region;
var payout;
output out=want(drop=_:) n=count_identifer sum=sum_payout;
run;
proc print data=want;run;
Obs region identifer payout
1 1 3 30
2 2 4 70
3 3 3 50
4 4 2 30
Linlin: be careful or you might end up becoming a psychologist
I am beginer to SAS and you just solved my problem. Thanks for this code.
Use proc summary. The nway options gives only the region total, without you a grand total would be created as we
Of course, if we're going for compleness, there is also the datastep approach:
proc sort data=my_data;
by region;
run;
data want (drop=identifer payout);
set my_data;
by region;
if first.region then do;
sum_payout=payout;
count_identifier=1;
end;
else do;
sum_payout+payout;
count_identifier+1;
end;
if last.region then output;
run;
proc print;
run;
True! I generally go there when REPORT, TABULATE and MEANS won't do what I want. 😉
cynthia
Okay, in the rough order received:.
art:
proc summary data=my_data nway;
var payout;
class region;
output out=want (drop=_:)
n=count_identifier
sum=sum_payout;
run;
This generated no output and was frustrating. See below.
cynthia:
proc report data=mydata nowd out=work.repout;
column region payout payout=paysum;
define region / group style(column)=Header;
define payout / n 'Count Indicator';
define paysum / sum 'Payout Sum';
run;
This introduced a whole new window to me. Interesting. Helpful- introduces me to something new.
cynthia:
proc tabulate data=mydata f=6. out=work.tabout;
class region;
var payout;
tables region, payout*(N sum);
run;
This is what I had in mind and thought was the answer. Full marks. I used (N*F=COMMA12. sum*F=DOLLAR12.)
linlin:
art'a post plus
proc print data=want;
run;
Aha! Output! I told you I was beginner. Helpful. Makes Art's answer helpful.
art:
data want (drop=identifer payout);
set my_data;
by region;
if first.region then do;
sum_payout=payout;
count_identifier=1;
end;
else do;
sum_payout+payout;
count_identifier+1;
end;
if last.region then output;
run;
proc print;
run;
Oh man! This is code that looks like something I could get behind and understand. I haven't been able to make it work but I'm looking at it. Helpful- introduces me to something new.
Thanks to all.
What would be the easiest way to do it if you have more fields?
I have station, ssn, invoice number, line number and amount
I want to end up with each unique station number with a count of ssn's, a count of their invoices, a count of line items, and a total cost?
I'm not having much luck with it
List these variables:
proc summary data=my_data nway;
var payout;
class region;
output out=want (drop=_:)
n(ssn num)=ssn_x num_x
sum(ssn num)=ssn_sum num_sum;
run;
Hi Marilyn,
This code may helps to answer your question.
proc means data = have nway noprint;
var payout ssn invoice_num line_num;
class region;
output out= want (drop= _:)
n(ssn invoice_num line_num) = ssn_count invoice_count line_count
sum(amount) = cost;
run;
It will create only output dataset not the report (printed report). If you want report then remove the "noprint" option in proc means statement.
Hope this works fine!!
Thanks,
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.