Not applicable
Posts: 0

How to output a new data set contains sum of certain variables only

I got hundreds rows of raw data,
used sum function and
by statement (by two criteria),
how could I get a new data set only containing the total of each
variables,
and get rid of hundreds rows of raw data?

thanks a lot
SAS Super FREQ
Posts: 9,371

Re: How to output a new data set contains sum of certain variables only

Hi:
It sort of depends on what you mean by "only containing the total of each variable"?? For example, let's say I have this data:
[pre]
Name Sex Age Height Weight

Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84.0
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90.0
Louise F 12 56.3 77.0
Mary F 15 66.5 112.0
Philip M 16 72.0 150.0
Robert M 12 64.8 128.0
Ronald M 15 67.0 133.0
Thomas M 11 57.5 85.0
William M 15 66.5 112.0
[/pre]

which actually happens to be the variables and observations in SASHELP.CLASS. How can you describe what you want in terms of these 19 observations??? Would you want to see the total of AGE, HEIGHT and WEIGHT across all observations -- which would be a summary dataset of 1 observation)??? Or, the total of AGE, HEIGHT and WEIGHT for each value of the SEX variable??? (resulting in a summary dataset of 2 observations)??

Different ways to create summary datasets are with PROC MEANS, PROC TABULATE, and PROC REPORT for just a start.

cynthia
Not applicable
Posts: 0

Re: How to output a new data set contains sum of certain variables only

Thanks Cynthia,

Let me use my example to make my question clearly.
for example, I have data set like this,
I want the report to contain the sum of duration based on each day and each EID like the example listed as following:

example of output:
EID date sumscale
982000099158158 5/12/2008 *****
982000099158158 5/13/2008 *****
982000099158165 5/27/2008 *****
982000099158165 5/28/2008 *****

Data set:
EID date scale duration headdown intake
982000099158158 5/12/2008 4 8 8 30
982000099158158 5/12/2008 3 220 120 60
982000099158158 5/12/2008 4 61 28 10
982000099158158 5/12/2008 3 43 18 0
982000099158158 5/12/2008 3 92 32 10
982000099158158 5/12/2008 3 66 22 0
982000099158158 5/12/2008 4 140 36 20
982000099158158 5/12/2008 4 68 60 10
982000099158158 5/13/2008 4 51 22 10
982000099158158 5/13/2008 4 16 10 120
982000099158158 5/13/2008 3 31 12 170
982000099158158 5/13/2008 3 12 10 60
982000099158158 5/13/2008 3 11 6 20
982000099158165 5/22/2008 27 84 56 0
982000099158165 5/23/2008 27 287 104 0
982000099158165 5/23/2008 27 93 52 0
982000099158165 5/23/2008 27 29 12 100
982000099158165 5/23/2008 27 92 74 310
982000099158165 5/23/2008 27 259 142 490
982000099158165 5/23/2008 27 242 138 630
982000099158165 5/23/2008 27 154 104 210
982000099158165 5/23/2008 27 129 74 450
982000099158165 5/23/2008 27 318 198 890
982000099158165 5/23/2008 27 546 350 1310
982000099158165 5/23/2008 27 53 38 240
982000099158165 5/23/2008 27 54 34 290
982000099158165 5/23/2008 27 402 260 1000
982000099158165 5/23/2008 27 492 282 800
982000099158165 5/24/2008 27 308 140 390
982000099158165 5/24/2008 27 1092 344 0
982000099158165 5/24/2008 27 199 94 3320
982000099158165 5/24/2008 27 525 150 530
SAS Super FREQ
Posts: 9,371

Re: How to output a new data set contains sum of certain variables only

Hi,
Just curious -- what does the ***** for SUMSCALE represent in your post??? Only the sum of DURATION??? Or, could that be the sum of the SCALE variable? You say you want the sum of DURATION, but what about the SCALE, HEADDOWN and INTAKE variables? So you want to see their sums as well???

Have you tried any SAS procedures already?? Do you already have a SAS dataset??? Is the date value a character variable or a numeric variable??

cynthia
Not applicable
Posts: 0

Re: How to output a new data set contains sum of certain variables only

hi, thanks for ur reply. the ***stand for the sum of the column based same date same eid, it could be sum of duration, sum of scale, sum of intake, sum of headdown. date is input as numeric variale. i used

proc print;
by eid date;
run;

got a output with the grand i want, but with the original data printed out as well.

SAS Super FREQ
Posts: 9,371

Re: How to output a new data set contains sum of certain variables only

Hi:
Have you looked at any of the following procedures: PROC MEANS, PROC TABULATE, PROC REPORT???

They will all create output datasets, summarized, as you desire. The syntax of each procedure is slightly different, but there are many examples in the documentation and in previous forum postings, such as this one, which showed how to get the SUM and the MEDIAN statistic:
or these:

cynthia
Not applicable
Posts: 0

Re: How to output a new data set contains sum of certain variables only

Hey, Cynthia,

I followed the example you gave me in the first link;
I used the proc report as following:

proc report data=amanagsdata nowd split='/'
out=work.report;
column EID date duration headdown consumed;
define EID /group;
define date /group;
define duration/sum 'sum/duration';
define consumed/sum 'sum/consumed';
run;

it works for me.
Thanks very much!!!!!
Have a good day and thanksgiving Message was edited by: Sasha
SAS Super FREQ
Posts: 9,371

Re: How to output a new data set contains sum of certain variables only

Hi:
I'm glad you got PROC REPORT working. PROC MEANS could also give you multiple summed variables, as shown in the code below (along with a TABULATE for comparison of what the different syntax and output datasets look like).

Just an FYI -- the label that you specify for the PROC REPORT step does NOT get used in WORK.REPORT when it gets created.

cynthia
[pre]
proc means data=amanagsdata nway sum;
title 'PROC MEANS';
class EID date;
output out=work.mnout sum=sumscale sumdur sumhdwn sumcons;
format date mmddyy10.;
run;

proc tabulate data=amanagsdata f=comma9.
title 'PROC TABULATE';
class EID date;
table EID*date,
sum*(scale duration headdown consumed) / rts=50;
format date mmddyy10.;
run;

proc report data=amanagsdata nowd
out=work.repout;
title 'PROC REPORT';
column EID date scale duration headdown consumed;
define EID / group;
define date / group order=internal f=mmddyy10.;
define scale / sum;
define duration / sum;
define consumed / sum;
run;
[/pre]
Super Contributor
Posts: 366

Re: How to output a new data set contains sum of certain variables only

Hello Sasha,

Assume that a name for your input dataset is A then possible solution for sum of Scale grouped by EID and Date could be like this:
[pre]
proc means data=a nway noprint;
output out=sumscale SUM=sumscale;
var scale;
class EID date;
run;
[/pre]
Output data set is SUMSCALE.
Sincerely,
SPR
Not applicable
Posts: 0

Re: How to output a new data set contains sum of certain variables only

Hi SPR,

thanks a lot!. That works, but I guess if I want to sumof scale, sum of intake, sum of duration showing up at same time, I need do three proc means, and get three output data set, correct?
Super Contributor
Posts: 366

Re: How to output a new data set contains sum of certain variables only

It depends... You can have a single proc MEANS with

var SCALE INTAKE DURATION;

Then you will get all three sums in one datasset.

SPR
Valued Guide
Posts: 2,191

Re: How to output a new data set contains sum of certain variables only

Sasha
as SPR says yu can get all your totals in one pass of proc means.

If your request to reduce the data to totals and "by variables" then try:
proc means data= your.data missing nway ;
class by_var1 by_var2 ;
var _numeric_ ;
output out= totals_dataset sum= ;
run ;
You have to replace your.data and by_var1 by_var2 with your context.
You do not need to name each numeric variable if all are required to be totalled, but replace _numeric_ with a reduced list if you want only a few of the numerics totalled. (class vars won't be totalled anyway)
That provides an output data set the example names totals_dataset, but change that to whatever you prefer.
Fundamentally simple for simple requests. - that's SAS

peterC
Discussion stats
• 11 replies
• 179 views
• 0 likes
• 4 in conversation