turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- summing values down observations

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-17-2014 05:44 PM

I would like to sum values for a set of observations, for a specific column based on a specific identifier. For example, suppose i have data like the below

A 4 5 6

B 3 3 2

A 3 4 2

C 3 2 0

B 3 7 3

B 2 4 1

Suppose i want to sum all of the values by the identifier in column 1, so i have totals for A, B and C by a specific column of my choice (2, 3 or 4) Then in a separate data set just output either A, B or C that is beyond a certain criteria.

For example, i want the sums of column 4 (based on identifier in column 1) which are above the value of 1, my output data set should return A = 8, B = 6, and obviously nothing for C as its zero.

I am open to Proc SQL, or the data step, in fact would be useful to know both methods.

Thanks in advance.

Accepted Solutions

Solution

08-17-2014
09:26 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-17-2014 09:26 PM

Both solutions above will drop the Var1=C case. If you want to see it in the result, try one of these:

**data have;**

**input var1 $ var2 var3 var4;**

**datalines;**

**A 4 5 6**

**B 3 3 2**

**A 3 4 2**

**C 3 2 0**

**B 3 7 3**

**B 2 4 1**

**;**

**proc sql;**

**create table want as**

**select var1, sum(case when var4>1 then var4 else 0 end) as totalVar4**

**from have**

**group by var1;**

**select * from want;**

**quit;**

**proc sort data=have; by var1; run;**

**data want2;**

**totalVar4 = 0;**

**do until(last.var1);**

** set have; by var1;**

** if var4>1 then totalVar4 + var4;**

** end;**

**keep var1 totalVar4;**

**run;**

**proc print data=want2 noobs; run;**

PG

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-17-2014 06:18 PM

Use proc means or summary.

Proc means data= have sum;

where var4>1;

class var1;

var var4;

output out=want sum=;

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-17-2014 06:47 PM

Using Proc SQL

data have;

input var1 $ var2 var3 var4;

datalines;

A 4 5 6

B 3 3 2

A 3 4 2

C 3 2 0

B 3 7 3

B 2 4 1

;

proc sql;

create table want as

select var1,sum(var4) as total_var4

from have

where var4>1

group by var1;

quit;

Solution

08-17-2014
09:26 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-17-2014 09:26 PM

Both solutions above will drop the Var1=C case. If you want to see it in the result, try one of these:

**data have;**

**input var1 $ var2 var3 var4;**

**datalines;**

**A 4 5 6**

**B 3 3 2**

**A 3 4 2**

**C 3 2 0**

**B 3 7 3**

**B 2 4 1**

**;**

**proc sql;**

**create table want as**

**select var1, sum(case when var4>1 then var4 else 0 end) as totalVar4**

**from have**

**group by var1;**

**select * from want;**

**quit;**

**proc sort data=have; by var1; run;**

**data want2;**

**totalVar4 = 0;**

**do until(last.var1);**

** set have; by var1;**

** if var4>1 then totalVar4 + var4;**

** end;**

**keep var1 totalVar4;**

**run;**

**proc print data=want2 noobs; run;**

PG

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-18-2014 05:08 AM

Hi,

The below gives you 1 summary row from the data. Cnts1-3 is the sum for each a,b,c for column1, cnts4-6 is the sum for each column2, and same for column3. Could do the id in several ways, format for instance and maybe use a macro variable for the 3, so to allow any number of columns.

data have;

attrib var1 format=$1. var2-var4 format=best.;

infile datalines;

input var1 $ var2-var4;

datalines;

A 4 5 6

B 3 3 2

A 3 4 2

C 3 2 0

B 3 7 3

B 2 4 1

;

run;

data want (drop=id i);

set have end=last;

select(var1);

when ('A') id=1;

when ('B') id=2;

when ('C') id=3;

end;

retain cnts1-cnts9;

array cols{3} var2-var4;

array cnts{3,3} 8.;

do i=1 to 3;

cnts{id,i}=sum(cnts{id,i},cols{i});

end;

if last then output;

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kkzs

08-29-2014 05:08 AM

hi all, a note to say thanks for your help and time, appreciated!