BookmarkSubscribeRSS Feed
namrata
Fluorite | Level 6

https://communities.sas.com/message/151801#151801

I want a slight change in the output. I am using the dataset in the previous thread for reference.

ID     Year              Bill        Charlie

1       2000                25          1 

1       2000                35          2

1       2001                25          3 

2       2001                11          4 

2       2001                23          5

4       2000                 2          6

4       2000                 1          7

4       2000                 3          8

I would like my output in this manner (a variable equal to the sum of observations for the same ID and year in Bill & Charlie)

ID     Year              Bill1        Charlie1

1       2000             60          3

1       2001             25          3 

2       2001             34          9 

4       2000              6          21

The codes used are:

proc sql;

create table base1 as

select *, sum(bill) as bill1,sum(Charlie) as charlie1

from have

group by ID, year;

quit;


Now, I use the nodupkey option get my result but is there a more efficient way- to use only Proc SQL or only Data Step?

proc sort data=base1 nodupkey;

by id year;

run;

 

Thanks!
Namrata

9 REPLIES 9
Tom
Super User Tom
Super User

Not sure of the question here. If you want PROC SQL to print the result instead of making a dataset then just remove the "create table base1 as" part of the statement.  Or add a PROC PRINT step.

If you want to make a report then a reporting tool like PROC REPORT might be what you want.

proc report data=have headline nofs ;

column id year bill charlie ;

define id / group ;

define year /group;

define bill / sum ;

define charlie / sum;

run;

namrata
Fluorite | Level 6

What I am trying to get is an output that contains only one observation(sum of all obs for the same year). My proc sql does not give me the output in the way I desire. What I get is:

ID     Year              Bill        Charlie       Bill1  Charlie1

1       2000                25          1            60       3       

1       2000                35          2            60       3

1       2001                25          3           25       3

2       2001                11          4           34       9

2       2001                23          5           34       9

4       2000                 2          6              6       21

4       2000                 1          7              6       21

4       2000                 3          8              6       21

And that is precisely why I thought I could just use nodupkey to delete repetitions in the years for each ID.

your codes work but I need to retain a couple more variables from my original dataset.

Tom
Super User Tom
Super User

You are getting the extra rows because of the extra columns you have selected.

Instead of

select *, sum(bill) as bill1,sum(Charlie) as charlie1

You should replace the * (which means ALL columns) with just the columns in your GROUP BY statement.

select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1

namrata
Fluorite | Level 6

Aahh! That makes sense.

I am sure that is what Loko's codes were pointing at. I had tried his codes but kept most of the other variables(including the ones I am summing) instead of *.

Thanks a ton, Tom!

RCW
Calcite | Level 5 RCW
Calcite | Level 5

I meant to add to check out the ID statement - it lets you retain variables.

RaviKommuri
Fluorite | Level 6

I guess, you don't need to use nodupkey while printing the data base1.

You Proc sql is giving the result without any duplicates.

Loko
Barite | Level 11

Hello,

Adjust a little bit the sql code and it will be allright:

proc sql;

create table base1 as

select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1

from have

group by ID, year;

quit;

namrata
Fluorite | Level 6

I tried your codes. It returns a sum column(as I have shown in my reply above) but that is not what I want.

rcwright
Calcite | Level 5

assuming your data set is named t:

PROC MEANS data=T nway noprint;

CLASS id year;

VAR bill charlie;

OUTPUT OUT=sum_t sum(bill charlie)=;

RUN;

no need to sort, class statement does it for you.

output dataset is sum_t

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2459 views
  • 3 likes
  • 6 in conversation