BookmarkSubscribeRSS Feed
namrata
Fluorite | Level 6

https://communities.sas.com/message/151801#151801

I want a slight change in the output. I am using the dataset in the previous thread for reference.

ID     Year              Bill        Charlie

1       2000                25          1 

1       2000                35          2

1       2001                25          3 

2       2001                11          4 

2       2001                23          5

4       2000                 2          6

4       2000                 1          7

4       2000                 3          8

I would like my output in this manner (a variable equal to the sum of observations for the same ID and year in Bill & Charlie)

ID     Year              Bill1        Charlie1

1       2000             60          3

1       2001             25          3 

2       2001             34          9 

4       2000              6          21

The codes used are:

proc sql;

create table base1 as

select *, sum(bill) as bill1,sum(Charlie) as charlie1

from have

group by ID, year;

quit;


Now, I use the nodupkey option get my result but is there a more efficient way- to use only Proc SQL or only Data Step?

proc sort data=base1 nodupkey;

by id year;

run;

 

Thanks!
Namrata

9 REPLIES 9
Tom
Super User Tom
Super User

Not sure of the question here. If you want PROC SQL to print the result instead of making a dataset then just remove the "create table base1 as" part of the statement.  Or add a PROC PRINT step.

If you want to make a report then a reporting tool like PROC REPORT might be what you want.

proc report data=have headline nofs ;

column id year bill charlie ;

define id / group ;

define year /group;

define bill / sum ;

define charlie / sum;

run;

namrata
Fluorite | Level 6

What I am trying to get is an output that contains only one observation(sum of all obs for the same year). My proc sql does not give me the output in the way I desire. What I get is:

ID     Year              Bill        Charlie       Bill1  Charlie1

1       2000                25          1            60       3       

1       2000                35          2            60       3

1       2001                25          3           25       3

2       2001                11          4           34       9

2       2001                23          5           34       9

4       2000                 2          6              6       21

4       2000                 1          7              6       21

4       2000                 3          8              6       21

And that is precisely why I thought I could just use nodupkey to delete repetitions in the years for each ID.

your codes work but I need to retain a couple more variables from my original dataset.

Tom
Super User Tom
Super User

You are getting the extra rows because of the extra columns you have selected.

Instead of

select *, sum(bill) as bill1,sum(Charlie) as charlie1

You should replace the * (which means ALL columns) with just the columns in your GROUP BY statement.

select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1

namrata
Fluorite | Level 6

Aahh! That makes sense.

I am sure that is what Loko's codes were pointing at. I had tried his codes but kept most of the other variables(including the ones I am summing) instead of *.

Thanks a ton, Tom!

RCW
Calcite | Level 5 RCW
Calcite | Level 5

I meant to add to check out the ID statement - it lets you retain variables.

RaviKommuri
Fluorite | Level 6

I guess, you don't need to use nodupkey while printing the data base1.

You Proc sql is giving the result without any duplicates.

Loko
Barite | Level 11

Hello,

Adjust a little bit the sql code and it will be allright:

proc sql;

create table base1 as

select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1

from have

group by ID, year;

quit;

namrata
Fluorite | Level 6

I tried your codes. It returns a sum column(as I have shown in my reply above) but that is not what I want.

rcwright
Calcite | Level 5

assuming your data set is named t:

PROC MEANS data=T nway noprint;

CLASS id year;

VAR bill charlie;

OUTPUT OUT=sum_t sum(bill charlie)=;

RUN;

no need to sort, class statement does it for you.

output dataset is sum_t

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1854 views
  • 3 likes
  • 6 in conversation