https://communities.sas.com/message/151801#151801
I want a slight change in the output. I am using the dataset in the previous thread for reference.
ID Year Bill Charlie
1 2000 25 1
1 2000 35 2
1 2001 25 3
2 2001 11 4
2 2001 23 5
4 2000 2 6
4 2000 1 7
4 2000 3 8
I would like my output in this manner (a variable equal to the sum of observations for the same ID and year in Bill & Charlie)
ID Year Bill1 Charlie1
1 2000 60 3
1 2001 25 3
2 2001 34 9
4 2000 6 21
The codes used are:
proc sql;
create table base1 as
select *, sum(bill) as bill1,sum(Charlie) as charlie1
from have
group by ID, year;
quit;
Now, I use the nodupkey option get my result but is there a more efficient way- to use only Proc SQL or only Data Step?
proc sort data=base1 nodupkey;
by id year;
run;
Thanks!
Namrata
Not sure of the question here. If you want PROC SQL to print the result instead of making a dataset then just remove the "create table base1 as" part of the statement. Or add a PROC PRINT step.
If you want to make a report then a reporting tool like PROC REPORT might be what you want.
proc report data=have headline nofs ;
column id year bill charlie ;
define id / group ;
define year /group;
define bill / sum ;
define charlie / sum;
run;
What I am trying to get is an output that contains only one observation(sum of all obs for the same year). My proc sql does not give me the output in the way I desire. What I get is:
ID Year Bill Charlie Bill1 Charlie1
1 2000 25 1 60 3
1 2000 35 2 60 3
1 2001 25 3 25 3
2 2001 11 4 34 9
2 2001 23 5 34 9
4 2000 2 6 6 21
4 2000 1 7 6 21
4 2000 3 8 6 21
And that is precisely why I thought I could just use nodupkey to delete repetitions in the years for each ID.
your codes work but I need to retain a couple more variables from my original dataset.
You are getting the extra rows because of the extra columns you have selected.
Instead of
select *, sum(bill) as bill1,sum(Charlie) as charlie1
You should replace the * (which means ALL columns) with just the columns in your GROUP BY statement.
select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1
Aahh! That makes sense.
I am sure that is what Loko's codes were pointing at. I had tried his codes but kept most of the other variables(including the ones I am summing) instead of *.
Thanks a ton, Tom!
I meant to add to check out the ID statement - it lets you retain variables.
I guess, you don't need to use nodupkey while printing the data base1.
You Proc sql is giving the result without any duplicates.
Hello,
Adjust a little bit the sql code and it will be allright:
proc sql;
create table base1 as
select ID, year, sum(bill) as bill1,sum(Charlie) as charlie1
from have
group by ID, year;
quit;
assuming your data set is named t:
PROC MEANS data=T nway noprint;
CLASS id year;
VAR bill charlie;
OUTPUT OUT=sum_t sum(bill charlie)=;
RUN;
no need to sort, class statement does it for you.
output dataset is sum_t
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.