Desktop productivity for business analysts and programmers

Summary by variable

Accepted Solution Solved
Reply
Contributor
Posts: 58
Accepted Solution

Summary by variable

Hi, I need a particular kind of summary, I hope that it would be possible to do:

 

I have this datatable:

ID MTO DATE
1 100 jul-15
2 120 jun-15
3 130 may-15
1 90 may-15
2 80 jun-15
3 100 may-15
1 200 may-15
2 110 jul-15
3 100 jul-15

 I need to get kind of summary, but using data step cause I have too many obs:

ID Sum(MTO) may-15 Sum(MTO) jun-15 Sum(MTO) jul-15
1 330 80 200
2 190 120 110

 

I was using proc summary, but I could only get a sum without the group by date.


Accepted Solutions
Solution
‎10-22-2015 12:29 PM
Contributor
Posts: 58

Re: Summary by variable

[ Edited ]

Hi Jerry, thanks for your answer, but there is not totally right:

 

proc summary output:

id date sum_mto
0 OCT14 0.2611745
0 NOV14 0.01
0 DEC14 0.08014064
0 FEB15 0.47
0 MAR15 0.12968991
0 APR15 0.37
0 MAY15 0.02316756
0 JUN15 0.05
0 JUL15 0.05017248
0 AUG15 0.26142127
0 SEP15 0.01620732
2 APR15 15.044794
2 JUN15 0.0317058
83 DEC14 300001.67
262 SEP15 31.0078
310 NOV14 3257
361 JUN15 15000
382 OCT14 20.5578741
526 MAR15 0.8397922
583 DEC14 0.00669792
584 SEP15 2.96077683
603 APR15 35.11013
603 JUN15 69.369564
609 OCT14 68.6342
693 JAN15 5.609508
693 JUN15 25.26848
710 JUN15 1734.04944
718 JUL15 0.00313578
727 MAY15 0.00316756
742 JAN15 23.25582
793 MAY15 0.00633512
881 SEP15 80.5151477
921 AUG15 12.31
935 MAY15 63.7552
948 DEC14 353.77335
948 MAR15 18.398973

 

 

I added var to your code, is it right? BUt, here is the problem, the output is strange:

 

proc transpose data=summ out=want(drop=_: )  prefix=mto_ ;

   by  id;

   var id  date;

run;

 

The output of this code:

id mto_1 mto_2 mto_3 mto_4 mto_5 mto_6 mto_7 mto_8 mto_9 mto_10 mto_11 mto_12
0 0 0 0 0 0 0 0 0 0 0 0 .
0 19997 20028 20058 20121 20148 20179 20209 20240 20270 20301 20332 .
2 2 2 . . . . . . . . . .
2 20179 20240 . . . . . . . . . .
83 83 . . . . . . . . . . .
83 20058 . . . . . . . . . . .
262 262 . . . . . . . . . . .
262 20332 . . . . . . . . . . .
310 310 . . . . . . . . . . .
310 20028 . . . . . . . . . . .
361 361 . . . . . . . . . . .
361 20240 . . . . . . . . . . .
382 382 . . . . . . . . . . .
382 19997 . . . . . . . . . . .
526 526 . . . . . . . . . . .
526 20148 . . . . . . . . . . .
583 583 . . . . . . . . . . .
583 20058 . . . . . . . . . . .
584 584 . . . . . . . . . . .
584 20332 . . . . . . . . . . .

 

The date was transformed from a ddmmyy format, could it be the problem? And, you know why there are replicated ids? Thanks!

 

View solution in original post


All Replies
Frequent Contributor
Posts: 85

Re: Summary by variable

[ Edited ]

You shouldn't need a Data step for this. 

 

Try something like this...

 

proc summary data=have nway;

   class id date;

   var  mto;

   output out=summ(drop=_: )  sum=;

run;

 

/* Then to have the month sums as columns...  */

proc transpose data=summ out=want(drop=_: )  prefix=mto_ ;

   by  id;

   id  date;

run;

 

Hope this helps.

 

Solution
‎10-22-2015 12:29 PM
Contributor
Posts: 58

Re: Summary by variable

[ Edited ]

Hi Jerry, thanks for your answer, but there is not totally right:

 

proc summary output:

id date sum_mto
0 OCT14 0.2611745
0 NOV14 0.01
0 DEC14 0.08014064
0 FEB15 0.47
0 MAR15 0.12968991
0 APR15 0.37
0 MAY15 0.02316756
0 JUN15 0.05
0 JUL15 0.05017248
0 AUG15 0.26142127
0 SEP15 0.01620732
2 APR15 15.044794
2 JUN15 0.0317058
83 DEC14 300001.67
262 SEP15 31.0078
310 NOV14 3257
361 JUN15 15000
382 OCT14 20.5578741
526 MAR15 0.8397922
583 DEC14 0.00669792
584 SEP15 2.96077683
603 APR15 35.11013
603 JUN15 69.369564
609 OCT14 68.6342
693 JAN15 5.609508
693 JUN15 25.26848
710 JUN15 1734.04944
718 JUL15 0.00313578
727 MAY15 0.00316756
742 JAN15 23.25582
793 MAY15 0.00633512
881 SEP15 80.5151477
921 AUG15 12.31
935 MAY15 63.7552
948 DEC14 353.77335
948 MAR15 18.398973

 

 

I added var to your code, is it right? BUt, here is the problem, the output is strange:

 

proc transpose data=summ out=want(drop=_: )  prefix=mto_ ;

   by  id;

   var id  date;

run;

 

The output of this code:

id mto_1 mto_2 mto_3 mto_4 mto_5 mto_6 mto_7 mto_8 mto_9 mto_10 mto_11 mto_12
0 0 0 0 0 0 0 0 0 0 0 0 .
0 19997 20028 20058 20121 20148 20179 20209 20240 20270 20301 20332 .
2 2 2 . . . . . . . . . .
2 20179 20240 . . . . . . . . . .
83 83 . . . . . . . . . . .
83 20058 . . . . . . . . . . .
262 262 . . . . . . . . . . .
262 20332 . . . . . . . . . . .
310 310 . . . . . . . . . . .
310 20028 . . . . . . . . . . .
361 361 . . . . . . . . . . .
361 20240 . . . . . . . . . . .
382 382 . . . . . . . . . . .
382 19997 . . . . . . . . . . .
526 526 . . . . . . . . . . .
526 20148 . . . . . . . . . . .
583 583 . . . . . . . . . . .
583 20058 . . . . . . . . . . .
584 584 . . . . . . . . . . .
584 20332 . . . . . . . . . . .

 

The date was transformed from a ddmmyy format, could it be the problem? And, you know why there are replicated ids? Thanks!

 

Frequent Contributor
Posts: 85

Re: Summary by variable

Not sure why you needed to add the VAR statement.   Its implied, but to be explicit, you'd write as:

 

 

proc summary data=have nway;
   class id date;
   var  mto;
   output out=summ(drop=_: )  sum=;
run;

/* Then to have the month sums as columns...  */
proc transpose data=summ out=want(drop=_: )  prefix=mto_ ;
   by  id;
   id  date;
   var mto;
run;

 

 

And this is the result (for me):

ID

mto_MAY15

mto_JUL15

mto_JUN15

1

290

100

.

2

.

110

200

3

230

100

.

 

 

 

 

Contributor
Posts: 58

Re: Summary by variable

My bad, I just see my error, id is a parameter, I was confused because I have the same name for my variable...it works perfect!
Respected Advisor
Posts: 4,606

Re: Summary by variable

One simple way, assuming your dates are SAS dates. 

 

data have;
input ID	MTO	DATE :monyy.;
format date monyy.;
datalines;
1	100	jul-15
2	120	jun-15
3	130	may-15
1	90	may-15
2	80	jun-15
3	100	may-15
1	200	may-15
2	110	jul-15
3	100	jul-15
;

proc sql;
create table sumList as
select id, 
    put(date,monyy7.) as date,
    cats("Sum(MTO)", put(date,monyy7.)) as label, sum(mto) as sumMTO
from have
group by id, calculated date, label;
quit;

proc transpose data=sumList out=sumTable(drop=_name_);
by id; 
var sumMTO;
id date;
idlabel label;
run;

proc print data=sumTable noobs label; run;
PG
Contributor
Posts: 58

Re: Summary by variable

Hi PG, a friend told me that proc sql is a method that take more time, is that truth? What would you choose between proc summary and proc sql?? Because I have about 500 million obs.
Respected Advisor
Posts: 4,606

Re: Summary by variable

Yes, SQL's flexibility comes at a price. If you can get what you want with proc summary then it is probably faster. I wouldn't expect a huge difference on such a simple task however.

PG
Contributor
Posts: 58

Re: Summary by variable

Thanks PG. I couldn't select your answer as solution. It seems you could choose only one.
Contributor
Posts: 58

Re: Summary by variable

Pg, thanks, I tried your solution and it works perfectly!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 606 views
  • 2 likes
  • 3 in conversation