DATA Step, Macro, Functions and more

Summarize step

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,040
Accepted Solution

Summarize step

Hi Team,

I have a dataset like this.

Description   code      cases     denom    numer     triplet

infections      702         277          179      15            702

infections      703         288            280      25           702_703

infections      704         125            122      7             702_703_704  

I want as shown below...

Basically to summarize the cases, denom and numer......and to take the last one in the triplet

Description      cases    denom     numer    triplet

infections          690        581        47           702_703_704       

Regards


Accepted Solutions
Solution
‎12-27-2012 11:05 AM
PROC Star
Posts: 7,360

Re: Summarize step

Karun,

Here is one way (assuming your data are already organized such that all description records are together and the last record in each group is the one that contains the desired triplet):

data want (drop=_Smiley Happy;

  set have;

  retain _cases _denom _numer;

  by description notsorted;

  _cases=ifn(first.description,cases,sum(cases,_cases));

  _denom=ifn(first.description,denom,sum(denom,_denom));

  _numer=ifn(first.description,numer,sum(numer,_numer));

  if last.description then do;

    cases=_cases;

    denom=_denom;

    numer=_numer;

    output;

  end;

run;

View solution in original post


All Replies
Solution
‎12-27-2012 11:05 AM
PROC Star
Posts: 7,360

Re: Summarize step

Karun,

Here is one way (assuming your data are already organized such that all description records are together and the last record in each group is the one that contains the desired triplet):

data want (drop=_Smiley Happy;

  set have;

  retain _cases _denom _numer;

  by description notsorted;

  _cases=ifn(first.description,cases,sum(cases,_cases));

  _denom=ifn(first.description,denom,sum(denom,_denom));

  _numer=ifn(first.description,numer,sum(numer,_numer));

  if last.description then do;

    cases=_cases;

    denom=_denom;

    numer=_numer;

    output;

  end;

run;

Super Contributor
Posts: 1,636

Re: Summarize step

or

data have;

input Description $10.   code      cases     denom    numer;

cards;

infections      702         277          179      15           

infections      703         288            280      25          

infections      704         125            122      7 

;

data want;

  set have end=last;

  length triplet $ 20;

retain  newcases newdenom newnumer triplet;

newcases+cases;

newdenom+denom;

newnumer+numer;

  triplet=catx('_',triplet,code);

  if last ;

proc print;run;

Super Contributor
Posts: 1,040

Re: Summarize step

Thanks so much. It works well

Regards

Super Contributor
Posts: 1,040

Re: Summarize step

Linlin,

I think your code works only for only the example i put up.

if there are several repeats of the Description then Art's code works perfectly fine

Sorry for not telling it in my question that there are several of those repeatsSmiley Sad

Regards

Super Contributor
Posts: 1,636

Re: Summarize step

Hi Karun,

below is the modified code:

data have;

input Description $10.   code      cases     denom    numer;

cards;

infections      702         277          179      15          

infections      703         288            280      25         

infections      704         125            122      7

vnfections      802         277          179      15          

vnfections      803         888            880      85         

vnfections      804         125            122      7

;

data want;

  set have ;

  by Description;

  length triplet $ 20;

retain  newcases newdenom newnumer triplet;

newcases+cases;

newdenom+denom;

newnumer+numer;

  triplet=catx('_',triplet,code);

  if last.Description then do ;

  output;

  call missing(of new:,triplet);

  end;

proc print;run;

Super Contributor
Posts: 1,040

Re: Summarize step

Hi Thanks a ton for your time.

I understood everything but

call missing(of new:,triplet);

and why this statement has to be after the output stmnt??

Regards

Super User
Posts: 17,784

Re: Summarize step

call missing sets all the variables in the list to missing, so they are reset for the new loop

Super Contributor
Posts: 1,040

Re: Summarize step

Hi,

When you set the triplet to missing in each loop along with new:

triplet=catx('_',triplet,code);

is is like ""_code=code???(when a missing char value concatenates with a value it would be just the value and not the _code?????

Description=infectio code=702 cases=227 denom=179 numer=15 FIRST.Description=1 LAST.Description=0

triplet=702 newcases=227 newdenom=179 newnumer=15 _ERROR_=0 _N_=1

Description=infectio code=703 cases=288 denom=281 numer=26 FIRST.Description=0 LAST.Description=0

triplet=702_703 newcases=515 newdenom=460 newnumer=41 _ERROR_=0 _N_=2

Description=infectio code=704 cases=125 denom=122 numer=7 FIRST.Description=0 LAST.Description=1

triplet=  newcases=. newdenom=. newnumer=. _ERROR_=0 _N_=3

Description=vnfectio code=802 cases=277 denom=179 numer=15 FIRST.Description=1 LAST.Description=0

triplet=802 newcases=277 newdenom=179 newnumer=15 _ERROR_=0 _N_=4

Description=vnfectio code=803 cases=888 denom=880 numer=85 FIRST.Description=0 LAST.Description=0

triplet=802_803 newcases=1165 newdenom=1059 newnumer=100 _ERROR_=0 _N_=5

Description=vnfectio code=804 cases=125 denom=122 numer=7 FIRST.Description=0 LAST.Description=1

triplet=  newcases=. newdenom=. newnumer=. _ERROR_=0 _N_=6

Regards

Super Contributor
Posts: 1,040

Re: Summarize step

Hi Team,

Good Morning

Could you please help me understand this logic???

Regards

PROC Star
Posts: 7,360

Re: Summarize step

Karun,

You are getting the values you showed, I presume, by using a put _all_ statement in your code AFTER the call missing statement.

If so, you are not getting the values that are being output as the output statement is BEFORE the call missing statement.

Add another put statement before the call missing statement and I think you will be able to understand what the code is doing.

Super Contributor
Posts: 1,040

Re: Summarize step


Thanks ART,

Helped me to understand better.

Regards

Respected Advisor
Posts: 3,124

Re: Summarize step

Here is a Proc SQL approach:

proc sql;

  select description,sum(cases) as cases, sum(denom) as denom, sum(numer) as numer, triplet

  from have

group by description

having length(triplet)=max(length(triplet));

quit;

Haikuo

Frequent Contributor
Posts: 102

Re: Summarize step

max(Triplet) may also work with this SQL syntax provided the example structure holds for other cases.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 379 views
  • 10 likes
  • 6 in conversation