Desktop productivity for business analysts and programmers

Number repetitions max value into group

Reply
Contributor
Posts: 34

Number repetitions max value into group

Hi ! 

I have a Sas new user’s question.

I have a dataset with this structure and I want to get the number of repetitions of the max value for each Vle_a’s value.

 

Vle_aVle_b
A5
A10
A9
A8
A10
A2
A10
B9
B1
B3
B9

 

 

I need to get a dataset like this :

 

Vle_aNb_repetitions
A3
B2

 

 

I tried this code with proc freq but does not consider the max value and I get the number of observation for each Vle_a’s value.

 

proc freq data=test noprint;
	tables vle_a /  NOCOL NOROW NOPERCENT OUT=want;
	by vle_a ;  
	where vle_b = max(vle_b); 
run ; 

I could do that whit proc sql but I would prefer to do it with a data step.

 

Somebody could help me please ?

 

Thank you very much !

 

Best regards

Lucia

 

Super User
Super User
Posts: 9,855

Re: Number repetitions max value into group

Posted in reply to luciacossaro

Not tested (as I am not typing in test data - post test data in the form of a datastep!):

proc sql;
  create table want as
  select  vle_a,
          count(*) as nb_repetitions
  from    have
  group by vla_a
  having vle_b=max(vle_b);
run;
Contributor
Posts: 34

Re: Number repetitions max value into group

Thank you. But there is a way to do that with a data step ?

 

Sorry, i had forgotten this : 

 

data test;
input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run ;
Super User
Posts: 10,594

Re: Number repetitions max value into group

Posted in reply to luciacossaro

@luciacossaro wrote:

Thank you. But there is a way to do that with a data step ?

 



Of course there is. The data step language is considered Turing-complete, so almost anything is possible. But is it worth the hassle?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 9,855

Re: Number repetitions max value into group

Posted in reply to luciacossaro

Well, its a bit more convoluted:

data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data top;
  set top;
  by vle_a;
  if first.vle_a;
run;

data want (keep=vle_a nb_repetitions);
  merge test top (rename=(vle_b=max));
  by vle_a;
  retain nb_repetitions;
  if first.vle_a then nb_repetitions=0;
  if vle_b=max then nb_repetitions=sum(nb_repetitions,1);
  if last.vle_a then output;
run;
Contributor
Posts: 34

Re: Number repetitions max value into group

I agree that it is a bit more convoluted, thank you for the help ! 

Super User
Posts: 10,594

Re: Number repetitions max value into group

[ Edited ]
Posted in reply to luciacossaro

What's so bad about SQL?

proc sql;
create table want as
select distinct vle_a, count(distinct vle_b) as count
from have
group by vle_a
having vle_b = max(vle_b)
;
quit;

Simple and straightforward.

Correction: the SQL needs a sub-select for the max:

proc sql;
create table want as
select vle_a, count(vle_a) as count
from have a
where a.vle_b = (select max(b.vle_b) from have b where b.vle_a = a.vle_a group by b.vle_a)
group by vle_a
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 34

Re: Number repetitions max value into group

[ Edited ]
Posted in reply to KurtBremser

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

 

this is the sql code that I had done :

 

Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

 

 

Super User
Posts: 10,594

Re: Number repetitions max value into group

Posted in reply to luciacossaro

@luciacossaro wrote:

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

 

this is the sql code that I had done :

 

Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

 

 


See the addition to my previous post with the sub-select.

 

Another data step approach is a double do loop:

data want;
maxval = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  maxval = max(maxval,vle_b);
end;
count = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  if vle_b = maxval then count + 1;
end;
keep vle_a count;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,935

Re: Number repetitions max value into group

Posted in reply to luciacossaro

Given that you simplify your PROC FREQ, it becomes easy to use its output data set:

 

proc freq data=test noprint;

   tables vle_b / noprint out=want;

   by vle_a;

run;

 

data really_want;

   set want;

   by vle_a;

   if last.vle_a;

   keep vle_a vle_b count;

   rename count=Nb_repetitions;

run;

Super User
Posts: 10,850

Re: Number repetitions max value into group

Posted in reply to luciacossaro
data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data temp;
  set top;
  by vle_a descending vle_b;
  if first.vle_a then n=0;
  n+first.vle_b;
  if n=1;
run;
proc freq data=temp noprint;
table vle_a/list out=want;
run;
Trusted Advisor
Posts: 1,327

Re: Number repetitions max value into group

Posted in reply to luciacossaro

Hi @luciacossaro,

 

In theory, this one-step solution should work:

 

ods select none;
ods output ExtremeValues=want(keep=vle_a highfreq rename=(highfreq=Nb_repetitions));
proc univariate data=test nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

And indeed it does work for your test data.

 

 

However, there seems to be a bug (!) in SAS, at least in my version 9.4 TS1M2 (Windows 7, 64 bit), which creates a special missing value (._) for highfreq (plus an incorrect value for lowfreq) if the respective BY group does not contain any duplicate value of vle_b.

 

Example:

ods select none;
ods output ExtremeValues=buggy;
proc univariate data=test(obs=10) nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

proc print data=buggy;
run;

Result:

                 Var        Low              Low      High              High
Obs    vle_a    Name      Order    Low      Freq     Order    High      Freq

 1       A      vle_b         1     2          1         5     10          3
 2       B      vle_b         1     1          9         3      9          _

It seems that for BY group 'B' LowFreq has been set to the value of High, which makes no sense at all. In listing output the columns "Freq" in section "Extreme Values" are suppressed (without a good reason).

 

@all: Does anybody have a more recent release (ideally TS1M5) to check if this has been fixed already?

Ask a Question
Discussion stats
  • 11 replies
  • 140 views
  • 0 likes
  • 6 in conversation