Solved: Re: Number repetitions max value into group

luciacossaro · Posted 08-09-2018 05:28 AM

Hi !

I have a Sas new user’s question.

I have a dataset with this structure and I want to get the number of repetitions of the max value for each Vle_a’s value.

Vle_a	Vle_b
A	5
A	10
A	9
A	8
A	10
A	2
A	10
B	9
B	1
B	3
B	9

I need to get a dataset like this :

Vle_a	Nb_repetitions
A	3
B	2

I tried this code with proc freq but does not consider the max value and I get the number of observation for each Vle_a’s value.

proc freq data=test noprint;
	tables vle_a /  NOCOL NOROW NOPERCENT OUT=want;
	by vle_a ;  
	where vle_b = max(vle_b); 
run ;

I could do that whit proc sql but I would prefer to do it with a data step.

Somebody could help me please ?

Thank you very much !

Best regards

Lucia

Kurt_Bremser · Posted 08-09-2018 05:44 AM

What's so bad about SQL?

proc sql;
create table want as
select distinct vle_a, count(distinct vle_b) as count
from have
group by vle_a
having vle_b = max(vle_b)
;
quit;

Simple and straightforward.

Correction: the SQL needs a sub-select for the max:

proc sql;
create table want as
select vle_a, count(vle_a) as count
from have a
where a.vle_b = (select max(b.vle_b) from have b where b.vle_a = a.vle_a group by b.vle_a)
group by vle_a
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

RW9 · Posted 08-09-2018 05:43 AM

Not tested (as I am not typing in test data - post test data in the form of a datastep!):

proc sql;
  create table want as
  select  vle_a,
          count(*) as nb_repetitions
  from    have
  group by vla_a
  having vle_b=max(vle_b);
run;

luciacossaro · Posted 08-09-2018 05:49 AM

Thank you. But there is a way to do that with a data step ?

Sorry, i had forgotten this :

data test;
input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run ;

Kurt_Bremser · Posted 08-09-2018 05:53 AM

@luciacossaro wrote:

Thank you. But there is a way to do that with a data step ?

Of course there is. The data step language is considered Turing-complete, so almost anything is possible. But is it worth the hassle?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 08-09-2018 06:04 AM

Well, its a bit more convoluted:

data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data top;
  set top;
  by vle_a;
  if first.vle_a;
run;

data want (keep=vle_a nb_repetitions);
  merge test top (rename=(vle_b=max));
  by vle_a;
  retain nb_repetitions;
  if first.vle_a then nb_repetitions=0;
  if vle_b=max then nb_repetitions=sum(nb_repetitions,1);
  if last.vle_a then output;
run;

luciacossaro · Posted 08-09-2018 06:21 AM

I agree that it is a bit more convoluted, thank you for the help !

Kurt_Bremser · Posted 08-09-2018 05:44 AM

What's so bad about SQL?

proc sql;
create table want as
select distinct vle_a, count(distinct vle_b) as count
from have
group by vle_a
having vle_b = max(vle_b)
;
quit;

Simple and straightforward.

Correction: the SQL needs a sub-select for the max:

proc sql;
create table want as
select vle_a, count(vle_a) as count
from have a
where a.vle_b = (select max(b.vle_b) from have b where b.vle_a = a.vle_a group by b.vle_a)
group by vle_a
;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

luciacossaro · Posted 08-09-2018 06:02 AM

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

this is the sql code that I had done :

Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

Kurt_Bremser · Posted 08-09-2018 06:24 AM

@luciacossaro wrote:

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

this is the sql code that I had done :
Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

See the addition to my previous post with the sub-select.

Another data step approach is a double do loop:

data want;
maxval = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  maxval = max(maxval,vle_b);
end;
count = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  if vle_b = maxval then count + 1;
end;
keep vle_a count;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 08-09-2018 07:22 AM

Given that you simplify your PROC FREQ, it becomes easy to use its output data set:

proc freq data=test noprint;

tables vle_b / noprint out=want;

by vle_a;

run;

data really_want;

set want;

by vle_a;

if last.vle_a;

keep vle_a vle_b count;

rename count=Nb_repetitions;

run;

Ksharp · Posted 08-09-2018 08:51 AM

data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data temp;
  set top;
  by vle_a descending vle_b;
  if first.vle_a then n=0;
  n+first.vle_b;
  if n=1;
run;
proc freq data=temp noprint;
table vle_a/list out=want;
run;

FreelanceReinh · Posted 08-10-2018 11:29 AM

Hi @luciacossaro,

In theory, this one-step solution should work:

ods select none;
ods output ExtremeValues=want(keep=vle_a highfreq rename=(highfreq=Nb_repetitions));
proc univariate data=test nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

And indeed it does work for your test data.

However, there seems to be a bug (!) in SAS, at least in my version 9.4 TS1M2 (Windows 7, 64 bit), which creates a special missing value (._) for highfreq (plus an incorrect value for lowfreq) if the respective BY group does not contain any duplicate value of vle_b.

Example:

ods select none;
ods output ExtremeValues=buggy;
proc univariate data=test(obs=10) nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

proc print data=buggy;
run;

Result:

                 Var        Low              Low      High              High
Obs    vle_a    Name      Order    Low      Freq     Order    High      Freq

 1       A      vle_b         1     2          1         5     10          3
 2       B      vle_b         1     1          9         3      9          _

It seems that for BY group 'B' LowFreq has been set to the value of High, which makes no sense at all. In listing output the columns "Freq" in section "Extreme Values" are suppressed (without a good reason).

@all: Does anybody have a more recent release (ideally TS1M5) to check if this has been fixed already?

Registration is open

SAS Training: Just a Click Away