BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
luciacossaro
Obsidian | Level 7

Hi ! 

I have a Sas new user’s question.

I have a dataset with this structure and I want to get the number of repetitions of the max value for each Vle_a’s value.

 

Vle_aVle_b
A5
A10
A9
A8
A10
A2
A10
B9
B1
B3
B9

 

 

I need to get a dataset like this :

 

Vle_aNb_repetitions
A3
B2

 

 

I tried this code with proc freq but does not consider the max value and I get the number of observation for each Vle_a’s value.

 

proc freq data=test noprint;
	tables vle_a /  NOCOL NOROW NOPERCENT OUT=want;
	by vle_a ;  
	where vle_b = max(vle_b); 
run ; 

I could do that whit proc sql but I would prefer to do it with a data step.

 

Somebody could help me please ?

 

Thank you very much !

 

Best regards

Lucia

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

What's so bad about SQL?

proc sql;
create table want as
select distinct vle_a, count(distinct vle_b) as count
from have
group by vle_a
having vle_b = max(vle_b)
;
quit;

Simple and straightforward.

Correction: the SQL needs a sub-select for the max:

proc sql;
create table want as
select vle_a, count(vle_a) as count
from have a
where a.vle_b = (select max(b.vle_b) from have b where b.vle_a = a.vle_a group by b.vle_a)
group by vle_a
;
quit;

View solution in original post

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not tested (as I am not typing in test data - post test data in the form of a datastep!):

proc sql;
  create table want as
  select  vle_a,
          count(*) as nb_repetitions
  from    have
  group by vla_a
  having vle_b=max(vle_b);
run;
luciacossaro
Obsidian | Level 7

Thank you. But there is a way to do that with a data step ?

 

Sorry, i had forgotten this : 

 

data test;
input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run ;
Kurt_Bremser
Super User

@luciacossaro wrote:

Thank you. But there is a way to do that with a data step ?

 



Of course there is. The data step language is considered Turing-complete, so almost anything is possible. But is it worth the hassle?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, its a bit more convoluted:

data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data top;
  set top;
  by vle_a;
  if first.vle_a;
run;

data want (keep=vle_a nb_repetitions);
  merge test top (rename=(vle_b=max));
  by vle_a;
  retain nb_repetitions;
  if first.vle_a then nb_repetitions=0;
  if vle_b=max then nb_repetitions=sum(nb_repetitions,1);
  if last.vle_a then output;
run;
luciacossaro
Obsidian | Level 7

I agree that it is a bit more convoluted, thank you for the help ! 

Kurt_Bremser
Super User

What's so bad about SQL?

proc sql;
create table want as
select distinct vle_a, count(distinct vle_b) as count
from have
group by vle_a
having vle_b = max(vle_b)
;
quit;

Simple and straightforward.

Correction: the SQL needs a sub-select for the max:

proc sql;
create table want as
select vle_a, count(vle_a) as count
from have a
where a.vle_b = (select max(b.vle_b) from have b where b.vle_a = a.vle_a group by b.vle_a)
group by vle_a
;
quit;
luciacossaro
Obsidian | Level 7

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

 

this is the sql code that I had done :

 

Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

 

 

Kurt_Bremser
Super User

@luciacossaro wrote:

I had the impression that in some cases my sql code did not work well, maybe it's the way I did it. Should I get the same results with my sql code as with yours? I wanted to know how to do it with a data step to learn more about SAS.

 

this is the sql code that I had done :

 

Proc sql ; 
	Create table max_b as
	Select * from test group by Vle_a
	Having vle_b = max(vle_b) 
	Order by vle_a ;
Quit ; 

Proc sql ; 
	Create table nb_repet_max as select vle_a, vle_b, sum(1) as nb_repet 
        From max_b group by vle_a, vle_b
        Order by vle_a ; 
Quit ;

 

 


See the addition to my previous post with the sub-select.

 

Another data step approach is a double do loop:

data want;
maxval = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  maxval = max(maxval,vle_b);
end;
count = 0;
do until (last.vle_a);
  set have;
  by vle_a;
  if vle_b = maxval then count + 1;
end;
keep vle_a count;
run;
Astounding
PROC Star

Given that you simplify your PROC FREQ, it becomes easy to use its output data set:

 

proc freq data=test noprint;

   tables vle_b / noprint out=want;

   by vle_a;

run;

 

data really_want;

   set want;

   by vle_a;

   if last.vle_a;

   keep vle_a vle_b count;

   rename count=Nb_repetitions;

run;

Ksharp
Super User
data test;
  input vle_a $ vle_b ;
datalines ;
A 5
A 10
A 9
A 8
A 10
A 2
A 10
B 9
B 1
B 3
B 9
;
run;

proc sort data=test out=top;
  by vle_a descending vle_b;
run;

data temp;
  set top;
  by vle_a descending vle_b;
  if first.vle_a then n=0;
  n+first.vle_b;
  if n=1;
run;
proc freq data=temp noprint;
table vle_a/list out=want;
run;
FreelanceReinh
Jade | Level 19

Hi @luciacossaro,

 

In theory, this one-step solution should work:

 

ods select none;
ods output ExtremeValues=want(keep=vle_a highfreq rename=(highfreq=Nb_repetitions));
proc univariate data=test nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

And indeed it does work for your test data.

 

 

However, there seems to be a bug (!) in SAS, at least in my version 9.4 TS1M2 (Windows 7, 64 bit), which creates a special missing value (._) for highfreq (plus an incorrect value for lowfreq) if the respective BY group does not contain any duplicate value of vle_b.

 

Example:

ods select none;
ods output ExtremeValues=buggy;
proc univariate data=test(obs=10) nextrval=1;
by vle_a;
var vle_b;
run;
ods select all;

proc print data=buggy;
run;

Result:

                 Var        Low              Low      High              High
Obs    vle_a    Name      Order    Low      Freq     Order    High      Freq

 1       A      vle_b         1     2          1         5     10          3
 2       B      vle_b         1     1          9         3      9          _

It seems that for BY group 'B' LowFreq has been set to the value of High, which makes no sense at all. In listing output the columns "Freq" in section "Extreme Values" are suppressed (without a good reason).

 

@all: Does anybody have a more recent release (ideally TS1M5) to check if this has been fixed already?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1209 views
  • 0 likes
  • 6 in conversation