DATA Step, Macro, Functions and more

Maximum value of a column (variable)

Accepted Solution Solved
Reply
N/A
Posts: 0
Accepted Solution

Maximum value of a column (variable)

Hello every,

I'm new to SAS and I need to find the maximum value of a variable, put that value in another dataset, and repeat for another category. Example:

DATASET 1
V1 V2 V3
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21


DATASET 2
V1 V2 V3
01 CC 98
02 FF 69
03 HH 78


I have absolutely no idea how to do this so any help at all would be very much appreciated.

My code until now: (doubt it could be of any help but well...)
proc sort data=work.dataset1 out=work.temp; by V1 descending V3; run;
DATA work.dataset2;
SET work.temp;
BY V1;
/* trying to take the 1st row and add it to dataset2 */
IF (_N_=1) then do;
V1=V1; V2=V2; V3=V3;
END;
RUN;


Thank-you in advance. Message was edited by: SanitySlayer42

Accepted Solutions
Solution
‎09-28-2016 12:41 PM
Super Contributor
Posts: 474

Re: Maximum value of a column (variable)

[ Edited ]

Editor's Note: Thanks to all who contributed to this solution.  If the variable is numeric, there are also a number of procedures that can output the MAX statistic, including PROC MEANS  and PROC REPORT.

 

It's actually quite simple to do.

After sorting your data the way is should (V1 descending V3), you just need to check the value of the FIRST automatic variable to identify the first observation of the V1 group.

Here's you're code with the modification:
  

proc sort data=work.dataset1 out=work.temp; 
 by V1 descending V3; 
run;

data work.dataset2;
 set work.temp;
 by V1;
 if first.V1; * first member of the group then output;
run;


 
Cheers from Portugal

Daniel Santos @ www.cgd.pt

 

View solution in original post


All Replies
Valued Guide
Posts: 2,175

Re: Maximum value of a column (variable)

something like
proc sql ;
create table dataset2 as
select a.v1, a.v2, a.v3
from dataset1 as a
join ( select v1 from dataset1
group by v1 having v2= max(v2) ) as b
on a.v1 = b.v2
;
Contributor
Posts: 42

Re: Maximum value of a column (variable)

You want something like this.

data try1;
infile datalines;
input cat01 $ v2 $ v3;
datalines;
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21
;

/* method 1 */
proc means nway max printidvars;
class cat01;
id v2; /* saves the value if v2 corresponding to the max */
output out=data2;
run;

proc print data=data2;
where _stat_='MAX';
var cat01 v2 v3;
run;
Contributor
Posts: 42

Re: Maximum value of a column (variable)

As implied,here is method 2. This uses the output statement more precisely to direct the output. maxid(v3)=v2 means that you want an output for each value of cat01 to be the value of v2 that maximizes v3.

You still want to output the actual maximum, so also include max(v3)= to save the maximum value.
[pre]
/* method 2 */
proc means data=try1 nway noprint;
class cat01;
id v2;
output out=data2 maxid(v3)=v2 max(v3)=;
run;


proc print data=data2;
var cat01 v2 v3;
run;
[/pre]

Message was edited by: mftuchman Message was edited by: mftuchman
Added spacing.
Valued Guide
Posts: 2,175

Re: Maximum value of a column (variable)

sorry folks!
my little piece of sql was untested and inevitably wrong!
The objective I targetted was collecting the row of original data where the column V2 holds the maximum value within the group of V1. For that objective, V3 may not be the maximum value within the V1 group.
A revised (and tested) version follows:
proc sql ;
create table dataset2 as select a.* from dataset1 as a
join ( select v1, v2 from dataset1 group by v1 having v2= max(v2) ) as b
on a.v1 = b.v1 and a.v2= b.v2
;
quit ;

PeterC
Solution
‎09-28-2016 12:41 PM
Super Contributor
Posts: 474

Re: Maximum value of a column (variable)

[ Edited ]

Editor's Note: Thanks to all who contributed to this solution.  If the variable is numeric, there are also a number of procedures that can output the MAX statistic, including PROC MEANS  and PROC REPORT.

 

It's actually quite simple to do.

After sorting your data the way is should (V1 descending V3), you just need to check the value of the FIRST automatic variable to identify the first observation of the V1 group.

Here's you're code with the modification:
  

proc sort data=work.dataset1 out=work.temp; 
 by V1 descending V3; 
run;

data work.dataset2;
 set work.temp;
 by V1;
 if first.V1; * first member of the group then output;
run;


 
Cheers from Portugal

Daniel Santos @ www.cgd.pt

 

N/A
Posts: 0

Re: Maximum value of a column (variable)

Hello, thank-you all for your help.

mftuchman's method doesn't appear to produce the correct result. It produces the dataset

V1 V2 V3
01 CC 98
02 FF 69
03 KK 78

(note the difference when v1=03. It gives KK in V2 and not HH, hence is sorting it by V2 and not by V3)

@Peter C: I thank-you very much for your help but unfortunately I am not meant to know about SQL just yet. I am meant to use something from the BASE package.

@Daniel Santos : This seems more like what I'm looking for but there is just a little problem. It outputs all the variables in the exact same order, how could I go about changing your code to remove/rearrange certain variables?

Example:

From this:
V1 V2 V3
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21

TO THIS:
V2 V1
98 01
69 02
78 03

I haved tried by doing like so:
proc sort data=TEMP; by V1 descending V3; run;

DATA DATA2;
set TEMP;
by V1;
if first.Region then output V2 V3;
RUN;


But I get a "Data set was not specified on the DATA statement." error. What must I do?
Super Contributor
Posts: 474

Re: Maximum value of a column (variable)

OK.

There is more than one way to rearrange the variable layout of a table.

I usually do this with the LENGTH statement, which will redefine variables before reading the layout of your input table, thus reordering the layout.
For this you will need to redefine the actual length and type of each variable (see link about the LENGTH statement).

Dropping or keeping vars, can be done with the DROP/KEEP statement.

something like this:

[pre]data new_data;
length FIRST_VAR 8 SECOND_VAR 8 THIRD_VAR $30; * redefine layout;
set old_data;
keep FIRST_VAR SECOND_VAR THIRD_VAR; * keep vars;
run;[/pre]

More about LENGTH statement:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000218807.htm

KEEP and DROP:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000202978.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000202902.htm

Cheers from Portugal

Daniel Santos @ www.cgd.pt
N/A
Posts: 0

Re: Maximum value of a column (variable)

Excellent! I got exactly what I wanted, and best of all the code is just a few lines.

Thank-you very much everyone for your help
Contributor
Posts: 42

Re: Maximum value of a column (variable)

Yes, thank you for pointing that out. However, I did not want to simply give up on maxid, since it was designed to do precisely what you asked. You still consider it for future issues of the same type. Here is the corrected code. I verified that it works.


[pre]
data try1;
infile datalines;
input v1 $ v2 $ v3;
datalines;
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21
;

/* method 1 */
proc means data=try1 nway noprint;
class v1;
id v2;
var v3;
/* avoid duplicating the v2 variable
or the wrong result may be printed */
output out=fixedit maxid(v3(v2))=max_factor max=;
run;

proc print label;
label max_factor='v2';
id v1;
var max_factor v3;
run;
[/pre]


And I learned something too, so thanks.

Generally, I like to use a single proc rather than glue things together with SQL if at all possible. Message was edited by: mftuchman
Respected Advisor
Posts: 3,777

Re: Maximum value of a column (variable)

Check out IDGROUP, OUTPUT statement option. I think you may find it to be the superior replacement for ID and MAXID.

[pre]
proc means data=try1 nway noprint;
class v1;
output out=UsingIDGROUP idgroup(max(v3) out(v2 v3)=);
run;
proc print;
run;
[/pre]
Contributor
Posts: 42

Re: Maximum value of a column (variable)

I appreciate the context. I wish I had seen example like this in the documentation for IDGROUP.
Regular Learner
Posts: 1

Re: Maximum value of a column (variable)

proc sort data=set1 out=set_sort;
by v1 descending v3;
run;
proc sort data=set_sort out=set_highest1 nodup;
by v1;
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 37558 views
  • 0 likes
  • 6 in conversation