BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Hello every,

I'm new to SAS and I need to find the maximum value of a variable, put that value in another dataset, and repeat for another category. Example:

DATASET 1
V1 V2 V3
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21


DATASET 2
V1 V2 V3
01 CC 98
02 FF 69
03 HH 78


I have absolutely no idea how to do this so any help at all would be very much appreciated.

My code until now: (doubt it could be of any help but well...)
proc sort data=work.dataset1 out=work.temp; by V1 descending V3; run;
DATA work.dataset2;
SET work.temp;
BY V1;
/* trying to take the 1st row and add it to dataset2 */
IF (_N_=1) then do;
V1=V1; V2=V2; V3=V3;
END;
RUN;


Thank-you in advance. Message was edited by: SanitySlayer42
1 ACCEPTED SOLUTION

Accepted Solutions
DanielSantos
Barite | Level 11

Editor's Note: Thanks to all who contributed to this solution.  If the variable is numeric, there are also a number of procedures that can output the MAX statistic, including PROC MEANS  and PROC REPORT.

 

It's actually quite simple to do.

After sorting your data the way is should (V1 descending V3), you just need to check the value of the FIRST automatic variable to identify the first observation of the V1 group.

Here's you're code with the modification:
  

proc sort data=work.dataset1 out=work.temp; 
 by V1 descending V3; 
run;

data work.dataset2;
 set work.temp;
 by V1;
 if first.V1; * first member of the group then output;
run;


 
Cheers from Portugal

Daniel Santos @ www.cgd.pt

 

View solution in original post

12 REPLIES 12
Peter_C
Rhodochrosite | Level 12
something like
proc sql ;
create table dataset2 as
select a.v1, a.v2, a.v3
from dataset1 as a
join ( select v1 from dataset1
group by v1 having v2= max(v2) ) as b
on a.v1 = b.v2
;
mftuchman
Quartz | Level 8
You want something like this.

data try1;
infile datalines;
input cat01 $ v2 $ v3;
datalines;
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21
;

/* method 1 */
proc means nway max printidvars;
class cat01;
id v2; /* saves the value if v2 corresponding to the max */
output out=data2;
run;

proc print data=data2;
where _stat_='MAX';
var cat01 v2 v3;
run;
mftuchman
Quartz | Level 8
As implied,here is method 2. This uses the output statement more precisely to direct the output. maxid(v3)=v2 means that you want an output for each value of cat01 to be the value of v2 that maximizes v3.

You still want to output the actual maximum, so also include max(v3)= to save the maximum value.
[pre]
/* method 2 */
proc means data=try1 nway noprint;
class cat01;
id v2;
output out=data2 maxid(v3)=v2 max(v3)=;
run;


proc print data=data2;
var cat01 v2 v3;
run;
[/pre]

Message was edited by: mftuchman Message was edited by: mftuchman
Added spacing.
Peter_C
Rhodochrosite | Level 12
sorry folks!
my little piece of sql was untested and inevitably wrong!
The objective I targetted was collecting the row of original data where the column V2 holds the maximum value within the group of V1. For that objective, V3 may not be the maximum value within the V1 group.
A revised (and tested) version follows:
proc sql ;
create table dataset2 as select a.* from dataset1 as a
join ( select v1, v2 from dataset1 group by v1 having v2= max(v2) ) as b
on a.v1 = b.v1 and a.v2= b.v2
;
quit ;

PeterC
DanielSantos
Barite | Level 11

Editor's Note: Thanks to all who contributed to this solution.  If the variable is numeric, there are also a number of procedures that can output the MAX statistic, including PROC MEANS  and PROC REPORT.

 

It's actually quite simple to do.

After sorting your data the way is should (V1 descending V3), you just need to check the value of the FIRST automatic variable to identify the first observation of the V1 group.

Here's you're code with the modification:
  

proc sort data=work.dataset1 out=work.temp; 
 by V1 descending V3; 
run;

data work.dataset2;
 set work.temp;
 by V1;
 if first.V1; * first member of the group then output;
run;


 
Cheers from Portugal

Daniel Santos @ www.cgd.pt

 

deleted_user
Not applicable
Hello, thank-you all for your help.

mftuchman's method doesn't appear to produce the correct result. It produces the dataset

V1 V2 V3
01 CC 98
02 FF 69
03 KK 78

(note the difference when v1=03. It gives KK in V2 and not HH, hence is sorting it by V2 and not by V3)

@Peter C: I thank-you very much for your help but unfortunately I am not meant to know about SQL just yet. I am meant to use something from the BASE package.

@Daniel Santos : This seems more like what I'm looking for but there is just a little problem. It outputs all the variables in the exact same order, how could I go about changing your code to remove/rearrange certain variables?

Example:

From this:
V1 V2 V3
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21

TO THIS:
V2 V1
98 01
69 02
78 03

I haved tried by doing like so:
proc sort data=TEMP; by V1 descending V3; run;

DATA DATA2;
set TEMP;
by V1;
if first.Region then output V2 V3;
RUN;


But I get a "Data set was not specified on the DATA statement." error. What must I do?
DanielSantos
Barite | Level 11
OK.

There is more than one way to rearrange the variable layout of a table.

I usually do this with the LENGTH statement, which will redefine variables before reading the layout of your input table, thus reordering the layout.
For this you will need to redefine the actual length and type of each variable (see link about the LENGTH statement).

Dropping or keeping vars, can be done with the DROP/KEEP statement.

something like this:

[pre]data new_data;
length FIRST_VAR 8 SECOND_VAR 8 THIRD_VAR $30; * redefine layout;
set old_data;
keep FIRST_VAR SECOND_VAR THIRD_VAR; * keep vars;
run;[/pre]

More about LENGTH statement:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000218807.htm

KEEP and DROP:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000202978.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000202902.htm

Cheers from Portugal

Daniel Santos @ www.cgd.pt
deleted_user
Not applicable
Excellent! I got exactly what I wanted, and best of all the code is just a few lines.

Thank-you very much everyone for your help
mftuchman
Quartz | Level 8
Yes, thank you for pointing that out. However, I did not want to simply give up on maxid, since it was designed to do precisely what you asked. You still consider it for future issues of the same type. Here is the corrected code. I verified that it works.


[pre]
data try1;
infile datalines;
input v1 $ v2 $ v3;
datalines;
01 AA 03
01 BB 24
01 CC 98
02 DD 54
02 EE 12
02 FF 69
03 GG 55
03 HH 78
03 KK 21
;

/* method 1 */
proc means data=try1 nway noprint;
class v1;
id v2;
var v3;
/* avoid duplicating the v2 variable
or the wrong result may be printed */
output out=fixedit maxid(v3(v2))=max_factor max=;
run;

proc print label;
label max_factor='v2';
id v1;
var max_factor v3;
run;
[/pre]


And I learned something too, so thanks.

Generally, I like to use a single proc rather than glue things together with SQL if at all possible. Message was edited by: mftuchman
data_null__
Jade | Level 19
Check out IDGROUP, OUTPUT statement option. I think you may find it to be the superior replacement for ID and MAXID.

[pre]
proc means data=try1 nway noprint;
class v1;
output out=UsingIDGROUP idgroup(max(v3) out(v2 v3)=);
run;
proc print;
run;
[/pre]
mftuchman
Quartz | Level 8
I appreciate the context. I wish I had seen example like this in the documentation for IDGROUP.
SAS_8
Calcite | Level 5

proc sort data=set1 out=set_sort;
by v1 descending v3;
run;
proc sort data=set_sort out=set_highest1 nodup;
by v1;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 91220 views
  • 1 like
  • 6 in conversation