Re: Need max value for each group by column

kajal_30 · Posted 10-09-2024 01:25 PM

Hi All,

I have below data set (have) and need output like below(need) :

TIA

DATA HAVE;
INPUT ID PRICE YEARMONTH;
CARDS;
1 23 2312
1 24 2401
1 25 2402
1 36 2403
2 45 2312
2 34 2401
2 53 2402
2 23 2403
2 664 2404
;
RUN;

DATA need;
INPUT ID PRICE YEARMONTH;
CARDS;
1 36 2403
2 664 2404
;
RUN;

PROC SQL;
SELECT * FROM HAVE 
GROUP BY ID ,YEARMONTH
HAVING MAX(YEARMONTH) ;
QUIT;

PaigeMiller · Posted 10-09-2024 01:34 PM

PROC SUMMARY is the tool you want to use, not PROC SQL. See Maxim 7 and Maxim 14 and Maxim 10.

proc summary data=have nway;
    class id;
    var price;
    output out=want max= maxid(price(yearmonth))=yearmonth;
run;

As a general piece of advice, you want to compute statistics in the proper PROC, in this case PROC SUMMARY but in other cases PROC FREQ. Using these PROCs will provide you with better and more powerful tools (that take less time to program and less time to execute) than PROC SQL. This is particularly useful if you have lots of variables, or lots of slices (here the only slice is by ID). The idea of doing your statistics in PROC SQL is an idea which you should avoid. For small problems like this, probably there is not really a noticeable difference between PROC SQL and PROC SUMMARY, but for larger real world problems, the difference can be dramatic (in favor of PROC SUMMARY or PROC FREQ).

--
Paige Miller

kajal_30 · Posted 10-09-2024 02:01 PM

can I please get in terms of proc sql as I have many more columns to add along with the join .

Regards

kajal

PaigeMiller · Posted 10-09-2024 02:04 PM

Many more columns is a situation that is handled much easier in PROC SUMMARY

--
Paige Miller

ballardw · Posted 10-09-2024 02:10 PM

To demonstrate a bit more complex example of @PaigeMiller's advice please run the following code. You should have the SASHELP.CLASS data set provided by SAS for training and practice.

Proc summary data=sashelp.class;
    class sex age;
    var height weight;
    output out=work.class_summary min= max= mean= std= /autoname;
run;

The output data set contains multiple summaries of the variables height and weight with the requested statistics for height and weight. Note the _type_ variable provides information about combinations of the class variables. So we get a summary overall ( type=0), by age only (type=1), by sex only (type=2) and Sex crossed with age categories (type=3 ).

To get this in Proc SQL you would have to write 4 separate Proc SQL selects with different group by . Also the handy option AUTONAME, which appends the statistic to the variable name, means you don't have to expressly provide a written out name for each stat replacing multiple

mean(var) as var_mean

, max(var) as var_max

etc.

One project I have requires reporting on pretty much every crossing or single level of 7 different class variables plus the overall stats for 30 or so report variables. ONE call to proc summary creates all of the values in one data set. Select based on the value of _TYPE_ for specific report tables. About 2 pages (without the comments) of code generates about 4,000 pages of documents for the user to reference as needed.

kajal_30 · Posted 10-09-2024 02:14 PM

I also need to do a join so I am not able to use the join with the proc that's why It will easier for me to use Proc sql

Thanks

PaigeMiller · Posted 10-09-2024 02:30 PM

You can run a join on the output of PROC SUMMARY

--
Paige Miller

Need max value for each group by column