Hi All,
I have below data set (have) and need output like below(need) :
TIA
DATA HAVE;
INPUT ID PRICE YEARMONTH;
CARDS;
1 23 2312
1 24 2401
1 25 2402
1 36 2403
2 45 2312
2 34 2401
2 53 2402
2 23 2403
2 664 2404
;
RUN;
DATA need;
INPUT ID PRICE YEARMONTH;
CARDS;
1 36 2403
2 664 2404
;
RUN;
PROC SQL;
SELECT * FROM HAVE
GROUP BY ID ,YEARMONTH
HAVING MAX(YEARMONTH) ;
QUIT;
PROC SUMMARY is the tool you want to use, not PROC SQL. See Maxim 7 and Maxim 14 and Maxim 10.
proc summary data=have nway;
class id;
var price;
output out=want max= maxid(price(yearmonth))=yearmonth;
run;
As a general piece of advice, you want to compute statistics in the proper PROC, in this case PROC SUMMARY but in other cases PROC FREQ. Using these PROCs will provide you with better and more powerful tools (that take less time to program and less time to execute) than PROC SQL. This is particularly useful if you have lots of variables, or lots of slices (here the only slice is by ID). The idea of doing your statistics in PROC SQL is an idea which you should avoid. For small problems like this, probably there is not really a noticeable difference between PROC SQL and PROC SUMMARY, but for larger real world problems, the difference can be dramatic (in favor of PROC SUMMARY or PROC FREQ).
can I please get in terms of proc sql as I have many more columns to add along with the join .
Regards
kajal
Many more columns is a situation that is handled much easier in PROC SUMMARY
To demonstrate a bit more complex example of @PaigeMiller's advice please run the following code. You should have the SASHELP.CLASS data set provided by SAS for training and practice.
Proc summary data=sashelp.class; class sex age; var height weight; output out=work.class_summary min= max= mean= std= /autoname; run;
The output data set contains multiple summaries of the variables height and weight with the requested statistics for height and weight. Note the _type_ variable provides information about combinations of the class variables. So we get a summary overall ( type=0), by age only (type=1), by sex only (type=2) and Sex crossed with age categories (type=3 ).
To get this in Proc SQL you would have to write 4 separate Proc SQL selects with different group by . Also the handy option AUTONAME, which appends the statistic to the variable name, means you don't have to expressly provide a written out name for each stat replacing multiple
mean(var) as var_mean
, max(var) as var_max
etc.
One project I have requires reporting on pretty much every crossing or single level of 7 different class variables plus the overall stats for 30 or so report variables. ONE call to proc summary creates all of the values in one data set. Select based on the value of _TYPE_ for specific report tables. About 2 pages (without the comments) of code generates about 4,000 pages of documents for the user to reference as needed.
I also need to do a join so I am not able to use the join with the proc that's why It will easier for me to use Proc sql
Thanks
You can run a join on the output of PROC SUMMARY
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.