Hi Guys,
Need some advice here. I realized that the PROC MEANS command removes all other variables from my list. For instance, if I have variables A, B, C, D, E on my list, and I applied PROC MEANS on variable A, the rest of the variables (B,C,D,E) will be dropped from the output. This is the code I used:
PROC MEANS DATA=work.trial1 NOPRINT;
VAR RET;
OUTPUT OUT=trial2
N=N
MAX(RET)=MAX_RET;
BY DATENEW DIST;
RUN;
May I know, how can I write the code, such that Variables B, C, D, E show up in the output too.
Thank you guys so much for your help!
Cheers.
How about something like?:
proc sql;
create table test as
select *
from sashelp.class
group by sex
having age=max(age)
;
quit;
You can try the code below to see if you can get what you want:
proc sort data=trial2;by BY DATENEW DIST;run;
proc sort data=trial1;by BY DATENEW DIST;run;
data want;
merge trial2(in=a)trial1;
by DATENEW DIST;
if a;
run;
Benn,
First, you have to realize that the output from PROC MEANS contains far fewer observations than you started with. So if you are looking to add the summarized results to your original data set, then Linlin has the right idea. However ...
If you are looking for the output from PROC MEANS to contain additional variables, which values do you want? SAS would have to choose B, C, D, and E from the values across many observations that all went into the calculations for that summary line. If you don't care which of those values become part of the output, the ID statement would do it for you:
ID B C D E;
There is a rhyme and reason as to which values SAS selects, but it's a little tricky to explain. And there are a couple of choices as well. Best to study up on the ID statement.
Good luck.
SQL is good for this scenario .
proc sql;
select mean(a) as mean_a,b,c,d
from have
group by DATENEW,DIST
;quit;
Ksharp
Hi Ksharp,
thanks for the advice. The code got me very close to what I need. I am hoping that you could help me a little more.
The code that you provided drew out all the variables for all the observations from the input file. However. I am only looking for that row of variables observed for that observation.
E.g these are the following variables
Carmake Price Mileage Insurance
BMW $1000 200 $10
BMW $1200 80 $10
MERC $1500 50 $50
PROC SQL NOPRINT;
CREATE TABLE TRY2 AS SELECT MAX(PRICE) AS MAX_PRICE, FROM CARDATA GROUP BY CARMAKE;
and I was hoping that only the accompanying variables show up in the output. So for BMW, the variables that is accompanied by observation 2 will show up. (But now, the codes you provided, shows observation 1 as well- and maps the $1200 to it).
Hope I am not too confusing.
Thanks a lot Ksharp.
Hi Benn,
what you are asking is remerging summary statistics back to the original data.
Hope the following code helps to get the desired results.
proc sql;
create table test
as
select a.*,max(age) max_age_by_gender
from sashelp.class as a
group by sex
;quit;
proc print;
run;
Obs Name Sex Age Height Weight max_age_by_gender
1 Carol F 14 62.8 102.5 15
2 Alice F 13 56.5 84.0 15
3 Joyce F 11 51.3 50.5 15
4 Louise F 12 56.3 77.0 15
5 Barbara F 13 65.3 98.0 15
6 Janet F 15 62.5 112.5 15
7 Judy F 14 64.3 90.0 15
8 Jane F 12 59.8 84.5 15
9 Mary F 15 66.5 112.0 15
10 Robert M 12 64.8 128.0 16
11 James M 12 57.3 83.0 16
12 Henry M 14 63.5 102.5 16
13 Alfred M 14 69.0 112.5 16
14 John M 12 59.0 99.5 16
15 Jeffrey M 13 62.5 84.0 16
16 Thomas M 11 57.5 85.0 16
17 Philip M 16 72.0 150.0 16
18 William M 15 66.5 112.0 16
19 Ronald M 15 67.0 133.0 16
How about something like?:
proc sql;
create table test as
select *
from sashelp.class
group by sex
having age=max(age)
;
quit;
Hi Arthur,
thank you so much for your code! It gave me what I needed. I was wondering if you could help me a little more.
Basically, I have used your code to retrieve a set of data. Then I used the PROC RANK function to form deciles within dates.
PROC RANK
DATA=work.data OUT=data1 Groups=10;
BY DATE;
VAR PRICE;
RANKS R_PRICE;
RUN;
Then now I am trying to obtain the median within each decile at each date (and at the same time having all the variables associated with that median observation displayed in that output as well). I modified and used the following code:
PROC SQL;
CREATE TABLE data2 AS SELECT*
FROM work.data1 GROUP BY DATE,R_PRICE
HAVING PRICE=MEDIAN(PRICE);
QUIT;
However, the following error showed up,
"
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function."
Thanks a lot Arthur!
Median is a a row-wise not an aggregate function in proc sql. Take a look at: http://www.sascommunity.org/wiki/Fun_with_PROC_SQL_Summary_Functions
If you need to use median values, calculate them with proc means or proc summary, and merge the results with your original data.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.