The moment I subset the output of proc means things get complicated.
For the where clause I don't have all the obs from the input data set.
I haven't found any ods table that outputs the "real" used observations, like it is possible with proc sgplot for example.
From the output table I could build a filter with "proc sql select into :" to subset my in-data set, but it's a lot of work.
Therefore the best option I found for the moment is to cycle through different num variables in the data set and use them in conjunction with ID variable and maxid and minid. this way I get at least 2 IDs for any row of the proc means output.
Doing this over an over brings me closer to the real amount of obs used.
If I have few obs per row in proc means output, it's ok, but for large numbers I must write a macro.
DATA CARS;
SET SASHELP.CARS;
OBS_ID=_N_;
RUN;
%LET CH=Invoice;
PROC MEANS DATA=WORK.CARS NOPRINT ;
CLASS Make DriveTrain ;
VAR MSRP;
ID OBS_ID;
OUTPUT OUT=PROM(WHERE=(_TYPE_ = 3 AND _FREQ_ GT 10))
MEDIAN=MED MEAN=MEAN MINID(&CH)=MIN_ID MAXID(&CH)=MAX_ID / LEVELS;
RUN;
Hi @acordes,
Maybe you can use PROC SQL instead of PROC MEANS:
/* Create test data */
data have;
call streaminit(27182818);
call stream(268);
id=5;
do make='A', 'B', 'C';
do gearbox=3,2,1;
do _n_=1 to rand('integer',23);
id=mod(67*id, 101);
cyl=4*rand('table',.25,.5,.25);
output;
end;
end;
end;
run; /* 100 obs. */
/* Instead of PROC MEANS ...
proc means data=have n mean median;
class make gearbox;
var cyl;
output out=stats(where=(_type_=3 & mean_cyl=median_cyl)) mean=mean_cyl median=median_cyl;
run;
... use PROC SQL: */
proc sql;
create table want as
select make, gearbox, n(cyl) as n, mean(cyl) as mean_cyl, median(cyl) as median_cyl, id
from have
group by make, gearbox
having mean_cyl=median_cyl
order by 1,2,6;
quit;
proc print data=want;
by make gearbox n me:;
id id;
run;
Dataset STATS would only show the summary statistics for the selected CLASS levels:
median_ make gearbox _TYPE_ _FREQ_ mean_cyl cyl A 1 3 23 8 8 A 3 3 6 10 10 B 3 3 4 8 8 C 2 3 5 8 8
PROC SQL, however, yields both summary statistics and individual ID values (thanks to remerging).
Partial PROC PRINT output (dataset WANT):
make=A gearbox=1 n=23 mean_cyl=8 median_cyl=8 id 3 4 17 20 21 27 28 34 36 38 39 41 48 56 58 66 79 85 88 89 92 94 100 make=A gearbox=3 n=6 mean_cyl=10 median_cyl=10 id 14 23 25 26 32 59 (...)
@acordes wrote:
Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.
No, _type_=4 (=power of 2) represents always values of a single CLASS variable.
It’s not clear what you have and what you want from this post. Can you include an example of what you have or what you would want as output using sashelp.cars perhaps?
@acordes wrote:
The moment I subset the output of proc means things get complicated.
For the where clause I don't have all the obs from the input data set.
I haven't found any ods table that outputs the "real" used observations, like it is possible with proc sgplot for example.
From the output table I could build a filter with "proc sql select into :" to subset my in-data set, but it's a lot of work.
Therefore the best option I found for the moment is to cycle through different num variables in the data set and use them in conjunction with ID variable and maxid and minid. this way I get at least 2 IDs for any row of the proc means output.
Doing this over an over brings me closer to the real amount of obs used.
If I have few obs per row in proc means output, it's ok, but for large numbers I must write a macro.
DATA CARS; SET SASHELP.CARS; OBS_ID=_N_; RUN; %LET CH=Invoice; PROC MEANS DATA=WORK.CARS NOPRINT ; CLASS Make DriveTrain ; VAR MSRP; ID OBS_ID; OUTPUT OUT=PROM(WHERE=(_TYPE_ = 3 AND _FREQ_ GT 10)) MEDIAN=MED MEAN=MEAN MINID(&CH)=MIN_ID MAXID(&CH)=MAX_ID / LEVELS; RUN;
If what you want is to select high frequency classes, you could use:
PROC MEANS DATA=SASHELP.CARS NOPRINT NWAY;
CLASS Make DriveTrain ;
VAR MSRP;
OUTPUT OUT=PROM MEDIAN=MED MEAN=MEAN;
RUN;
proc sql;
create table popularMakes as
select a.*, b.mean, b.med
from
sashelp.cars as a inner join
prom as b on a.make=b.make and a.DriveTrain=b.DriveTrain
where b._freq_ > 10;
quit;
Thanks, but I am not interested in the high frequency ones.
As I subset the output data set by selecting only a specific _type_ that furthermore meets a conditon (here that median and mean are equal, for me it serves as a control for the quality of the want data set), I lose track of the observations with its ID that finally underpin the output statistics.
I want to know every member of each row of the output data set.
For sure I could buid a where filter clause that selects only these observations from the incoming data set, but this seems suboptimal to me as it envolves a lot of work.
Hi @acordes,
Maybe you can use PROC SQL instead of PROC MEANS:
/* Create test data */
data have;
call streaminit(27182818);
call stream(268);
id=5;
do make='A', 'B', 'C';
do gearbox=3,2,1;
do _n_=1 to rand('integer',23);
id=mod(67*id, 101);
cyl=4*rand('table',.25,.5,.25);
output;
end;
end;
end;
run; /* 100 obs. */
/* Instead of PROC MEANS ...
proc means data=have n mean median;
class make gearbox;
var cyl;
output out=stats(where=(_type_=3 & mean_cyl=median_cyl)) mean=mean_cyl median=median_cyl;
run;
... use PROC SQL: */
proc sql;
create table want as
select make, gearbox, n(cyl) as n, mean(cyl) as mean_cyl, median(cyl) as median_cyl, id
from have
group by make, gearbox
having mean_cyl=median_cyl
order by 1,2,6;
quit;
proc print data=want;
by make gearbox n me:;
id id;
run;
Dataset STATS would only show the summary statistics for the selected CLASS levels:
median_ make gearbox _TYPE_ _FREQ_ mean_cyl cyl A 1 3 23 8 8 A 3 3 6 10 10 B 3 3 4 8 8 C 2 3 5 8 8
PROC SQL, however, yields both summary statistics and individual ID values (thanks to remerging).
Partial PROC PRINT output (dataset WANT):
make=A gearbox=1 n=23 mean_cyl=8 median_cyl=8 id 3 4 17 20 21 27 28 34 36 38 39 41 48 56 58 66 79 85 88 89 92 94 100 make=A gearbox=3 n=6 mean_cyl=10 median_cyl=10 id 14 23 25 26 32 59 (...)
@acordes wrote:
Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.
No, _type_=4 (=power of 2) represents always values of a single CLASS variable.
Thanks to all for the responses.
It's my fault to not have explained well what I want.
Let's suppose that I read 100 observations into PROC Means. I use several class variables which will lead to different _type_ analysis in the output. I.e. without using nway option or restricting the _type_, the 0 level will have the statistics for all 100 variables.
Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.
I want to unveil the members of the group, which 23 observations participated in the row statistics.
Something similar to the option in proc sgplot:
proc sgplot noautolegend nocycleattrs; ods output sgplot=sg;
Thanks @acordes,
The number of iterations of the innermost DO loop is a random number. In addition to a generic ("unsuspicious") seed value for the CALL STREAMINIT routine I used the CALL STREAM routine in order to specify a random-number stream. I had tested the arguments 1, 2, ..., 1000 for CALL STREAM and selected one (268) that happened to yield exactly 100 observations, as desired (knowing that this might look somewhat mysterious to the attentive reader). 🙂
The reason that the IDs are free of duplicates is essentially the number-theoretic fact that the multiplicative order of 67 modulo 101 is 100. Again, I had tried a few candidates and picked 67. Multiplying the powers of 67 by 5 (modulo 101) doesn't create duplicates either because 101 is a prime number and hence the ring of integers modulo 101 is a field (a fact that I knew from algebra).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.