BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
acordes
Rhodochrosite | Level 12

The moment I subset the output of proc means things get complicated.

For the where clause I don't have all the obs from the input data set.

I haven't found any ods table that outputs the "real" used observations, like it is possible with proc sgplot for example. 

From the output table I could build a filter with "proc sql select into :" to subset my in-data set, but it's a lot of work. 

Therefore the best option I found for the moment is to cycle through different num variables in the data set and use them in conjunction with ID variable and maxid and minid. this way I get at least 2 IDs for any row of the proc means output.

Doing this over an over brings me closer to the real amount of obs used. 

If I have few obs per row in proc means output, it's ok, but for large numbers I must write a macro.

 

DATA CARS;
SET SASHELP.CARS;
OBS_ID=_N_;
RUN;


%LET CH=Invoice;
PROC MEANS DATA=WORK.CARS NOPRINT ;
CLASS Make DriveTrain ;
VAR  MSRP;
ID  OBS_ID;
OUTPUT OUT=PROM(WHERE=(_TYPE_ = 3 AND _FREQ_ GT 10))
MEDIAN=MED MEAN=MEAN  MINID(&CH)=MIN_ID MAXID(&CH)=MAX_ID /  LEVELS;
RUN;

proc means.png

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @acordes,

 

Maybe you can use PROC SQL instead of PROC MEANS:

/* Create test data */

data have;
call streaminit(27182818);
call stream(268);
id=5;
do make='A', 'B', 'C';
  do gearbox=3,2,1;
    do _n_=1 to rand('integer',23);
      id=mod(67*id, 101);
      cyl=4*rand('table',.25,.5,.25);
      output;
    end;
  end;
end;
run; /* 100 obs. */

/* Instead of PROC MEANS ...

proc means data=have n mean median;
class make gearbox;
var cyl;
output out=stats(where=(_type_=3 & mean_cyl=median_cyl)) mean=mean_cyl median=median_cyl;
run;

... use PROC SQL: */

proc sql;
create table want as
select make, gearbox, n(cyl) as n, mean(cyl) as mean_cyl, median(cyl) as median_cyl, id
from have
group by make, gearbox
having mean_cyl=median_cyl
order by 1,2,6;
quit;

proc print data=want;
by make gearbox n me:;
id id;
run;

Dataset STATS would only show the summary statistics for the selected CLASS levels:

                                                   median_
make    gearbox    _TYPE_    _FREQ_    mean_cyl      cyl

 A         1          3        23          8           8
 A         3          3         6         10          10
 B         3          3         4          8           8
 C         2          3         5          8           8

PROC SQL, however, yields both summary statistics and individual ID values (thanks to remerging).

Partial PROC PRINT output (dataset WANT):

make=A gearbox=1 n=23 mean_cyl=8 median_cyl=8

 id

  3
  4
 17
 20
 21
 27
 28
 34
 36
 38
 39
 41
 48
 56
 58
 66
 79
 85
 88
 89
 92
 94
100


make=A gearbox=3 n=6 mean_cyl=10 median_cyl=10

id

14
23
25
26
32
59

(...)
@acordes wrote:

Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.

 

No, _type_=4 (=power of 2) represents always values of a single CLASS variable.

View solution in original post

8 REPLIES 8
Reeza
Super User

It’s not clear what you have and what you want from this post. Can you include an example of what you have or what you would want as output using sashelp.cars perhaps?

 


@acordes wrote:

The moment I subset the output of proc means things get complicated.

For the where clause I don't have all the obs from the input data set.

I haven't found any ods table that outputs the "real" used observations, like it is possible with proc sgplot for example. 

From the output table I could build a filter with "proc sql select into :" to subset my in-data set, but it's a lot of work. 

Therefore the best option I found for the moment is to cycle through different num variables in the data set and use them in conjunction with ID variable and maxid and minid. this way I get at least 2 IDs for any row of the proc means output.

Doing this over an over brings me closer to the real amount of obs used. 

If I have few obs per row in proc means output, it's ok, but for large numbers I must write a macro.

 

DATA CARS;
SET SASHELP.CARS;
OBS_ID=_N_;
RUN;


%LET CH=Invoice;
PROC MEANS DATA=WORK.CARS NOPRINT ;
CLASS Make DriveTrain ;
VAR  MSRP;
ID  OBS_ID;
OUTPUT OUT=PROM(WHERE=(_TYPE_ = 3 AND _FREQ_ GT 10))
MEDIAN=MED MEAN=MEAN  MINID(&CH)=MIN_ID MAXID(&CH)=MAX_ID /  LEVELS;
RUN;

proc means.png


 

PGStats
Opal | Level 21

If what you want is to select high frequency classes, you could use:

 

PROC MEANS DATA=SASHELP.CARS NOPRINT NWAY;
CLASS Make DriveTrain ;
VAR  MSRP;
OUTPUT OUT=PROM MEDIAN=MED MEAN=MEAN;
RUN;

proc sql;
create table popularMakes as
select a.*, b.mean, b.med  
from 
    sashelp.cars as a inner join
    prom as b on a.make=b.make and a.DriveTrain=b.DriveTrain
where b._freq_ > 10;
quit;
PG
acordes
Rhodochrosite | Level 12

Thanks, but I am not interested in the high frequency ones. 

As I subset the output data set by selecting only a specific _type_ that furthermore meets a conditon (here that median and mean are equal, for me it serves as a control for the quality of the want data set), I lose track of the observations with its ID that finally underpin the output statistics. 

I want to know every member of each row of the output data set.

For sure I could buid a where filter clause that selects only these observations from the incoming data set, but this seems suboptimal to me as it envolves a lot of work. 

FreelanceReinh
Jade | Level 19

Hi @acordes,

 

Maybe you can use PROC SQL instead of PROC MEANS:

/* Create test data */

data have;
call streaminit(27182818);
call stream(268);
id=5;
do make='A', 'B', 'C';
  do gearbox=3,2,1;
    do _n_=1 to rand('integer',23);
      id=mod(67*id, 101);
      cyl=4*rand('table',.25,.5,.25);
      output;
    end;
  end;
end;
run; /* 100 obs. */

/* Instead of PROC MEANS ...

proc means data=have n mean median;
class make gearbox;
var cyl;
output out=stats(where=(_type_=3 & mean_cyl=median_cyl)) mean=mean_cyl median=median_cyl;
run;

... use PROC SQL: */

proc sql;
create table want as
select make, gearbox, n(cyl) as n, mean(cyl) as mean_cyl, median(cyl) as median_cyl, id
from have
group by make, gearbox
having mean_cyl=median_cyl
order by 1,2,6;
quit;

proc print data=want;
by make gearbox n me:;
id id;
run;

Dataset STATS would only show the summary statistics for the selected CLASS levels:

                                                   median_
make    gearbox    _TYPE_    _FREQ_    mean_cyl      cyl

 A         1          3        23          8           8
 A         3          3         6         10          10
 B         3          3         4          8           8
 C         2          3         5          8           8

PROC SQL, however, yields both summary statistics and individual ID values (thanks to remerging).

Partial PROC PRINT output (dataset WANT):

make=A gearbox=1 n=23 mean_cyl=8 median_cyl=8

 id

  3
  4
 17
 20
 21
 27
 28
 34
 36
 38
 39
 41
 48
 56
 58
 66
 79
 85
 88
 89
 92
 94
100


make=A gearbox=3 n=6 mean_cyl=10 median_cyl=10

id

14
23
25
26
32
59

(...)
@acordes wrote:

Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.

 

No, _type_=4 (=power of 2) represents always values of a single CLASS variable.

Reeza
Super User
Are you just looking for the number of observations used in the calculation of the statistics? If so, the N statistic will do that. Unlike other procs, Proc Means does not exclude row wise, it excludes per variable.
acordes
Rhodochrosite | Level 12

Thanks to all for the responses.

It's my fault to not have explained well what I want.

Let's suppose that I read 100 observations into PROC Means. I use several class variables which will lead to different _type_ analysis in the output. I.e. without using nway option or restricting the _type_, the 0 level will have the statistics for all 100 variables.

Imagine that I am only interested in the _type_=4 which could be the the Crossing of Make and Gear Box and in its line might apper the statistics for 23 members that fall into this group.

I want to unveil the members of the group, which 23 observations participated in the row statistics. 

Something similar to the option in proc sgplot:

 

proc sgplot noautolegend nocycleattrs;
   ods output sgplot=sg;
acordes
Rhodochrosite | Level 12
@freelancereinhard super elegant solution.
I take notice of all the nice techniques you used. The only thing I don't understand is in regard to the data step. how do you control to get 100 obs and how do you ensure that no ID gets duplicated?
FreelanceReinh
Jade | Level 19

Thanks @acordes,

 

The number of iterations of the innermost DO loop is a random number. In addition to a generic ("unsuspicious") seed value for the CALL STREAMINIT routine I used the CALL STREAM routine in order to specify a random-number stream. I had tested the arguments 1, 2, ..., 1000 for CALL STREAM and selected one (268) that happened to yield exactly 100 observations, as desired (knowing that this might look somewhat mysterious to the attentive reader). 🙂

 

The reason that the IDs are free of duplicates is essentially the number-theoretic fact that the multiplicative order of 67 modulo 101 is 100. Again, I had tried a few candidates and picked 67. Multiplying the powers of 67 by 5 (modulo 101) doesn't create duplicates either because 101 is a prime number and hence the ring of integers modulo 101 is a field (a fact that I knew from algebra).

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2437 views
  • 2 likes
  • 4 in conversation