Solved: Using data from Proc Means in a different data set

Drgoldstein · Posted 12-23-2015 04:13 PM

Hi -- this is my first post. I would like to run PROC Means on Dataset 1, and create a set with the output variables. I know how to do this. But then I want to use the output variables in a DIFFERENT dataset (dataset2).

So, for example, suppose I wrote:

proc univariate data=rankgroups noprint;

var weispread weihp2;

output out=_winsor pctlpts=1 99 pctlpre=weispreada weishp2a;

run;

and I got my values for weispreada1 weispreada99 weishp2a1 and weishp2a99 in output dataset _winsor, which is basically a one row, four column dataset.

Now lets say that I have some other dataset, let's call it LARGEG. Lets say I have in it a variety of variables, one of which is GVAR. I would like to do an operation on GVAR if GVAR is bigger than weispreada1. While inelegant, if I could only create a variable for each record in LARGEG that is equal to weispreada1 for every record, I would be all set. But if I merge, I only get it for the first record.

Data TEST;

Merge LARGEG _winsor;

run;

would only attach the values weispreada1 weispreada99 weishp2a1 and weishp2a99 in _winsor for the FIRST observation in TEST. For all the other rows of TEST, the variables weispreada1 weispreada99 weishp2a1 and weishp2a99 come in as missing values. I would like them to be the same in each column, just repeating.

If I could do that, then I would write this if I wanted to flag when GVAR is bigger than the first percentile of the dataset RANKGROUPS from above:

Data OTHER;

set TEST;

if GVAR > weispreada1 then junkvar=1;

run;

Even if that is a stupid way to do it, I would like to know how to do it, particularly if there is a way without using arrays. It just seems like I should know how to merge a 1XQ matrix onto each record of an NxM matrix. If I had a BY variable, I think I could do it, but I want this for all variables.

Of course, I could just run Proc Means, print out the output, look at the values of the percentiles or the values of the variable weispreada1, and (pretend it was 7.2345) just do the following:

Data OLDSCHOOL;

set LARGEG;

If GVAR>7.2345 then junkvar=1;

run;

but that is very inelegant and really I think the computer should be able to keep that variable in its memory and use it without my printing it out (I know it can and does, but I just seem to have forgotten how to access it when it is in another dataset).

Basically, I have two questions:

(1) How do I merge a 1x4 matrix (such as that which is the output of PROC MEANS or other such PROCs) with an NxM matrix in such a way that I get the NxM matrix with the 1x4 matrix added to each row? and

(2) OK, that is a very wasteful way to do this -- what if I just want to use the values of the variables in the 1x4 matrix (the output of PROC means, for example) in a different data set as part of IF statements or values or such?

Thanks!

Astounding · Posted 12-23-2015 04:17 PM

The DATA step lets you do this fairly easily. Taking some liberty with the names of your data sets:

data want;

if _n_=1 then set onerow;

set huge_dataset;

run;

All the variables in ONEROW will in effect be made available to every observation from HUGE_DATASET.

View solution in original post

Astounding · Posted 12-23-2015 04:17 PM

The DATA step lets you do this fairly easily. Taking some liberty with the names of your data sets:

data want;

if _n_=1 then set onerow;

set huge_dataset;

run;

All the variables in ONEROW will in effect be made available to every observation from HUGE_DATASET.

Drgoldstein · Posted 12-23-2015 04:21 PM

WOW! An answer so fast! I will try it right now! Thanks so much!

Reeza · Posted 12-23-2015 04:23 PM

There's a couple of different ways. First, you're thinking of your data in linear algebra terms. That's not how SAS operates, think of it as going line by line through a data set. If you want matrix type language you can look into IML.

1. Merge in using if _n_=1

2. Use SQL Cross Join

3. Calculate witin PROC SQL directly - not available for all stats, specifically median before SAS 9.4

proc means data=sashelp.class noprint nway;
var weight;
output out=stats mean(weight)=avg_weight_by_sex;
run;

1. Merge
data want1;
if _n_=1 then set stats;
set sashelp.class;
run;

2. SQL Cross Join
proc sql;
create table want2 as
select a.*, b.*
from sashelp.class as a
cross join stats as b
order by 1, 2;
quit;

3. Calculate within PROC SQL - not always an option
proc sql;
create table want3 as
select a.*, mean(weight) as average_weight
from sashelp.class as a
order by 1,2;
quit;

Drgoldstein · Posted 12-23-2015 04:30 PM

data want;

if _n_=1 then set onerow;

set huge_dataset;

run;

Thanks! That worked great!

Drgoldstein · Posted 12-23-2015 04:34 PM

Also, what is cool is that if I reverse the order to put the if statement after the Set Huge_dataset then the data in onerow comes at the last set of variables in the output instead of the first set. So this puts onerow at the end instead of the beginning:

data want;
set huge_dataset;
if _n_=1 then set onerow;
run;

Thanks again!

Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Re: Using data from Proc Means in a different data set

Registration is open

SAS Training: Just a Click Away