Solved: Re: Retain values for evaluation throughout the dataset

Elvin · Posted 08-11-2014 02:39 PM

Hi, I have the following question about the attached dataset. Basically, I want to normalize the variables (Xtest_1 to Xtest_8) using their respective means (mean1 to mean8) and standard deviations (std1 to std8).

I know I can use IML for vector manipulation, but if I choose to do this with a dataset instead, how do I proceed?

Question1:

I managed to "gather" (mean1 to mean8) and (std1 to std8) together with the variables (Xtest_1 to Xtest_8) in the same dataset; see the attached file.

I tried to use the following code to populate the Normal array, but it produces values for only 1 obs, because (mean1-mean8) and (std1-std8) are missing for the rest of the obs.

Maybe: how can I declare them as constants, so that I can refer to them as SAS sweeps through the dataset?

Question2:

Or, taking one step back, before "gathering" the statistics: how can I use the means (mean1-mean8) and standard deviations (std1-std8) produced by PROC MEANS? I still cannot figure out how to "apply" the output statistics from PROC MEANS to another dataset.

Thanks

----------------------------------------------------------------------------------------------------------------

DATA New_ex;

SET ex;

RETAIN mean1-mean8;

RETAIN std1-std8;

ARRAY MeanStat[8] mean1-mean8;

ARRAY StdStat[8] std1-std8;

ARRAY X[8] Xtest_1-Xtest_8;

ARRAY Normal[8] Normal1-Normal8 ;

DO i = 1 to 8;

Normal[i]=(X[i]-MeanStat[i])/StdStat[i];

END;

OUTPUT;

RUN;


Reeza · Posted 08-11-2014 02:44 PM

There's a proc for that:

PROC STANDARD

OR

PROC STDIZE

Elvin · Posted 08-11-2014 06:19 PM

I am aware of PROC STANDARD, which takes only 1 mu and 1 sigma at a time and applies them to the many-observation dataset.

Here my situation is: the variables (Xtest_1-Xtest_8), all residing in one dataset, should each be normalized by their respective mean and std. PROC STANDARD won't work unless I split the dataset into 8 individual datasets (i.e., dataset1 has only Xtest_1, dataset2 has only Xtest_2, etc.) and then use PROC STANDARD on each of them.

Reeza · Posted 08-11-2014 06:46 PM

PROC STDIZE will, though, and it calculates the mean/std as well:

proc stdize data=sashelp.class out=check;

var weight height age;

run;

Astounding · Posted 08-11-2014 02:52 PM

As Reeza notes, you may find it easier to use an existing procedure for this particular problem. Just for the record, though, there is an easy DATA step technique to combine a one-observation data set (your means and standard deviations) with a many-observation data set (your original line-by-line values):

data want;

if _n_=1 then set one_observation;

set many_observations;

*** array processing, no retain needed;

run;

Variables that come in from a SAS data set are automatically retained. The trick is to keep the DATA step going instead of having it end prematurely. That's why there's an IF/THEN statement. Good luck.
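
For illustration only, here is a minimal sketch of how the pieces discussed in this thread could fit together. The dataset names train and stats are hypothetical (train stands for whichever dataset the statistics are computed from), and the array names follow the original question; the check on the standard deviation is an added safeguard, not something the thread asked for.

```sas
/* Assumption: "train" is a hypothetical dataset holding the values */
/* from which the means and standard deviations are computed.       */
proc means data=train noprint;
  var Xtest_1-Xtest_8;
  /* One-observation output: mean1-mean8 and std1-std8 */
  output out=stats(drop=_type_ _freq_) mean=mean1-mean8 std=std1-std8;
run;

data want;
  /* Read the single statistics row once; its values are retained */
  if _n_ = 1 then set stats;
  set many_observations;

  array MeanStat[8] mean1-mean8;
  array StdStat[8]  std1-std8;
  array X[8]        Xtest_1-Xtest_8;
  array Z[8]        Normal1-Normal8;

  do i = 1 to 8;
    /* Leave the result missing when the std is missing or zero */
    if StdStat[i] > 0 then Z[i] = (X[i] - MeanStat[i]) / StdStat[i];
  end;

  drop i;
run;
```

Because the statistics row is read only when _n_=1, the DATA step stops when many_observations runs out of observations, not when stats does, so every row gets normalized.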

stat_sas · Posted 08-11-2014 06:11 PM

Hi,

This will fill in the missing values in the ex dataset, so that every observation has the means and standard deviations.

proc stdize data=imp.ex out=want reponly method=median;
var m: s: ;
run;

proc print data=want;
run;

Elvin · Posted 08-11-2014 06:35 PM

Thanks, I was looking to fill in the missing values before, using an ad hoc approach. This makes it easier.

Elvin · Posted 08-12-2014 03:01 PM

Just found out a much easier way to standardize (xtest_1-xtest_8), using PROC STDIZE.

Given:

1) The statistics, means (mean1-mean8) and standard deviations (std1-std8) [from the output dataset of PROC MEANS], reside in a 'one-observation' dataset.

2) The variables Xtest_1-Xtest_8 reside in a 'many-observation' dataset.

I can standardize the variables (Xtest_1-Xtest_8) according to their means and stds using:

----------------------------------------------------------------------

PROC STDIZE DATA="many-observation dataset" OUT=Want PSTAT METHOD=IN("one-observation dataset");

VAR Xtest_1-Xtest_8;

LOCATION mean1-mean8 ;

SCALE std1-std8 ;

RUN;

Reeza · Posted 08-12-2014 03:12 PM

The default method is STD, which automatically sets the location to be the mean and scale to be the standard deviation.

Don't work too hard

This should give you the same results.

PROC STDIZE DATA="many-observation dataset" OUT=Want PSTAT ;

VAR Xtest_1-Xtest_8;

RUN;

Elvin · Posted 08-12-2014 06:56 PM

I see your point. But in my case, the means (mean1-mean8) and standard deviations (std1-std8) are not generated from (Xtest_1-Xtest_8). They are computed from another dataset.

Thanks
