Hi, I have the following question about the attached dataset. Basically, I want to normalize variables Xtest_1 to Xtest_8 using their respective means (mean1-mean8) and standard deviations (std1-std8).
I know I can use IML for vector manipulation, but if I choose to do this with a dataset instead, how should I proceed?
Question 1:
I managed to "gather" the means (mean1-mean8) and standard deviations (std1-std8) together with variables Xtest_1-Xtest_8 in the same dataset (see the attached file).
I tried to use the following code to populate the Normal array, but it yields normalized values for only one observation, because mean1-mean8 and std1-std8 are missing for the rest of the observations.
In other words: how can I declare them as constants so that I can refer to them as SAS sweeps through the dataset?
Question 2:
Or, taking one step back before "gathering" the statistics: how can I use the means (mean1-mean8) and standard deviations (std1-std8) produced by PROC MEANS directly? I still cannot figure out how to "apply" the output statistics from PROC MEANS to another dataset.
Thanks
----------------------------------------------------------------------------------------------------------------
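DATA New_ex;
SET ex;
/* These RETAINs have no effect: SET re-reads mean1-mean8 and std1-std8 */
/* on every iteration, and they are missing after the first observation */
RETAIN mean1-mean8;
RETAIN std1-std8;
ARRAY MeanStat[8] mean1-mean8;
ARRAY StdStat[8] std1-std8;
ARRAY X[8] Xtest_1-Xtest_8;
ARRAY Normal[8] Normal1-Normal8;
DO i = 1 to 8;
Normal[i]=(X[i]-MeanStat[i])/StdStat[i];
END;
OUTPUT;
RUN;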
Accepted Solutions
I just found a much easier way to standardize Xtest_1-Xtest_8, using PROC STDIZE.
Given:
1) The means (mean1-mean8) and standard deviations (std1-std8), taken from the PROC MEANS output dataset, reside in a 'one-observation' dataset.
2) The variables Xtest_1-Xtest_8 reside in a 'many-observation' dataset.
I can standardize Xtest_1-Xtest_8 by their respective means and standard deviations using:
----------------------------------------------------------------------
PROC STDIZE DATA="many-observation dataset" OUT=Want PSTAT METHOD=IN("one-observation dataset");
VAR Xtest_1-Xtest_8;
LOCATION mean1-mean8 ;
SCALE std1-std8 ;
RUN;
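For Question 2, here is a minimal end-to-end sketch of the same idea. It assumes a hypothetical reference dataset ref (the source of the statistics) and a hypothetical dataset test to standardize, and uses only two variables with made-up values to keep it short. PROC MEANS writes the one-observation dataset; PROC STDIZE then reads its location and scale variables through METHOD=IN().

```sas
/* Hypothetical reference data: the statistics are computed from here */
data ref;
  input Xtest_1 Xtest_2;
  datalines;
1 10
2 20
3 30
;
run;

/* Hypothetical data to be standardized with the reference statistics */
data test;
  input Xtest_1 Xtest_2;
  datalines;
4 0
2 25
;
run;

/* One-observation dataset holding mean1-mean2 and std1-std2 */
proc means data=ref noprint;
  var Xtest_1 Xtest_2;
  output out=stats(drop=_type_ _freq_) mean=mean1-mean2 std=std1-std2;
run;

/* Standardize test using the location/scale variables read from stats */
proc stdize data=test out=want pstat method=in(stats);
  var Xtest_1 Xtest_2;
  location mean1-mean2;
  scale std1-std2;
run;
```

With these values (reference means 2 and 20, standard deviations 1 and 10), Xtest_1 and Xtest_2 in want should be 2 and -2 in the first row and 0 and 0.5 in the second.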
There's a proc for that:
PROC STANDARD
OR
PROC STDIZE
I am aware of PROC STANDARD, which takes only one mu and one sigma at a time and applies them to the many-observation dataset.
My situation is this: the variables Xtest_1-Xtest_8 all reside in one dataset, and each should be normalized by its own mean and standard deviation. PROC STANDARD won't work unless I split the dataset into 8 individual datasets (i.e., dataset1 has only Xtest_1, dataset2 has only Xtest_2, etc.) and then run PROC STANDARD on each of them.
PROC STDIZE will, though, and it calculates the mean/std as well:
proc stdize data=sashelp.class out=check;
var weight height age;
run;
As Reeza notes, you may find it easier to use an existing procedure for this particular problem. Just for the record, though, there is an easy DATA step technique to combine a one-observation data set (your means and standard deviations) with a many-observation data set (your original line-by-line values):
data want;
if _n_=1 then set one_observation;
set many_observations;
*** array processing, no retain needed;
run;
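Variables that come in from a SAS data set are automatically retained. The trick is to keep the DATA step going instead of having it end prematurely. That's why there's an IF/THEN statement: one_observation is read only on the first iteration, so the step never tries to read past its single observation, which would otherwise stop the DATA step after one row. Good luck.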
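Applied to this thread, the technique might look like the sketch below. The dataset names stats and test, the two-variable layout, and the values are hypothetical stand-ins for the one-observation statistics dataset and Xtest_1-Xtest_8.

```sas
/* Hypothetical one-observation dataset of statistics */
data stats;
  input mean1 mean2 std1 std2;
  datalines;
2 20 1 10
;
run;

/* Hypothetical many-observation dataset to normalize */
data test;
  input Xtest_1 Xtest_2;
  datalines;
4 0
2 25
;
run;

data want;
  /* Read the statistics once; variables read by SET are retained, */
  /* so mean1-mean2 and std1-std2 keep their values on every row    */
  if _n_ = 1 then set stats;
  set test;
  array MeanStat[2] mean1-mean2;
  array StdStat[2]  std1-std2;
  array X[2]        Xtest_1-Xtest_2;
  array Normal[2]   Normal1-Normal2;
  do i = 1 to 2;
    Normal[i] = (X[i] - MeanStat[i]) / StdStat[i];
  end;
  drop i;
run;
```

Normal1 and Normal2 should be 2 and -2 for the first row and 0 and 0.5 for the second.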
Hi,
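This will fill in the missing values in the ex dataset so that every observation has the means and standard deviations. With REPONLY, PROC STDIZE only replaces missing values with the location measure (here the median, which equals the single non-missing value in each column) and does not standardize anything.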
proc stdize data=imp.ex out=want reponly method=median;
var m: s: ;
run;
proc print data=want;
run;
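Once every row carries the statistics, the array code from the original question works as-is, with no RETAIN. A minimal sketch, assuming a hypothetical two-variable version of ex in which the statistics appear only on the first row:

```sas
/* Hypothetical ex: statistics present only on the first observation */
data ex;
  input Xtest_1 Xtest_2 mean1 mean2 std1 std2;
  datalines;
4 0 2 20 1 10
2 25 . . . .
;
run;

/* REPONLY replaces missing values with the location measure (the median, */
/* i.e. the single non-missing value) and does not standardize the data   */
proc stdize data=ex out=filled reponly method=median;
  var mean1-mean2 std1-std2;
run;

data want;
  set filled;
  array MeanStat[2] mean1-mean2;
  array StdStat[2]  std1-std2;
  array X[2]        Xtest_1-Xtest_2;
  array Normal[2]   Normal1-Normal2;
  do i = 1 to 2;
    Normal[i] = (X[i] - MeanStat[i]) / StdStat[i];
  end;
  drop i;
run;
```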
Thanks. I was trying to fill in the missing values with an ad hoc approach before; this makes it easier.
The default method is STD, which automatically sets the location to be the mean and scale to be the standard deviation.
Don't work too hard
This should give you the same results.
PROC STDIZE DATA="many-observation dataset" OUT=Want PSTAT ;
VAR Xtest_1-Xtest_8;
RUN;
I see your point. But in my case, the means (mean1-mean8) and standard deviations (std1-std8) are not computed from Xtest_1-Xtest_8; they are computed from another dataset.
Thanks