Hi, I have the following question about the attached dataset. Basically, I want to normalize the variables (Xtest_1 to Xtest_8) using their respective means (mean1 to mean8) and standard deviations (std1 to std8).
I know I can use IML for vector manipulation, but if I choose to do this with a dataset instead, how should I proceed?
Question1:
I managed to "gather" (mean1 to mean8) and (std1 to std8) together with the variables (Xtest_1 to Xtest_8) in the same dataset; see the attached file.
I tried the following code to populate the Normal array, but it produces only 1 obs, because (mean1-mean8) and (std1-std8) are missing for the rest of the obs.
Maybe: how can I declare them as constants so that I can refer to them as SAS sweeps through the dataset?
Question2:
Or, taking one step back, before "gathering" the statistics: how can I use the means (mean1-mean8) and standard deviations (std1-std8) produced by PROC MEANS? I still cannot figure out how to "apply" the output statistics from PROC MEANS to another dataset.
Thanks
----------------------------------------------------------------------------------------------------------------
DATA New_ex;
SET ex;
RETAIN mean1-mean8;
RETAIN std1-std8;
ARRAY MeanStat[8] mean1-mean8;
ARRAY StdStat[8] std1-std8;
ARRAY X[8] Xtest_1-Xtest_8;
ARRAY Normal[8] Normal1-Normal8 ;
DO i = 1 to 8;
Normal[i]=(X[i]-MeanStat[i])/StdStat[i];
END;
OUTPUT;
RUN;
Just found out a much easier way to standardize (xtest_1-xtest_8), using PROC STDIZE.
Given:
1) The means (mean1-mean8) and standard deviations (std1-std8) [produced by the output dataset of PROC MEANS] reside in a 'one-observation' dataset.
2) The variables Xtest_1-Xtest_8 reside in a 'many-observation' dataset.
I can standardize the variables (Xtest_1-Xtest_8) according to their respective means and standard deviations using:
----------------------------------------------------------------------
PROC STDIZE DATA=many_observations OUT=Want PSTAT METHOD=IN(one_observation);
VAR Xtest_1-Xtest_8;
LOCATION mean1-mean8 ;
SCALE std1-std8 ;
RUN;
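For context, the one-observation dataset described above could be produced along the following lines. This is only a minimal sketch: the source dataset name other_data and its variables V1-V8 are hypothetical placeholders (the thread does not name them), and one_observation is simply the name used for the statistics dataset here.

```sas
/* Sketch only: other_data and V1-V8 are hypothetical names.          */
/* NOPRINT suppresses the printed report; OUTPUT writes one row with  */
/* the eight means and eight standard deviations.                     */
proc means data=other_data noprint;
  var V1-V8;
  output out=one_observation(drop=_type_ _freq_)
         mean=mean1-mean8
         std=std1-std8;
run;
```

The MEAN= and STD= name lists are matched to the VAR list by position, so mean1 and std1 belong to the first variable, mean2 and std2 to the second, and so on.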
There's a proc for that:
PROC STANDARD
OR
PROC STDIZE
I am aware of PROC STANDARD, which takes only 1 mu and 1 sigma at a time and applies them to the many-observation dataset.
My situation is this: the variables (Xtest_1-Xtest_8), all residing in one dataset, should each be normalized by their respective mean and std. PROC STANDARD won't work unless I split the dataset into 8 individual datasets (i.e., dataset1 has only Xtest_1, dataset2 has only Xtest_2, etc.) and then use PROC STANDARD on each of them.
PROC STDIZE will, though, and it calculates the mean/std as well.
proc stdize data=sashelp.class out=check;
var weight height age;
run;
As Reeza notes, you may find it easier to use an existing procedure for this particular problem. Just for the record, though, there is an easy DATA step technique to combine a one-observation data set (your means and standard deviations) with a many-observation data set (your original line-by-line values):
data want;
if _n_=1 then set one_observation;
set many_observations;
*** array processing, no retain needed;
run;
Variables that come in from a SAS data set are automatically retained. The trick is to keep the DATA step going instead of having it end prematurely. That's why there's an IF/THEN statement. Good luck.
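Filling in the array processing, the technique above might look like the following. This is a sketch under stated assumptions: it reuses the dataset names one_observation and many_observations from the reply, and it assumes the many-observation dataset does not itself contain variables named mean1-mean8 or std1-std8 (if it did, the second SET would overwrite the retained statistics on every observation).

```sas
data want;
  /* Read the statistics once; SET variables are retained automatically */
  if _n_ = 1 then set one_observation;
  set many_observations;

  array x[8]      Xtest_1-Xtest_8;
  array m[8]      mean1-mean8;
  array s[8]      std1-std8;
  array normal[8] Normal1-Normal8;

  do i = 1 to 8;
    /* Guard against a zero or missing standard deviation */
    if s[i] > 0 then normal[i] = (x[i] - m[i]) / s[i];
  end;

  drop i;
run;
```

Normal1-Normal8 are new variables, so they are reset to missing at the start of each iteration; any element whose standard deviation fails the guard is left missing rather than carrying over a value from the previous observation.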
Hi,
This will populate the missing values in the ex dataset, so that every observation has the mean and standard deviation.
proc stdize data=imp.ex out=want reponly method=median;
var m: s: ;
run;
proc print data=want;
run;
Thanks, I was looking to fill the missing value before, using an ad-hoc approach. This makes it easier.
The default method is STD, which automatically sets the location to be the mean and scale to be the standard deviation.
Don't work too hard.
This should give you the same results.
PROC STDIZE DATA=many_observations OUT=Want PSTAT ;
VAR Xtest_1-Xtest_8;
RUN;
I see your point. But in my case, the means (mean1-mean8) and standard deviations (std1-std8) are not generated from (Xtest_1-Xtest_8). They are computed from another dataset.
Thanks
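When the statistics come from a different dataset, another possible route is to let PROC STDIZE both compute and store them, then apply them elsewhere. The sketch below assumes a hypothetical dataset other_data that contains variables with the same names (Xtest_1-Xtest_8) as the dataset to be standardized; the names other_data, other_stats, and many_observations are placeholders, not from the thread.

```sas
/* Step 1: compute location (mean) and scale (std) on the other     */
/* dataset and save them with OUTSTAT=. Names here are hypothetical. */
proc stdize data=other_data method=std outstat=other_stats out=other_std;
  var Xtest_1-Xtest_8;
run;

/* Step 2: apply the saved statistics to the dataset of interest.   */
proc stdize data=many_observations method=in(other_stats) out=want;
  var Xtest_1-Xtest_8;
run;
```

Because the OUTSTAT= dataset already labels its location and scale rows, this variant should not need the LOCATION and SCALE statements; those are for the case where the statistics sit in an ordinary one-observation dataset, as in the solution posted earlier in the thread.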