Elvin
Calcite | Level 5

Hi, I have the following question about the attached dataset. Basically, I want to normalize the variables (Xtest_1 to Xtest_8) using their respective means (mean1 to mean8) and standard deviations (std1 to std8).

I know I can use IML for vector manipulation, but how should I proceed if I choose to do this with a dataset instead?

Question1:

I managed to "gather" (mean1 to mean8) and (std1 to std8) together with the variables (Xtest_1 to Xtest_8) in the same dataset; see the attached file.

I tried the following code to populate the Normal array, but it produces only 1 obs, because (mean1-mean8) and (std1-std8) are missing for the rest of the obs.

Maybe: how can I declare them as constants so that I can refer to them as SAS sweeps through the dataset?

Question2:

Or, taking one step back, before "gathering" the statistics: how can I use the means (mean1-mean8) and standard deviations (std1-std8) produced by PROC MEANS? I still cannot figure out how to "apply" the output statistics from PROC MEANS to another dataset.

Thanks

----------------------------------------------------------------------------------------------------------------

DATA New_ex;
    SET ex;
    RETAIN mean1-mean8;
    RETAIN std1-std8;
    ARRAY MeanStat[8] mean1-mean8;
    ARRAY StdStat[8] std1-std8;
    ARRAY X[*] Xtest_1-Xtest_8;
    ARRAY Normal[8] Normal1-Normal8;

    DO i = 1 to 8;
        Normal[i] = (X[i] - MeanStat[i]) / StdStat[i];
    END;

    OUTPUT;
RUN;

    1 ACCEPTED SOLUTION

    Elvin
    Calcite | Level 5

    Just found out a much easier way to standardize (xtest_1-xtest_8), using PROC STDIZE.


    Given:

    1) The means (mean1-mean8) and standard deviations (std1-std8), produced as the output dataset of PROC MEANS, reside in a 'one-observation' dataset.

    2) The variables Xtest_1-Xtest_8 reside in a 'many-observation' dataset.

    I can standardize the variables (Xtest_1-Xtest_8) according to their means and stds using:

    ----------------------------------------------------------------------

    PROC STDIZE DATA=many_observations OUT=Want PSTAT METHOD=IN(one_observation);

          VAR  Xtest_1-Xtest_8;

          LOCATION  mean1-mean8 ;

          SCALE std1-std8 ;

    RUN;
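
As a rough sketch of how the 'one-observation' dataset described above might be produced, PROC MEANS can write the means and standard deviations to an output dataset under the names mean1-mean8 and std1-std8. The dataset names `reference` (the data the statistics are computed from) and `one_observation` are assumptions for illustration, not names from the original post.

```sas
/* Sketch only: "reference" is a hypothetical dataset that supplies the statistics. */
proc means data=reference noprint;
   var Xtest_1-Xtest_8;
   /* MEAN= and STD= name the output variables in the same order as the VAR list. */
   /* _TYPE_ and _FREQ_ are the automatic variables PROC MEANS adds; drop them.   */
   output out=one_observation (drop=_type_ _freq_)
          mean=mean1-mean8
          std=std1-std8;
run;
```

With no CLASS or BY statement, the OUT= dataset holds a single observation, which is the layout the LOCATION and SCALE statements above refer to.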


    9 REPLIES
    Reeza
    Super User

    There's a proc for that:

    PROC STANDARD

    OR

    PROC STDIZE

    Elvin
    Calcite | Level 5

    I am aware of PROC STANDARD, which takes only one mu and one sigma at a time and applies them to the many-observation dataset.

    Here my situation is this: the variables (Xtest_1-Xtest_8), all residing in one dataset, should each be normalized by their respective mean and std. PROC STANDARD won't work unless I split the dataset into 8 individual datasets (i.e., dataset1 has only Xtest_1, dataset2 has only Xtest_2, etc.) and then use PROC STANDARD on each of them.

    Reeza
    Super User

    PROC STDIZE will, though, and it calculates the mean/std as well :)

    proc stdize data=sashelp.class out=check;

    var weight height age;

    run;

    Astounding
    PROC Star

    As Reeza notes, you may find it easier to use an existing procedure for this particular problem.  Just for the record, though, there is an easy DATA step technique to combine a one-observation data set (your means and standard deviations) with a many-observation data set (your original line-by-line values):

    data want;

       if _n_=1 then set one_observation;

       set many_observations;

       *** array processing, no retain needed;

    run;

    Variables that come in from a SAS data set are automatically retained.  The trick is to keep the DATA step going instead of having it end prematurely.  That's why there's an IF/THEN statement.  Good luck.
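
To make the "array processing" placeholder above concrete, here is one possible way to fill it in. It is a sketch that assumes `one_observation` holds mean1-mean8 and std1-std8, that `many_observations` holds Xtest_1-Xtest_8, and that the two datasets share no variable names. The guard against a zero or missing standard deviation is an addition, not part of the original suggestion.

```sas
data want;
   /* Read the statistics once; variables read by SET are retained automatically. */
   if _n_ = 1 then set one_observation;
   set many_observations;

   array MeanStat[8] mean1-mean8;
   array StdStat[8]  std1-std8;
   array X[8]        Xtest_1-Xtest_8;
   array Normal[8]   Normal1-Normal8;

   do i = 1 to 8;
      /* A missing value compares as less than 0 in SAS, so this also skips missing stds. */
      if StdStat[i] > 0 then Normal[i] = (X[i] - MeanStat[i]) / StdStat[i];
      else Normal[i] = .;
   end;

   /* Keep only the original and normalized variables. */
   drop i mean1-mean8 std1-std8;
run;
```

The DATA step ends when the second SET statement runs out of observations, so `want` has one row per row of `many_observations`.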

    stat_sas
    Ammonite | Level 13

    Hi,

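    This will fill in the missing values in the ex dataset so that every observation has the means and standard deviations. REPONLY replaces missing values without standardizing, and since each of these variables has a single non-missing value, its median is that value.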

    proc stdize data=imp.ex out=want reponly method=median;
    var m: s: ;
    run;

    proc print data=want;
    run;

    Elvin
    Calcite | Level 5

    Thanks, I was looking to fill the missing values before, using an ad-hoc approach. This makes it easier.


    Reeza
    Super User

    The default method is STD, which automatically sets the location to be the mean and scale to be the standard deviation.

    Don't work too hard :)

    This should give you the same results.

    PROC STDIZE DATA=many_observations OUT=Want PSTAT ;

          VAR  Xtest_1-Xtest_8;

    RUN;

    Elvin
    Calcite | Level 5

    I see your point. But in my case, the means (mean1-mean8) and standard deviations (std1-std8) are not generated from (Xtest_1-Xtest_8). They are computed from another dataset.

    Thanks
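
For the situation described here, where the statistics come from a different dataset, one option that stays entirely within PROC STDIZE is to save the location and scale measures with OUTSTAT= and read them back with METHOD=IN. This is a sketch, not something posted in the thread: `reference`, `ref_std`, and `ref_stats` are hypothetical names, and it assumes both datasets use the same variable names Xtest_1-Xtest_8.

```sas
/* Step 1: compute mean and std on the hypothetical reference dataset.  */
/* OUTSTAT= saves the location and scale measures for later reuse.      */
proc stdize data=reference method=std out=ref_std outstat=ref_stats;
   var Xtest_1-Xtest_8;
run;

/* Step 2: apply the saved measures to the many-observation dataset.    */
/* METHOD=IN reads location and scale from ref_stats instead of         */
/* recomputing them from many_observations.                             */
proc stdize data=many_observations method=in(ref_stats) out=want;
   var Xtest_1-Xtest_8;
run;
```

Because the OUTSTAT= dataset stores the measures under the original variable names, no LOCATION or SCALE statement should be needed in the second step.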
