Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Re: Standardizing a variable and specifying a minimum and maximum

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-08-2018 04:24 PM
(1788 views)

I have some data that I'd like to standardize but can i also set a minimum and/or maximum value? Let's say in my example below that i wanted the minimum to be at least 0.25, is that possible?

```
proc standard data =sashelp.Baseball out=Baseball_Normalized
mean=0 std=1;
var CrAtBat;
run;
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @SAShole,

There is no *linear* transformation that satisfies all three conditions. So, if the condition "transformed values sum to the number of observations" is mandatory, you can obtain *either* a minimum of .25 *or* a maximum of 4:

```
data trans1;
set have;
a=0.75/1321.18;
b=1-1333.18*a;
HH_Weight_t=a*HH_Weight+b;
run;
proc means data=trans1 sum min max;
var HH_Weight_t;
run;
```

Result:

Sum Minimum Maximum -------------------------------------------- 50.0000000 0.2500000 1.6538587 --------------------------------------------

Same with a=3/1151.82 (and again b=1-1333.18*a):

Sum Minimum Maximum -------------------------------------------- 50.0000000 -2.4411106 4.0000000 --------------------------------------------

Or a compromise between these two (using least squares)? (a=0.0014472809)

Sum Minimum Maximum -------------------------------------------- 50.0000000 -0.9121186 2.6670071 --------------------------------------------

However, there are *non-linear* transformations which satisfy all three conditions. Would this make sense? Unfortunately, there is no suitable *quadratic* transformation (t(x)=ax²+bx+c) -- in spite of three free parameters. Reason: It would not be *monotone*, hence there would be no unique inverse transformation!

But how about an *exponential* transformation (t(x)=a exp(bx)+c), which is of course monotone?

```
data trans;
retain a 0.01556848794537
b 2.20865846733238e-3
c 0.2340133696564;
set have;
HH_Weight_t=a*exp(b*HH_Weight)+c;
run;
```

Result of PROC MEANS:

Sum Minimum Maximum --------------------------------------------50.0000000 0.2500000 4.0000000--------------------------------------------

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

PROC **STDIZE **may have a method that will suite you.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@SAShole wrote:

I have some data that I'd like to standardize but can i also set a minimum and/or maximum value? Let's say in my example below that i wanted the minimum to be at least 0.25, is that possible?

`proc standard data =sashelp.Baseball out=Baseball_Normalized mean=0 std=1; var CrAtBat; run;`

While I think I understand the request when you want the MAXIMUM to be 0.25, and there is the MAXABS method of PROC STDIZE (which is a form of standardizing, but it might not have mean zero and would not have a variance of 1).

However, you specified you want the MINIMUM to be 0.25, and that doesn't make any mathematical sense to me. Minimum of what? Can you explain further, and even provide a small example?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I have survey weights that I need to transform such that they sum to the number of observations and have a minimum of .25 and a maximum of 4 (if possible). Here is a small example of my dataset:

```
data Have;
input HH_Weight 8.;
datalines;
213
2161
973
1870
1335
2087
2002
312
2076
2226
484
669
1190
972
711
1874
1773
2291
2213
1582
1584
1581
2182
2329
382
158
1567
183
1171
2485
12
1283
1008
1283
1631
1206
2399
871
1892
1475
876
1373
2357
285
971
229
545
1800
1137
1390
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@SAShole wrote:

I have survey weights that I need to transform such that they sum to the number of observations and have a minimum of .25 and a maximum of 4 (if possible). Here is a small example of my dataset:

`data Have; input HH_Weight 8.; datalines; 213 2161 973 1870 1335 2087 2002 312 2076 2226 484 669 1190 972 711 1874 1773 2291 2213 1582 1584 1581 2182 2329 382 158 1567 183 1171 2485 12 1283 1008 1283 1631 1206 2399 871 1892 1475 876 1373 2357 285 971 229 545 1800 1137 1390 run;`

Okay, this isn't really standardizing, is it? And I'm not sure you can get all of the conditions you specify to hold at the same time, min of 0.25, max of 4 and sum to the number of observations. Has this actually been done in other applications, getting all three of the conditions to hold at the same time, or is it something you have decided you need for this application?

The part where you want them to sum to the total number of observations is relatively easy. I am skeptical that you can then take the result and set the min to 0.25 and max to 4 and get the same sum.

To get them to sum to the number of observations, count the number of observatrions (let's call this N) and find the sum of all observations (let's call this SUM), and then each value would be multiplied by (N/SUM). After that, I don't see a path forward.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @SAShole,

There is no *linear* transformation that satisfies all three conditions. So, if the condition "transformed values sum to the number of observations" is mandatory, you can obtain *either* a minimum of .25 *or* a maximum of 4:

```
data trans1;
set have;
a=0.75/1321.18;
b=1-1333.18*a;
HH_Weight_t=a*HH_Weight+b;
run;
proc means data=trans1 sum min max;
var HH_Weight_t;
run;
```

Result:

Sum Minimum Maximum -------------------------------------------- 50.0000000 0.2500000 1.6538587 --------------------------------------------

Same with a=3/1151.82 (and again b=1-1333.18*a):

Sum Minimum Maximum -------------------------------------------- 50.0000000 -2.4411106 4.0000000 --------------------------------------------

Or a compromise between these two (using least squares)? (a=0.0014472809)

Sum Minimum Maximum -------------------------------------------- 50.0000000 -0.9121186 2.6670071 --------------------------------------------

However, there are *non-linear* transformations which satisfy all three conditions. Would this make sense? Unfortunately, there is no suitable *quadratic* transformation (t(x)=ax²+bx+c) -- in spite of three free parameters. Reason: It would not be *monotone*, hence there would be no unique inverse transformation!

But how about an *exponential* transformation (t(x)=a exp(bx)+c), which is of course monotone?

```
data trans;
retain a 0.01556848794537
b 2.20865846733238e-3
c 0.2340133696564;
set have;
HH_Weight_t=a*exp(b*HH_Weight)+c;
run;
```

Result of PROC MEANS:

Sum Minimum Maximum --------------------------------------------50.0000000 0.2500000 4.0000000--------------------------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@SAShole wrote:

I have some data that I'd like to standardize but can i also set a minimum and/or maximum value? Let's say in my example below that i wanted the minimum to be at least 0.25, is that possible?

`proc standard data =sashelp.Baseball out=Baseball_Normalized mean=0 std=1; var CrAtBat; run;`

I am wondering what you will do with the resulting values.

Are you looking for a sampling probability?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.