Re: re-allocate a category to other categories

ting1 · Posted 01-09-2023 01:30 PM

I am trying to allocate a no-response category to other categories for a age variable. The frequencies of age shown below

Table 1. age_2
age_2	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1. 18-24	150	7.43	150	7.43
2. 25-34	301	14.91	451	22.34
3. 35-44	308	15.26	759	37.59
4. 45-54	400	19.81	1159	57.40
5. 55-64	426	21.10	1585	78.50
6. 65 and up	416	20.60	2001	99.11
9. no respond (NB)	18	0.89	2019	100.00

I used following code to redistribute no-response to other 1 to 6 categories in the data step (result shown in Table 2). From the table (Table 1) above, one can see that there are more older people (45 years old+) in the data set. Is there a method I can use to distribute more cases of “NB” to the age group 4 to 6? Or there is method that I can use to allocate the cases of NB to other categories (1-6) according the distribution of the age group (see Table 1)?

Data set1;

Set set1

call streaminit(0);

if age_2=9 then age_2=ceil(ranuni(0)*5+1);

run;

Table 2. age_2
age_2	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1. 18-24	150	7.43	150	7.43
2. 25-34	305	15.11	455	22.54
3. 35-44	312	15.45	767	37.99
4. 45-54	404	20.01	1171	58.00
5. 55-64	431	21.35	1602	79.35
6. 65 and up	417	20.65	2019	100.00

PaigeMiller · Posted 01-09-2023 03:35 PM

You can use the RAND functions with the 'TABLE' option

Example:

if age_2=9 then age_2=rand('table',0.0743,0.1491,0.1526,0.1981,0.2110,0.2060);

--
Paige Miller

ting1 · Posted 01-09-2023 03:59 PM

Thanks for your help!

Reeza · Posted 01-09-2023 03:39 PM

Run the table without the NB category and get your percentages. Multiply those percentages by the total number INCLUDING the NB to get your new adjusted numbers proportionally.

Or do you want to randomly adjust the categories? Random rounding is a bit harder...

PaigeMiller · Posted 01-09-2023 03:49 PM

@Reeza wrote:
Run the table without the NB category and get your percentages.

Excellent point!

--
Paige Miller

ting1 · Posted 01-09-2023 03:59 PM

Thanks for the solution.

Reeza · Posted 01-09-2023 04:32 PM

Make sure to set the streaminit variable with a constant, otherwise everytime you run the code you may get a slightly different answer.

ting1 · Posted 01-13-2023 07:51 AM

Thanks!

FreelanceReinh · Posted 01-10-2023 06:53 AM

Hello @ting1,

You've already got valuable suggestions from PaigeMiller and Reeza. Here's another approach:

data want;
call streaminit(27182818);
set have;
if age_2=9 then do until(age_2 ne 9);
  p=rand('integer',n);
  set have(keep=age_2) nobs=n point=p;
end;
run;

The RAND 'INTEGER' function with the number n of observations in your original dataset (I called it "HAVE") as the parameter returns a random observation number (uniformly distributed on {1, 2, ..., n}). The AGE_2 value of the corresponding observation is then read by the second SET statement and overwrites the original value AGE_2=9. If the random observation happened to have AGE_2=9 as well (probability only 0.0089 according to your frequency table), the DO UNTIL loop starts another random draw and so on, until an admissible AGE_2 value is found. Thus, the probability distribution of the randomly selected AGE_2 values matches the distribution of AGE_2 values in have(where=(age_2 ne 9)), as desired. (Edit: More precisely, it's the multinomial distribution Mult(m, p₁, ..., p₆) with parameters m=18 and p_i the relative frequency of AGE_2 category i in that WHERE-restricted HAVE dataset, i=1, ..., 6.)

Note that your code

Data set1;

Set set1

call streaminit(0);

if age_2=9 then age_2=ceil(ranuni(0)*5+1);

run;

is not quite correct:

The pseudorandom values are uniformly distributed on {2, 3, 4, 5, 6}, thus neglecting the first age category. Since ranuni(0)>1E-10, the CEIL function would never return the value 1.
The STREAMINIT call routine has no impact on the values created by the (deprecated!) RANUNI function. So, better use the RAND function, whose values depend on the initialization by CALL STREAMINIT.
Missing semicolon after the SET statement (particularly risky because you're overwriting dataset SET1).

Registration is open

SAS Training: Just a Click Away