I am trying to allocate a no-response category to other categories for a age variable. The frequencies of age shown below
Table 1. age_2 | ||||
age_2 | Frequency | Percent | Cumulative | Cumulative |
1. 18-24 | 150 | 7.43 | 150 | 7.43 |
2. 25-34 | 301 | 14.91 | 451 | 22.34 |
3. 35-44 | 308 | 15.26 | 759 | 37.59 |
4. 45-54 | 400 | 19.81 | 1159 | 57.40 |
5. 55-64 | 426 | 21.10 | 1585 | 78.50 |
6. 65 and up | 416 | 20.60 | 2001 | 99.11 |
9. no respond (NB) | 18 | 0.89 | 2019 | 100.00 |
I used following code to redistribute no-response to other 1 to 6 categories in the data step (result shown in Table 2). From the table (Table 1) above, one can see that there are more older people (45 years old+) in the data set. Is there a method I can use to distribute more cases of “NB” to the age group 4 to 6? Or there is method that I can use to allocate the cases of NB to other categories (1-6) according the distribution of the age group (see Table 1)?
Data set1;
Set set1
call streaminit(0);
if age_2=9 then age_2=ceil(ranuni(0)*5+1);
run;
Table 2. age_2 | ||||
age_2 | Frequency | Percent | Cumulative | Cumulative |
1. 18-24 | 150 | 7.43 | 150 | 7.43 |
2. 25-34 | 305 | 15.11 | 455 | 22.54 |
3. 35-44 | 312 | 15.45 | 767 | 37.99 |
4. 45-54 | 404 | 20.01 | 1171 | 58.00 |
5. 55-64 | 431 | 21.35 | 1602 | 79.35 |
6. 65 and up | 417 | 20.65 | 2019 | 100.00 |
You can use the RAND functions with the 'TABLE' option
Example:
if age_2=9 then age_2=rand('table',0.0743,0.1491,0.1526,0.1981,0.2110,0.2060);
@Reeza wrote:
Run the table without the NB category and get your percentages.
Excellent point!
Make sure to set the streaminit variable with a constant, otherwise everytime you run the code you may get a slightly different answer.
Hello @ting1,
You've already got valuable suggestions from PaigeMiller and Reeza. Here's another approach:
data want;
call streaminit(27182818);
set have;
if age_2=9 then do until(age_2 ne 9);
p=rand('integer',n);
set have(keep=age_2) nobs=n point=p;
end;
run;
The RAND 'INTEGER' function with the number n of observations in your original dataset (I called it "HAVE") as the parameter returns a random observation number (uniformly distributed on {1, 2, ..., n}). The AGE_2 value of the corresponding observation is then read by the second SET statement and overwrites the original value AGE_2=9. If the random observation happened to have AGE_2=9 as well (probability only 0.0089 according to your frequency table), the DO UNTIL loop starts another random draw and so on, until an admissible AGE_2 value is found. Thus, the probability distribution of the randomly selected AGE_2 values matches the distribution of AGE_2 values in have(where=(age_2 ne 9)), as desired. (Edit: More precisely, it's the multinomial distribution Mult(m, p1, ..., p6) with parameters m=18 and pi the relative frequency of AGE_2 category i in that WHERE-restricted HAVE dataset, i=1, ..., 6.)
Note that your code
Data set1;
Set set1
call streaminit(0);
if age_2=9 then age_2=ceil(ranuni(0)*5+1);
run;
is not quite correct:
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.