I am trying to allocate a no-response category to other categories for a age variable. The frequencies of age shown below
Table 1. age_2 | ||||
age_2 | Frequency | Percent | Cumulative | Cumulative |
1. 18-24 | 150 | 7.43 | 150 | 7.43 |
2. 25-34 | 301 | 14.91 | 451 | 22.34 |
3. 35-44 | 308 | 15.26 | 759 | 37.59 |
4. 45-54 | 400 | 19.81 | 1159 | 57.40 |
5. 55-64 | 426 | 21.10 | 1585 | 78.50 |
6. 65 and up | 416 | 20.60 | 2001 | 99.11 |
9. no respond (NB) | 18 | 0.89 | 2019 | 100.00 |
I used following code to redistribute no-response to other 1 to 6 categories in the data step (result shown in Table 2). From the table (Table 1) above, one can see that there are more older people (45 years old+) in the data set. Is there a method I can use to distribute more cases of “NB” to the age group 4 to 6? Or there is method that I can use to allocate the cases of NB to other categories (1-6) according the distribution of the age group (see Table 1)?
Data set1;
Set set1
call streaminit(0);
if age_2=9 then age_2=ceil(ranuni(0)*5+1);
run;
Table 2. age_2 | ||||
age_2 | Frequency | Percent | Cumulative | Cumulative |
1. 18-24 | 150 | 7.43 | 150 | 7.43 |
2. 25-34 | 305 | 15.11 | 455 | 22.54 |
3. 35-44 | 312 | 15.45 | 767 | 37.99 |
4. 45-54 | 404 | 20.01 | 1171 | 58.00 |
5. 55-64 | 431 | 21.35 | 1602 | 79.35 |
6. 65 and up | 417 | 20.65 | 2019 | 100.00 |
You can use the RAND functions with the 'TABLE' option
Example:
if age_2=9 then age_2=rand('table',0.0743,0.1491,0.1526,0.1981,0.2110,0.2060);
@Reeza wrote:
Run the table without the NB category and get your percentages.
Excellent point!
Make sure to set the streaminit variable with a constant, otherwise everytime you run the code you may get a slightly different answer.
Hello @ting1,
You've already got valuable suggestions from PaigeMiller and Reeza. Here's another approach:
data want;
call streaminit(27182818);
set have;
if age_2=9 then do until(age_2 ne 9);
p=rand('integer',n);
set have(keep=age_2) nobs=n point=p;
end;
run;
The RAND 'INTEGER' function with the number n of observations in your original dataset (I called it "HAVE") as the parameter returns a random observation number (uniformly distributed on {1, 2, ..., n}). The AGE_2 value of the corresponding observation is then read by the second SET statement and overwrites the original value AGE_2=9. If the random observation happened to have AGE_2=9 as well (probability only 0.0089 according to your frequency table), the DO UNTIL loop starts another random draw and so on, until an admissible AGE_2 value is found. Thus, the probability distribution of the randomly selected AGE_2 values matches the distribution of AGE_2 values in have(where=(age_2 ne 9)), as desired. (Edit: More precisely, it's the multinomial distribution Mult(m, p1, ..., p6) with parameters m=18 and pi the relative frequency of AGE_2 category i in that WHERE-restricted HAVE dataset, i=1, ..., 6.)
Note that your code
Data set1;
Set set1
call streaminit(0);
if age_2=9 then age_2=ceil(ranuni(0)*5+1);
run;
is not quite correct:
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.