BookmarkSubscribeRSS Feed
ting1
Fluorite | Level 6

I am trying to allocate a no-response category to other categories for a  age variable. The frequencies of age shown below

Table 1. age_2

age_2

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

1.     18-24

150

7.43

150

7.43

2.     25-34

301

14.91

451

22.34

3.     35-44

308

15.26

759

37.59

4.     45-54

400

19.81

1159

57.40

5.     55-64

426

21.10

1585

78.50

6.     65 and up

416

20.60

2001

99.11

9. no respond (NB)

18

0.89

2019

100.00

I used following code to redistribute no-response to other 1 to 6 categories in the data step (result shown in Table 2). From the table (Table 1) above, one can see that there are more older people (45 years old+) in the data set. Is there a method I can use to distribute more cases of “NB” to the age group 4 to 6?  Or there is method that I can use to allocate the cases of NB to other categories (1-6) according the distribution of the age group (see Table 1)?

 

Data set1;

Set set1  

call streaminit(0);

if age_2=9 then age_2=ceil(ranuni(0)*5+1);

run;

Table 2. age_2

age_2

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

1.     18-24

150

7.43

150

7.43

2.     25-34

305

15.11

455

22.54

3.     35-44

312

15.45

767

37.99

4.     45-54

404

20.01

1171

58.00

5.     55-64

431

21.35

1602

79.35

6.     65 and up

417

20.65

2019

100.00

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26

You can use the RAND functions with the 'TABLE' option

 

Example:

 

if age_2=9 then age_2=rand('table',0.0743,0.1491,0.1526,0.1981,0.2110,0.2060);
--
Paige Miller
ting1
Fluorite | Level 6
Thanks for your help!
Reeza
Super User
Run the table without the NB category and get your percentages. Multiply those percentages by the total number INCLUDING the NB to get your new adjusted numbers proportionally.

Or do you want to randomly adjust the categories? Random rounding is a bit harder...
PaigeMiller
Diamond | Level 26

@Reeza wrote:
Run the table without the NB category and get your percentages.

Excellent point!

--
Paige Miller
ting1
Fluorite | Level 6
Thanks for the solution.
Reeza
Super User

Make sure to set the streaminit variable with a constant, otherwise everytime you run the code you may get a slightly different answer. 

 

 

ting1
Fluorite | Level 6
Thanks!
FreelanceReinh
Jade | Level 19

Hello @ting1,

 

You've already got valuable suggestions from PaigeMiller and Reeza. Here's another approach:

data want;
call streaminit(27182818);
set have;
if age_2=9 then do until(age_2 ne 9);
  p=rand('integer',n);
  set have(keep=age_2) nobs=n point=p;
end;
run;

The RAND 'INTEGER' function with the number n of observations in your original dataset (I called it "HAVE") as the parameter returns a random observation number (uniformly distributed on {1, 2, ..., n}). The AGE_2 value of the corresponding observation is then read by the second SET statement and overwrites the original value AGE_2=9. If the random observation happened to have AGE_2=9 as well (probability only 0.0089 according to your frequency table), the DO UNTIL loop starts another random draw and so on, until an admissible AGE_2 value is found. Thus, the probability distribution of the randomly selected AGE_2 values matches the distribution of AGE_2 values in have(where=(age_2 ne 9)), as desired. (Edit: More precisely, it's the multinomial distribution Mult(m, p1, ..., p6) with parameters m=18 and pi the relative frequency of AGE_2 category i in that WHERE-restricted HAVE dataset, i=1, ..., 6.) 

 

Note that your code


Data set1;

Set set1  

call streaminit(0);

if age_2=9 then age_2=ceil(ranuni(0)*5+1);

run;


is not quite correct:

  • The pseudorandom values are uniformly distributed on {2, 3, 4, 5, 6}, thus neglecting the first age category. Since ranuni(0)>1E-10, the CEIL function would never return the value 1.
  • The STREAMINIT call routine has no impact on the values created by the (deprecated!) RANUNI function. So, better use the RAND function, whose values depend on the initialization by CALL STREAMINIT.
  • Missing semicolon after the SET statement (particularly risky because you're overwriting dataset SET1).

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1563 views
  • 2 likes
  • 4 in conversation