ting1
Fluorite | Level 6

I have a dataset from a survey. The question about gender is:

What is your gender?

     1    Male
     2    Female
     3    Transgender
    99    Prefer not to answer

The frequency (cell count) for option 3 (Transgender) is less than 10. I would like to randomly reassign respondents who chose option 3 to option 1 or 2. Does anyone have suggestions on how to randomly assign a small category to other categories?

Thanks for your help in advance!

9 REPLIES
PaigeMiller
Diamond | Level 26
data want;
    set have;
    if gender=3 then gender = rand('uniform')<0.5 + 1;
run;

 

 

I will add that I am skeptical that this is a statistically valid thing to do. You should be concerned about that and consider more valid ways to handle it, such as leaving gender=3 out of your analysis. In the end, you (not anyone else) have to vouch for this being an acceptable method.
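
If leaving the category out is the route taken, a WHERE statement restricts a procedure to the remaining codes without altering the data set. This is only a minimal sketch, assuming the data set is named have and the variable is gender, as in the code above. Whether code 99 should also be kept is a separate decision.

```sas
/* Analyze only codes 1 and 2; the rows with gender=3 stay in HAVE untouched */
proc freq data=have;
    where gender in (1, 2);
    tables gender;
run;
```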

--
Paige Miller
ting1
Fluorite | Level 6
Thank you very much for the quick answer.



However, it looks like the method assigns all the values of category 3 to category 1. Before running the code:






QGENDER   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1              1000     49.53                   1000                49.53
2               985     48.79                   1985                98.32
3                34      1.68                   2019               100.00



After running the code:

data want;
set have;
if gender=3 then gender = rand('uniform')<0.5 + 1;
run;

QGENDER   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1              1034     51.21                   1034                51.21
2               985     48.79                   2019               100.00



Is there a way to assign some values of category 3 to category 2?


PaigeMiller
Diamond | Level 26

My mistake. I left out the parentheses. Try it this way:

 

data want;
    set have;
    if gender=3 then gender = (rand('uniform')<0.5) + 1;
run;
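
Why the parentheses matter: in SAS, addition has higher priority than comparison, so rand('uniform')<0.5 + 1 is evaluated as rand('uniform') < 1.5, which is always true and therefore always returns 1. The sketch below shows one way to make the random split reproducible and to keep the reported value. The seed value and the variable name gender_orig are assumptions for illustration, not part of the code above.

```sas
data want;
    set have;
    /* Fix the random-number stream so the assignment can be reproduced */
    call streaminit(20250101);   /* arbitrary seed, chosen for illustration */
    gender_orig = gender;        /* hypothetical variable: keeps the reported value */
    /* rand('bernoulli', 0.5) returns 0 or 1 with equal probability */
    if gender = 3 then gender = 1 + rand('bernoulli', 0.5);
run;

/* Cross-tabulate to check that code 3 was split between 1 and 2 */
proc freq data=want;
    tables gender_orig*gender / norow nocol nopercent;
run;
```

Keeping gender_orig means the reassignment can be audited or undone later.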
--
Paige Miller
mkeintz
PROC Star
Why reassign a small but probably atypical sample to a large category, thereby probably artificially increasing the within-group variance in the variables of interest?
ting1
Fluorite | Level 6

Thanks! 

That works. 

Reeza
Super User

@ting1 wrote:

I would like to randomly reassign respondents who chose option 3 to option 1 or 2. Does anyone have suggestions on how to randomly assign a small category to other categories?

 


From a representation and inclusion standpoint, this is not something that should be done; it is in fact the opposite of inclusion. Please do not do this. If you do, put a giant warning around your analysis so that people know you did this and that your results are not representative. Also, do not collect information if you don't know how you're planning to use it.

 

ballardw
Super User

I agree with @Reeza. If I were forced to treat your 3 as a different value on that scale, I would be more likely to combine the 99 and 3 codes. That is easily done with a format and does not lose any information in the data:

 

proc format;
    value gender_r
        1 = 'Male'
        2 = 'Female'
        3, 99 = 'Trans/ Prefer not to Answer'  /* or some other text */
    ;
run;

 

Then use the format gender_r with your gender variable for any summary statistics, reporting, or graphing in SAS procedures.
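
A sketch of that usage, assuming the format above has already been submitted and that the data set and variable are named have and gender:

```sas
/* The format groups codes 3 and 99 for display; the stored values are unchanged */
proc freq data=have;
    tables gender;
    format gender gender_r.;
run;
```

Because the grouping lives in the format rather than in the data, removing the FORMAT statement brings back the original four categories.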

 

 

 

 

 

ting1
Fluorite | Level 6
Thanks for your advice.
