Hello,
I'm currently trying to compress a dataset that has a series of a hundred dummy variables, each associated with a different referral reason code, such that:
PrimaryKey RefCd01 RefCd03 RefCdN9 RefCdG8 .....
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
where the part following the string 'RefCd' is unique (e.g. 01, 02, 03, N9, G8).
Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:
1) Calculate the maximum number of referral codes assigned to any observation and create that many variables (Referral1, Referral2, Referral3, ..., up to the maximum), where each observation holds the names of the columns in which the dummy = 1, such that:
PrimaryKey Referral1 Referral2 Referral3 Referral4 .....
a RefCdN9
b RefCdN9
c RefCd03 RefCdG8
d RefCd01 RefCdG8
e RefCdG8
f RefCdN9
2) Limit to one variable that contains all the column names applicable to a given observation, though I'm afraid it may be a little too long.
Any suggestions?
How will the resulting data be used?
Or, possibly, how is the current data with "a hundred dummy variables" actually used?
Some types of summary or analysis work better with certain data structures.
It could be that what you really need instead of a hundred variables is 3:
Primary Key, Referral number, Code
a 1 RefCdN9
a 2 RefCdG8
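For what it's worth, here is a minimal sketch of how that three-column layout could be built in a DATA step. The dataset names HAVE, LONG and WIDE and the variable names ReferralNum and Code are assumptions for illustration, and the sample rows are copied from the question. It also assumes every variable whose name starts with RefCd is a numeric flag.

```sas
/* Hypothetical sample data mirroring the layout in the question */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* One output row per key and flagged code */
data long (keep=PrimaryKey ReferralNum Code);
  set have;
  array flags {*} RefCd:;      /* every variable whose name starts with RefCd */
  length Code $32;
  ReferralNum = 0;             /* restart the counter for each input row */
  do i = 1 to dim(flags);
    if flags{i} = 1 then do;
      ReferralNum = ReferralNum + 1;
      Code = vname(flags{i});  /* name of the flagged column */
      output;
    end;
  end;
run;

/* If the Referral1, Referral2, ... layout (option 1) is still wanted,   */
/* the long table can be transposed. Assumes LONG is sorted by the key.  */
proc transpose data=long out=wide (drop=_name_) prefix=Referral;
  by PrimaryKey;
  id ReferralNum;
  var Code;
run;
```

The long table is usually the easier one to summarize (for example with PROC FREQ on Code), and the transpose step only creates as many Referral variables as the largest number of codes found on any one key.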
If your real concern is the size of the data, I would suggest option #3: store all these codes as character variables, each one character long.
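A rough sketch of what that could look like, assuming the flags are currently numeric (8 bytes each by default, versus 1 byte for a $1 character variable). The dataset names HAVE and COMPACT and the shortened names Ref01, Ref03, RefN9 and RefG8 are made up for the example, and only the four variables shown in the question are listed; the real lists would cover all hundred.

```sas
/* Replace each numeric flag with a one-character flag */
data compact (drop=i RefCd:);
  set have;
  array n {*} RefCd01 RefCd03 RefCdN9 RefCdG8;  /* existing numeric flags       */
  array c {*} $ 1 Ref01 Ref03 RefN9 RefG8;      /* new $1 flags (names assumed) */
  do i = 1 to dim(n);
    if n{i} = 1 then c{i} = '1';                /* unflagged codes stay blank   */
  end;
run;
```

New names are used here because a variable cannot change type in place; if the original names matter, a RENAME= option on the output dataset could map them back after the numeric versions are dropped.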