camfarrell25
Quartz | Level 8

Hello,

 

I'm currently trying to compress a dataset that contains a series of a hundred dummy variables, each associated with a different referral reason code, such that:

 

PrimaryKey  RefCd01  RefCd03  RefCdN9  RefCdG8  .....
a           .        .        1        .
b           .        .        1        .
c           .        1        .        1
d           1        .        .        1
e           .        .        .        1
f           .        .        1        .

 

 

where the part following the string 'RefCd' is unique to each code (e.g. 01, 02, 03, N9, G8).

 

Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:

1) Calculate the maximum number of referral codes assigned to any observation and create that many variables (Referral1, Referral2, Referral3, ... up to the maximum), where each observation holds the names of the columns whose dummy = 1, such that:

PrimaryKey  Referral1  Referral2  Referral3  Referral4  .....
a           RefCdN9
b           RefCdN9
c           RefCd03    RefCdG8
d           RefCd01    RefCdG8
e           RefCdG8
f           RefCdN9
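
For illustration, option 1 could be sketched in SAS roughly as follows. This is only a sketch under assumptions: the data set names HAVE and WANT, the macro variable maxn, and the $32 length are hypothetical, and the four sample columns stand in for the full hundred flags. It relies on every flag name starting with RefCd and on a flag value of 1 meaning "present".

```sas
/* Hypothetical sample data, copied from the rows shown above */
data have;
   input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
   datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Pass 1: find the largest number of flags set on any one observation */
data _null_;
   set have end=last;
   retain maxn 0;
   maxn = max(maxn, sum(0, of RefCd:));
   /* keep at least one Referral variable even if no flag is ever set */
   if last then call symputx('maxn', max(maxn, 1));
run;

/* Pass 2: copy the names of the flagged columns into Referral1-Referral&maxn */
data want;
   set have;
   array flags {*} RefCd:;            /* all numeric flags with the RefCd prefix */
   array Referral {&maxn} $32;        /* creates Referral1, Referral2, ... */
   n = 0;
   do i = 1 to dim(flags);
      if flags{i} = 1 then do;
         n = n + 1;
         Referral{n} = vname(flags{i});   /* store the column name, e.g. RefCdN9 */
      end;
   end;
   keep PrimaryKey Referral:;
run;
```

The names are filled in the order the flag columns appear in the data set, so with the sample rows observation c would get RefCd03 and then RefCdG8.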

 

2) Alternatively, limit the result to a single variable that contains all the column names applicable to a given observation, although I'm afraid that value may end up too long.
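
A rough sketch of option 2, again under assumptions: HAVE, WANT_LIST and RefList are hypothetical names, and the sketch stores only the part after 'RefCd' to keep the value shorter. The $300 length assumes a hundred two-character codes plus one delimiter each, which may not hold for the real data.

```sas
/* Hypothetical sample data, copied from the rows shown above */
data have;
   input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
   datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

data want_list;
   set have;
   array flags {*} RefCd:;
   /* assumed upper bound: 100 codes x (2 characters + 1 blank) */
   length RefList $300;
   do i = 1 to dim(flags);
      /* 'RefCd' is 5 characters long, so the code suffix starts at position 6 */
      if flags{i} = 1 then
         RefList = catx(' ', RefList, substr(vname(flags{i}), 6));
   end;
   keep PrimaryKey RefList;
run;
```

CATX strips leading and trailing blanks and skips blank arguments, so the first code is stored without a leading delimiter.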


Any suggestions?

 

Accepted Solutions
ballardw
Super User

How will the resulting data be used?

Or, possibly, how is the current data with "a hundred dummy variables" actually used?

Some types of summary or analysis work better with certain data structures.

 

It could be that what you really need, instead of a hundred variables, is three:

PrimaryKey  ReferralNumber  Code
a           1               RefCdN9
a           2               RefCdG8
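
One way the three-variable layout might be produced from the wide data is sketched below. The data set names HAVE and LONG and the variable names ReferralNumber and Code are assumptions made for the example, as is the $32 length.

```sas
/* Hypothetical sample data, copied from the question */
data have;
   input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
   datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Write one output row per flag that is set */
data long;
   set have;
   array flags {*} RefCd:;
   length Code $32;
   ReferralNumber = 0;
   do i = 1 to dim(flags);
      if flags{i} = 1 then do;
         ReferralNumber = ReferralNumber + 1;   /* running count within the key */
         Code = vname(flags{i});                /* name of the flagged column */
         output;
      end;
   end;
   keep PrimaryKey ReferralNumber Code;
run;
```

An observation with no flags set writes no rows in this sketch, so keys without any referral code would drop out of LONG unless they are handled separately.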


2 REPLIES
Astounding
PROC Star

If your real concern is the size of the data, I would suggest option #3:  store all these codes as character variables, each one character long.
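
A sketch of how that conversion might look, assuming the flags are currently numeric (8 bytes each by default) and that a '1' or blank is all that needs to be kept. The data set names HAVE and COMPACT, the macro variables, and the temporary underscore-prefixed names are all hypothetical; the underscore prefix also assumes the original names are shorter than 32 characters.

```sas
/* Hypothetical sample data, copied from the question */
data have;
   input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
   datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Build three aligned lists from the flag names: rename pairs,
   the original names, and the temporary underscore-prefixed names */
proc sql noprint;
   select cats(name, '=_', name),
          strip(name),
          cats('_', name)
      into :rename_pairs separated by ' ',
           :char_names   separated by ' ',
           :num_names    separated by ' '
      from dictionary.columns
      where libname = 'WORK' and memname = 'HAVE'
        and upcase(name) like 'REFCD%';
quit;

data compact;
   /* numeric flags come in under temporary names such as _RefCd01 */
   set have(rename=(&rename_pairs));
   array num_flag  {*} &num_names;
   /* one-byte character flags take over the original names */
   array char_flag {*} $1 &char_names;
   do i = 1 to dim(num_flag);
      if num_flag{i} = 1 then char_flag{i} = '1';
   end;
   drop i &num_names;
run;
```

With a hundred flags this would store roughly 100 bytes per observation for the flags instead of 800, while leaving the one-column-per-code layout unchanged.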

