DATA Step, Macro, Functions and more

Compressing dummy variables into one

Hello,

 

I'm currently trying to compress a dataset that has a series of a hundred dummy variables, each associated with a different referral reason code, such that:

 

PrimaryKey  RefCd01  RefCd03  RefCdN9  RefCdG8  .....
a           .        .        1        .
b           .        .        1        .
c           .        1        .        1
d           1        .        .        1
e           .        .        .        1
f           .        .        1        .

 

 

where the part following the string 'RefCd' is unique to each code (e.g. 01, 02, 03, N9, G8).

 

Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:

1) Calculate the maximum number of referral codes assigned to any observation and create that many variables (Referral1, Referral2, Referral3, ... up to the maximum), where each observation holds the names of the columns whose dummy = 1, such that:

PrimaryKey  Referral1  Referral2  Referral3  Referral4  .....
a           RefCdN9
b           RefCdN9
c           RefCd03    RefCdG8
d           RefCd01    RefCdG8
e           RefCdG8
f           RefCdN9

 

OR 2) Limit the output to one variable that contains all the column names applicable to a given observation, but I'm afraid it may get a little too long.


Any suggestions?

 


Accepted Solutions
Solution
‎09-01-2016 08:44 AM
Super User
Posts: 11,343

Re: Compressing dummy variables into one

Posted in reply to camfarrell25

How will the resulting data be used?

Or, possibly, how is the current data with "a hundred dummy variables" actually used?

Some types of summary or analysis work better with certain data structures.

 

It could be that what you really need instead of a hundred variables is 3:

Primary Key   Referral number   Code
a             1                 RefCdN9
a             2                 RefCdG8
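
A minimal sketch of how that three-variable layout might be built with a DATA step array, for anyone who wants to try it. The dataset names (have, want, wide) and the variable names ReferralNumber and Code are placeholders, and the sample rows simply copy the layout from the question. The PROC TRANSPOSE step at the end is optional and shows one way to get back to the Referral1, Referral2, ... layout from option 1 if that is still needed.

```sas
/* Hypothetical sample data mirroring the layout in the question */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Long layout: one output row per (key, flagged code) pair */
data want (keep=PrimaryKey ReferralNumber Code);
  set have;
  array ref {*} RefCd: ;        /* every variable whose name starts with RefCd */
  length Code $ 32;             /* 32 is the maximum length of a SAS variable name */
  ReferralNumber = 0;           /* restart the counter for each observation */
  do i = 1 to dim(ref);
    if ref{i} = 1 then do;
      ReferralNumber + 1;       /* position of this code within the observation */
      Code = vname(ref{i});     /* name of the flagged column, e.g. RefCdN9 */
      output;
    end;
  end;
run;

/* Optional: rebuild the wide Referral1, Referral2, ... layout (option 1). */
/* BY processing expects the data in PrimaryKey order; real data may need  */
/* a PROC SORT first.                                                      */
proc transpose data=want out=wide (drop=_name_) prefix=Referral;
  by PrimaryKey;
  id ReferralNumber;
  var Code;
run;
```

With the sample rows above, observation c should yield two rows in want (1 RefCd03, 2 RefCdG8). The number of Referral columns in wide is set by the data, so the maximum does not have to be computed beforehand.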


All Replies
Super User
Posts: 5,516

Re: Compressing dummy variables into one

Posted in reply to camfarrell25

If your real concern is the size of the data, I would suggest option #3:  store all these codes as character variables, each one character long.
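
A rough sketch of what option #3 could look like, assuming the flags only ever need to record yes/no. SAS stores a numeric variable in 8 bytes by default, while a one-character variable takes 1 byte, so a hundred flags would drop from about 800 bytes to about 100 bytes per observation before any compression. The Flag* names, the dataset names, and the '1'/'0' coding are placeholders for illustration; a variable's type cannot be changed in place, so new variables are created and the originals dropped.

```sas
/* Hypothetical sample data mirroring the layout in the question */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Replace each 8-byte numeric dummy with a 1-byte character flag */
data compact (drop=i RefCd:);
  set have;
  array ref  {*} RefCd: ;                             /* original numeric dummies */
  array flag {4} $ 1 Flag01 Flag03 FlagN9 FlagG8;     /* new flags, same order as ref */
  do i = 1 to dim(ref);
    if ref{i} = 1 then flag{i} = '1';
    else flag{i} = '0';
  end;
run;
```

With a hundred real variables the second array would need all hundred new names listed in the same order as the originals, so it may be easier to generate that list than to type it.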
