Solved: Re: Compressing dummy variables into one

camfarrell25 · Posted 08-31-2016 12:54 PM

Hello,

I'm currently trying to compress a dataset in which I have a series of a hundred dummy variables, each associated with a different referral reason code, such that:

PrimaryKey  RefCd01  RefCd03  RefCdN9  RefCdG8  .....
a           .        .        1        .
b           .        .        1        .
c           .        1        .        1
d           1        .        .        1
e           .        .        .        1
f           .        .        1        .

where the part following the string 'RefCd' is unique (e.g. 01, 02, 03, N9, G8).

Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:

1) Calculate the maximum number of referral codes assigned to any observation and create that number of variables (Referral1, Referral2, Referral3, ..., up to the maximum possible), where each observation holds the names of the columns in which the dummy = 1. Such that:

PrimaryKey  Referral1  Referral2  Referral3  Referral4  .....
a           RefCdN9
b           RefCdN9
c           RefCd03    RefCdG8
d           RefCd01    RefCdG8
e           RefCdG8
f           RefCdN9
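
As a rough illustration of option 1, here is a minimal Python sketch. Python is used only for illustration, since the thread does not name a tool. It uses the six sample rows shown above, with None standing in for the "." missing value; the names COLUMNS, DATA, active_codes and compress_wide are hypothetical.

```python
# Sample rows copied from the example in the post; None stands in for ".".
COLUMNS = ["RefCd01", "RefCd03", "RefCdN9", "RefCdG8"]
DATA = {
    "a": [None, None, 1, None],
    "b": [None, None, 1, None],
    "c": [None, 1, None, 1],
    "d": [1, None, None, 1],
    "e": [None, None, None, 1],
    "f": [None, None, 1, None],
}


def active_codes(flags):
    """Return the column names whose dummy equals 1, in column order."""
    return [name for name, flag in zip(COLUMNS, flags) if flag == 1]


def compress_wide(data):
    """Build Referral1..ReferralN, where N is the most codes found on any row."""
    codes = {key: active_codes(flags) for key, flags in data.items()}
    width = max((len(c) for c in codes.values()), default=0)
    # Pad shorter rows with empty strings so every row has the same width.
    return {key: c + [""] * (width - len(c)) for key, c in codes.items()}


wide = compress_wide(DATA)
for key, referrals in wide.items():
    print(key, *referrals)
```

With this sample the widest row has two codes, so only Referral1 and Referral2 would be needed.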

Or, option 2: limit the output to a single variable that contains all the column names applicable to a given observation, but I'm afraid it may get a little too long.
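
Option 2 could be sketched along these lines, again in Python purely for illustration. To keep the combined value short, this sketch stores only the part after 'RefCd', which the post says is unique; the two-character suffix length is an assumption taken from the examples, and COLUMNS, DATA, PREFIX and combine_codes are hypothetical names.

```python
# Sample rows copied from the example in the post; None stands in for ".".
COLUMNS = ["RefCd01", "RefCd03", "RefCdN9", "RefCdG8"]
DATA = {
    "a": [None, None, 1, None],
    "b": [None, None, 1, None],
    "c": [None, 1, None, 1],
    "d": [1, None, None, 1],
    "e": [None, None, None, 1],
    "f": [None, None, 1, None],
}
PREFIX = "RefCd"


def combine_codes(flags, sep=" "):
    """Join the suffixes of all columns whose dummy equals 1 into one string."""
    return sep.join(
        name[len(PREFIX):] for name, flag in zip(COLUMNS, flags) if flag == 1
    )


combined = {key: combine_codes(flags) for key, flags in DATA.items()}
# Length needed for the single variable on this sample.
longest = max(len(value) for value in combined.values())

for key, value in combined.items():
    print(key, value)
```

Under the two-character assumption, a row with all hundred codes set would need 100 * 2 characters plus 99 separators, i.e. 299 characters at most.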

Any suggestions?

ballardw · Posted 08-31-2016 01:35 PM

How will the resulting data be used?

Or possibly, how is the current data with "a hundred dummy variables" actually used?

Some types of summary or analysis work better with certain data structures.

It could be that what you really need instead of a hundred variables is 3:

Primary Key, Referral number, Code

a 1 RefCdN9

a 2 RefCdG8
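
A minimal sketch of that three-variable layout, in Python purely for illustration (the thread does not name a tool). It uses the six sample rows from the question, so key "a" yields a single row here; COLUMNS, DATA and to_long are hypothetical names.

```python
# Sample rows copied from the question; None stands in for ".".
COLUMNS = ["RefCd01", "RefCd03", "RefCdN9", "RefCdG8"]
DATA = {
    "a": [None, None, 1, None],
    "b": [None, None, 1, None],
    "c": [None, 1, None, 1],
    "d": [1, None, None, 1],
    "e": [None, None, None, 1],
    "f": [None, None, 1, None],
}


def to_long(data):
    """Return one (primary key, referral number, code) tuple per dummy equal to 1."""
    rows = []
    for key, flags in data.items():
        codes = [name for name, flag in zip(COLUMNS, flags) if flag == 1]
        # Number the referrals within each key, starting at 1.
        for number, code in enumerate(codes, start=1):
            rows.append((key, number, code))
    return rows


long_rows = to_long(DATA)
for row in long_rows:
    print(*row)
```

The number of rows grows with the number of codes actually present rather than with the number of possible codes, which is what makes this layout compact when most dummies are missing.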


Astounding · Posted 08-31-2016 01:02 PM

If your real concern is the size of the data, I would suggest option #3: store all these codes as character variables, each one character long.
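
To give a feel for the saving, here is a back-of-the-envelope sketch in Python, used only for illustration. It assumes each numeric dummy currently occupies 8 bytes, which the thread does not state; N_DUMMIES and to_char_flags are hypothetical names.

```python
import struct

N_DUMMIES = 100  # "a hundred dummy variables", per the original question

# Assumption: each numeric dummy is stored as an 8-byte double.
numeric_row_bytes = struct.calcsize(f"<{N_DUMMIES}d")
# Alternative: one single-byte character per dummy.
char_row_bytes = struct.calcsize(f"<{N_DUMMIES}c")


def to_char_flags(flags):
    """Map each dummy to a one-character flag: "1" when set, "." when missing."""
    return ["1" if flag == 1 else "." for flag in flags]


print(numeric_row_bytes, char_row_bytes)
print(to_char_flags([None, 1, None, 1]))
```

Under that assumption the dummies shrink from 800 to 100 bytes per observation, while the layout of one variable per code stays unchanged.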
