Compressing dummy variables into one

08-31-2016 12:54 PM

Hello,

I'm currently trying to compress a dataset that contains a series of a hundred dummy variables, each associated with a different referral reason code, such that:

PrimaryKey  RefCd01  RefCd03  RefCdN9  RefCdG8  .....
a           .        .        1        .
b           .        .        1        .
c           .        1        .        1
d           1        .        .        1
e           .        .        .        1
f           .        .        1        .

Here, the part following the string 'RefCd' is unique to each variable (e.g. 01, 02, 03, N9, G8).

Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:

1) Calculate the maximum number of referral codes assigned to any observation and create that number of variables (Referral1, Referral2, Referral3, ... up to the maximum), where each observation holds the names of the columns whose dummy = 1, such that:

PrimaryKey  Referral1  Referral2  Referral3  Referral4  .....
a           RefCdN9
b           RefCdN9
c           RefCd03    RefCdG8
d           RefCd01    RefCdG8
e           RefCdG8
f           RefCdN9
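Option 1 could be sketched with two DATA steps, roughly as follows. This is an illustration under assumptions, not a tested solution for the real data: the dataset names `have` and `want` and the macro variable `maxn` are made up, only the four dummies shown above are included, and the `RefCd:` prefix list assumes that no unrelated variable name starts with RefCd.

```sas
/* Sample data mirroring the layout above (four of the hundred dummies) */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

/* Pass 1: find the largest number of flags set on any single row */
data _null_;
  set have end=last;
  retain maxn 0;
  /* SUM ignores missing values; the leading 0 avoids a missing result */
  maxn = max(maxn, sum(0, of RefCd:));
  if last then call symputx('maxn', maxn);
run;

/* Pass 2: copy the names of the set flags into Referral1-Referral&maxn */
data want;
  set have;
  array flags{*} RefCd:;          /* every variable whose name starts with RefCd */
  array Referral{&maxn} $32;      /* $32 = longest possible SAS variable name    */
  n = 0;
  do i = 1 to dim(flags);
    if flags{i} = 1 then do;
      n = n + 1;
      Referral{n} = vname(flags{i});
    end;
  end;
  keep PrimaryKey Referral:;
run;
```

The two passes are needed because the number of Referral variables has to be known when the second step is compiled. If no row had any flag set, `maxn` would be 0 and the second ARRAY statement would fail, so that case would need its own handling.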

2) Alternatively, keep a single variable that contains all the column names applicable to a given observation, though I'm afraid it may be a little too long.
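A rough sketch of option 2 is shown below. To keep the combined value short, it stores only the part of each name after 'RefCd' rather than the full column name, which is a design choice of this sketch and not something requested above. The names `have`, `want_one` and `RefList` are hypothetical, and the length of $300 assumes a hundred two-character suffixes separated by single blanks.

```sas
/* Sample data mirroring the layout above (four of the hundred dummies) */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

data want_one;
  set have;
  array flags{*} RefCd:;
  length RefList $300;   /* assumed worst case: 100 suffixes of 2 chars + 99 blanks */
  do i = 1 to dim(flags);
    /* SUBSTR(...,6) drops the 5-character prefix 'RefCd'; CATX skips blank pieces */
    if flags{i} = 1 then
      RefList = catx(' ', RefList, substr(vname(flags{i}), 6));
  end;
  keep PrimaryKey RefList;
run;
```

With this layout a single code can later be looked up with a function such as `findw(RefList, 'G8')`, at the cost of every row reserving the full declared length unless the dataset is compressed.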

Any suggestions?

Accepted Solutions

Solution

09-01-2016 08:44 AM


Posted in reply to camfarrell25

08-31-2016 01:35 PM

How will the resulting data be used?

Or, possibly, how is the current data with "a hundred dummy variables" actually used?

Some types of summary or analysis work better with certain data structures.

It could be that what you really need instead of a hundred variables is 3:

Primary Key  Referral number  Code
a            1                RefCdN9
a            2                RefCdG8
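The three-variable layout could be produced with one DATA step that writes an output row per flag that is set. This is only a sketch under assumptions: `have`, `want_long`, `ReferralNumber` and `Code` are made-up names, and the sample data is the four-dummy extract from the question.

```sas
/* Sample data copied from the question (four of the hundred dummies) */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

data want_long;
  set have;
  array flags{*} RefCd:;    /* every variable whose name starts with RefCd */
  length Code $32;
  ReferralNumber = 0;
  do i = 1 to dim(flags);
    if flags{i} = 1 then do;
      ReferralNumber = ReferralNumber + 1;
      Code = vname(flags{i});
      output;               /* one row per key and referral code */
    end;
  end;
  keep PrimaryKey ReferralNumber Code;
run;
```

A key with no flags set contributes no rows here. In this shape, counts by code are a plain `proc freq data=want_long; tables Code; run;` rather than a pass over a hundred columns.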

All Replies


Posted in reply to camfarrell25

08-31-2016 01:02 PM

If your real concern is the size of the data, I would suggest option #3: store all these codes as character variables, each one character long.
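As a sketch of what that could look like: a numeric variable occupies 8 bytes by default, while a `$1` character variable occupies 1, so the conversion below shrinks each flag roughly eightfold. The dataset names and the new `Flag..` variable names are hypothetical, and only the four dummies from the question are listed. For the full hundred, the two variable lists would have to be written out or generated.

```sas
/* Sample data copied from the question (four of the hundred dummies) */
data have;
  input PrimaryKey $ RefCd01 RefCd03 RefCdN9 RefCdG8;
  datalines;
a . . 1 .
b . . 1 .
c . 1 . 1
d 1 . . 1
e . . . 1
f . . 1 .
;
run;

data want_char;
  set have;
  array nflag{*} RefCd01 RefCd03 RefCdN9 RefCdG8;      /* original numeric flags   */
  array cflag{*} $1 Flag01 Flag03 FlagN9 FlagG8;       /* hypothetical $1 versions */
  do i = 1 to dim(nflag);
    if nflag{i} = 1 then cflag{i} = '1';               /* unset flags stay blank   */
  end;
  drop i RefCd:;
run;
```

The wide shape of the data is unchanged, so existing logic only needs to test for `'1'` instead of `1`.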
