camfarrell25
Quartz | Level 8

Hello,

 

I'm trying to compress a dataset that contains about a hundred dummy variables, each associated with a different referral reason code, such that:

 

PrimaryKey  RefCd01  RefCd03  RefCdN9  RefCdG8  .....
a           .        .        1        .
b           .        .        1        .
c           .        1        .        1
d           1        .        .        1
e           .        .        .        1
f           .        .        1        .

 

 

where the part following the string 'RefCd' is unique to each variable (e.g. 01, 02, 03, N9, G8).

 

Due to the large size of the data, I wanted to compress the referral reason codes in one of two ways:

1) Calculate the maximum number of referral codes assigned to any observation and create that many variables (Referral1, Referral2, Referral3, ... up to the maximum), where each observation holds the names of the columns whose dummy = 1, such that:

PrimaryKey  Referral1  Referral2  Referral3  Referral4  .....
a           RefCdN9
b           RefCdN9
c           RefCd03    RefCdG8
d           RefCd01    RefCdG8
e           RefCdG8
f           RefCdN9
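The logic of this first option can be sketched as follows. This is only an illustration in Python, not SAS code: the dictionary layout, the `active_codes` helper, and the use of `None` for a SAS missing value (`.`) are all assumptions made for the example, and only four of the hundred dummy columns are shown.

```python
# Hypothetical sample rows mirroring the example table above.
# None stands in for a SAS missing value (.).
rows = {
    "a": {"RefCd01": None, "RefCd03": None, "RefCdN9": 1, "RefCdG8": None},
    "b": {"RefCd01": None, "RefCd03": None, "RefCdN9": 1, "RefCdG8": None},
    "c": {"RefCd01": None, "RefCd03": 1, "RefCdN9": None, "RefCdG8": 1},
    "d": {"RefCd01": 1, "RefCd03": None, "RefCdN9": None, "RefCdG8": 1},
    "e": {"RefCd01": None, "RefCd03": None, "RefCdN9": None, "RefCdG8": 1},
    "f": {"RefCd01": None, "RefCd03": None, "RefCdN9": 1, "RefCdG8": None},
}


def active_codes(flags):
    """Return the names of the dummy columns set to 1, in column order."""
    return [name for name, value in flags.items() if value == 1]


# The row with the most codes decides how many Referral columns are needed.
max_referrals = max(len(active_codes(flags)) for flags in rows.values())

compressed = {}
for key, flags in rows.items():
    codes = active_codes(flags)
    # Pad with empty strings so every row has the same number of columns.
    compressed[key] = codes + [""] * (max_referrals - len(codes))

for key, referrals in compressed.items():
    print(key, referrals)
```

For this sample the widest row has two codes, so only Referral1 and Referral2 would be needed.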

 

2) Alternatively, limit the output to one variable containing all the column names applicable to a given observation, but I'm afraid that value may end up too long.
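A rough sketch of the single-variable idea, again in Python purely to illustrate the logic. It assumes that only the suffix after 'RefCd' is stored, that every suffix is two characters long (as in the examples shown), and that a space is used as the delimiter; none of these choices come from the original data.

```python
PREFIX = "RefCd"  # common prefix shared by all dummy column names


def combine(flags, sep=" "):
    """Join the suffixes of all columns set to 1 into one delimited string."""
    return sep.join(
        name[len(PREFIX):] for name, value in flags.items() if value == 1
    )


# Hypothetical row "c" from the example table; None stands in for missing (.).
row_c = {"RefCd01": None, "RefCd03": 1, "RefCdN9": None, "RefCdG8": 1}
combined = combine(row_c)
print(combined)

# Worst case if every one of 100 two-character codes were set at once:
# 100 suffixes plus 99 single-character delimiters.
n_codes, suffix_len = 100, 2
worst_case_len = n_codes * suffix_len + (n_codes - 1)
print(worst_case_len)
```

Under those assumptions the combined variable never needs more than 299 characters, and far fewer if the real maximum number of codes per observation is small.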


Any suggestions?

 


Accepted Solutions
ballardw
Super User

How will the resulting data be used?

Or, possibly, how is the current data with "a hundred dummy variables" actually used?

Some types of summary or analysis work better with certain data structures.

 

It could be that what you really need instead of a hundred variables is 3:

Primary Key   Referral number   Code
a             1                 RefCdN9
a             2                 RefCdG8
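To make the three-variable layout concrete, here is a minimal sketch of the reshaping logic in Python (not SAS). The sample rows, the tuple layout, and the use of `None` for a missing value are assumptions for illustration only.

```python
# Hypothetical wide rows; None stands in for a SAS missing value (.).
rows = {
    "c": {"RefCd01": None, "RefCd03": 1, "RefCdN9": None, "RefCdG8": 1},
    "d": {"RefCd01": 1, "RefCd03": None, "RefCdN9": None, "RefCdG8": 1},
}

# One output record per (key, code) pair that is actually set.
long_rows = []
for key, flags in rows.items():
    referral_number = 0
    for name, value in flags.items():
        if value == 1:
            referral_number += 1  # running count of codes within this key
            long_rows.append((key, referral_number, name))

for record in long_rows:
    print(record)
```

Each observation contributes only as many records as it has codes, so no space is spent on the missing flags.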


Replies
Astounding
PROC Star

If your real concern is the size of the data, I would suggest option #3:  store all these codes as character variables, each one character long.
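A back-of-the-envelope calculation shows why this helps. It assumes the dummies are currently stored as numeric variables with the default SAS length of 8 bytes, which the thread does not state, and it ignores the key variable and any dataset compression.

```python
N_FLAGS = 100       # number of dummy variables mentioned in the question
NUMERIC_BYTES = 8   # default length of a SAS numeric variable (assumed here)
CHAR1_BYTES = 1     # a character variable declared with length 1

numeric_row_bytes = N_FLAGS * NUMERIC_BYTES   # bytes per row as numerics
char_row_bytes = N_FLAGS * CHAR1_BYTES        # bytes per row as 1-char flags
savings_ratio = numeric_row_bytes / char_row_bytes

print(numeric_row_bytes, char_row_bytes, savings_ratio)
```

Under those assumptions each row shrinks from 800 bytes of flags to 100, while keeping one column per code.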

