BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finans_sas
Quartz | Level 8

Dear All,

I have a dataset (attached) with observation ids (obs) and a set of codes for each id (comcode). Some obs may have only one comcode and some others may have as many as 10. My goal is to transform the data the following way:

           FROM:                             obs         comcode

                                                      1              251

                                                      1              182

                                                      1              134

                                                      2              251

                                                      2              134

                                                      2              182

                                                      3              186

                                                      3              251

                                                      3              134

                                                      3              182

                                                      4              134

                                                      4              182

TO:

obs c102   c104   c106 ......c134......c182.......c186.......

1     0           0        0          1            1            0

2     0           0         0         1            1            0

3     0           0         0         1            1            1

.

.

.

In other words, I want to reduce the size of the dataset by creating dummies for each code. I tried transposing the data, but it created errors due to repeating codes. I would appreciate your help.

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This should be about right.  Each cell will be the frequency.

libname me '~';
proc summary data=me.s7_comm completetypes nway;
  
class obs comcode;
   output out=counts(drop=_type_);
   run;

proc transpose data=counts prefix=CODE;
   by obs;
   id comcode;
   var _freq_;
   run;

View solution in original post

6 REPLIES 6
Linlin
Lapis Lazuli | Level 10

example:

data have;

input obs comcode;

cards;

1 2

1 6

1 9

2 5

2 8

2 10

3 4

3 8

3 9

3 20

;

data want(drop=i);

set have;

by obs;

  retain c1-c20 0;

array _com(*) c1-c20;

do i=1 to dim(_com);

if comcode=i then _com(i)=1;

     end;

if last.obs then do;

  output;

  do i=1 to dim(_com);

    _com(i)=0;

   end;

  end;

proc print;run;

        o

              m

              c

       O  o   o                             c  c  c  c  c  c  c  c  c  c  c

       b  b   d  c  c  c  c  c  c  c  c  c  1  1  1  1  1  1  1  1  1  1  2

       s  s   e  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5  6  7  8  9  0

       1  1   9  0  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0

       2  2  10  0  0  0  0  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0

       3  3  20  0  0  0  1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  1

Astounding
PROC Star

You do have to realize, though, that the output you describe probably increases the size of your data set.  SAS will store all numerics using 8 bytes, so it takes just as much storage to hold a single digit as to hold 3 digits.  And you could have many more numbers to store than you started with.  You might affect that using compression (or alternatively, understanding how SAS stores numerics and applying the right length to your variables), or perhaps storing your data as character instead of numeric.  But transposing and converting to flags may backfire.

finans_sas
Quartz | Level 8

Thank you for your help and suggestions. I am going to try the code suggested by Linlin. Regarding Astounding's question, I realized that I was not clear in asking this question. By reducing the size, I actually meant reducing the number of rows. I have multiple observations for each id (obs), I want to put all of it in one single row. The reason why I am interested in this transformation is to change my dataset to a panel date setting that allows me to have dummies for each code (comcode).

Best,

data_null__
Jade | Level 19

This should be about right.  Each cell will be the frequency.

libname me '~';
proc summary data=me.s7_comm completetypes nway;
  
class obs comcode;
   output out=counts(drop=_type_);
   run;

proc transpose data=counts prefix=CODE;
   by obs;
   id comcode;
   var _freq_;
   run;
Linlin
Lapis Lazuli | Level 10

Hi DN,

your code is so nice! I added the RED part to your code.

proc transpose data=counts prefix=CODE out=wanted_dataset;
   by obs;
   id comcode;
   var _freq_;
   run;

finans_sas
Quartz | Level 8

Thank you all for your amazing help. This is what I needed.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1469 views
  • 11 likes
  • 4 in conversation