Help using Base SAS procedures

Reducing the size of the data via dummies

Accepted Solution Solved
Reply
Contributor
Posts: 55
Accepted Solution

Reducing the size of the data via dummies

Dear All,

I have a dataset (attached) with observation ids (obs) and a set of codes for each id (comcode). Some obs may have only one comcode and some others may have as many as 10. My goal is to transform the data the following way:

           FROM:                             obs         comcode

                                                      1              251

                                                      1              182

                                                      1              134

                                                      2              251

                                                      2              134

                                                      2              182

                                                      3              186

                                                      3              251

                                                      3              134

                                                      3              182

                                                      4              134

                                                      4              182

TO:

obs c102   c104   c106 ......c134......c182.......c186.......

1     0           0        0          1            1            0

2     0           0         0         1            1            0

3     0           0         0         1            1            1

.

.

.

In other words, I want to reduce the size of the dataset by creating dummies for each code. I tried transposing the data, but it created errors due to repeating codes. I would appreciate your help.

Thanks,

Attachment

Accepted Solutions
Solution
‎01-28-2013 11:22 AM
Respected Advisor
Posts: 3,777

Re: Reducing the size of the data via dummies

This should be about right.  Each cell will be the frequency.

libname me '~';
proc summary data=me.s7_comm completetypes nway;
  
class obs comcode;
   output out=counts(drop=_type_);
   run;

proc transpose data=counts prefix=CODE;
   by obs;
   id comcode;
   var _freq_;
   run;

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Reducing the size of the data via dummies

example:

data have;

input obs comcode;

cards;

1 2

1 6

1 9

2 5

2 8

2 10

3 4

3 8

3 9

3 20

;

data want(drop=i);

set have;

by obs;

  retain c1-c20 0;

array _com(*) c1-c20;

do i=1 to dim(_com);

if comcode=i then _com(i)=1;

     end;

if last.obs then do;

  output;

  do i=1 to dim(_com);

    _com(i)=0;

   end;

  end;

proc print;run;

        o

              m

              c

       O  o   o                             c  c  c  c  c  c  c  c  c  c  c

       b  b   d  c  c  c  c  c  c  c  c  c  1  1  1  1  1  1  1  1  1  1  2

       s  s   e  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5  6  7  8  9  0

       1  1   9  0  1  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0

       2  2  10  0  0  0  0  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0

       3  3  20  0  0  0  1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  1

Super User
Posts: 5,085

Re: Reducing the size of the data via dummies

You do have to realize, though, that the output you describe probably increases the size of your data set.  SAS will store all numerics using 8 bytes, so it takes just as much storage to hold a single digit as to hold 3 digits.  And you could have many more numbers to store than you started with.  You might affect that using compression (or alternatively, understanding how SAS stores numerics and applying the right length to your variables), or perhaps storing your data as character instead of numeric.  But transposing and converting to flags may backfire.

Contributor
Posts: 55

Re: Reducing the size of the data via dummies

Thank you for your help and suggestions. I am going to try the code suggested by Linlin. Regarding Astounding's question, I realized that I was not clear in asking this question. By reducing the size, I actually meant reducing the number of rows. I have multiple observations for each id (obs), I want to put all of it in one single row. The reason why I am interested in this transformation is to change my dataset to a panel date setting that allows me to have dummies for each code (comcode).

Best,

Solution
‎01-28-2013 11:22 AM
Respected Advisor
Posts: 3,777

Re: Reducing the size of the data via dummies

This should be about right.  Each cell will be the frequency.

libname me '~';
proc summary data=me.s7_comm completetypes nway;
  
class obs comcode;
   output out=counts(drop=_type_);
   run;

proc transpose data=counts prefix=CODE;
   by obs;
   id comcode;
   var _freq_;
   run;
Super Contributor
Posts: 1,636

Re: Reducing the size of the data via dummies

Hi DN,

your code is so nice! I added the RED part to your code.

proc transpose data=counts prefix=CODE out=wanted_dataset;
   by obs;
   id comcode;
   var _freq_;
   run;

Contributor
Posts: 55

Re: Reducing the size of the data via dummies

Thank you all for your amazing help. This is what I needed.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 205 views
  • 11 likes
  • 4 in conversation