Dear All,
I have a dataset (attached) with observation ids (obs) and a set of codes for each id (comcode). Some obs may have only one comcode and some others may have as many as 10. My goal is to transform the data the following way:
FROM: obs comcode
1 251
1 182
1 134
2 251
2 134
2 182
3 186
3 251
3 134
3 182
4 134
4 182
TO:
obs c102 c104 c106 ......c134......c182.......c186.......
1 0 0 0 1 1 0
2 0 0 0 1 1 0
3 0 0 0 1 1 1
.
.
.
In other words, I want to reduce the size of the dataset by creating dummies for each code. I tried transposing the data, but it created errors due to repeating codes. I would appreciate your help.
Thanks,
This should be about right. Each cell will be the frequency.
example:
data have;
input obs comcode;
cards;
1 2
1 6
1 9
2 5
2 8
2 10
3 4
3 8
3 9
3 20
;
data want(drop=i);
set have;
by obs;
retain c1-c20 0;
array _com(*) c1-c20;
do i=1 to dim(_com);
if comcode=i then _com(i)=1;
end;
if last.obs then do;
output;
do i=1 to dim(_com);
_com(i)=0;
end;
end;
proc print;run;
o
m
c
O o o c c c c c c c c c c c
b b d c c c c c c c c c 1 1 1 1 1 1 1 1 1 1 2
s s e 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1 1 9 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2 2 10 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
3 3 20 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1
You do have to realize, though, that the output you describe probably increases the size of your data set. SAS will store all numerics using 8 bytes, so it takes just as much storage to hold a single digit as to hold 3 digits. And you could have many more numbers to store than you started with. You might affect that using compression (or alternatively, understanding how SAS stores numerics and applying the right length to your variables), or perhaps storing your data as character instead of numeric. But transposing and converting to flags may backfire.
Thank you for your help and suggestions. I am going to try the code suggested by Linlin. Regarding Astounding's question, I realized that I was not clear in asking this question. By reducing the size, I actually meant reducing the number of rows. I have multiple observations for each id (obs), I want to put all of it in one single row. The reason why I am interested in this transformation is to change my dataset to a panel date setting that allows me to have dummies for each code (comcode).
Best,
This should be about right. Each cell will be the frequency.
Hi DN,
your code is so nice! I added the RED part to your code.
proc transpose data=counts prefix=CODE out=wanted_dataset;
by obs;
id comcode;
var _freq_;
run;
Thank you all for your amazing help. This is what I needed.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.