Hi,
I have one file with names and categories, the categories ranging from 1 to 4:
name | category |
---|---|
a | 1 |
b | 2 |
c | 3 |
d | 4 |
e | 1 |
f | 2 |
and I want to merge it with another file that contains the names, but here there are 4 category columns each for every category:
name | category1 | category2 | category3 | category4 |
---|---|---|---|---|
a | 1 | 0 | 0 | 0 |
b | 0 | 1 | 0 | 0 |
c | 0 | 0 | 1 | 0 |
d | 0 | 0 | 0 | 1 |
e | 1 | 0 | 0 | 0 |
f | 0 | 1 | 0 | 0 |
like this in the second table I get dummy variables and then do a regression analysis. So each category for each name in the first file I put a 1 into the column that corresponds to that category and zero in the other categories, for the same name.
Please note that the number of categories can change and therefore the code should be flexible to accommodate this
Thank you
Then a little bit Hash would be handy:
data have;
input
name $ category;
cards;
a 1
a 3
b 2
c 3
c 4
d 4
e 1
f 2
a 2
;
/*otherwise, this could do*/
data _null_;
if _n_=1 then
do;
dcl hash h(ordered:'a');
h.definekey('name');
h.definedata('name', 'category1', 'category2', 'category3', 'category4');
h.definedone();
end;
set have end=last;
array cat(4) category1-category4;
do i=1 to dim(cat);
cat(i)=0;
end;
rc=h.find();
cat(category)=1;
rc=h.replace();
if last then
h.output(dataset:'want');
run;
Haikuo
data have;
input
name $ category;
cards;
a 1
b 2
c 3
d 4
e 1
f 2
;
/*as a simple starter, if no duplicates existing for name, try the following*/
data want;
set have;
array cat category1-category4;
do over cat;
if _i_=category then
cat=1;
else cat=0;
end;
run;
Hi Hai and thanks for the quick reply!!!
Actually, you mentioned a very important point that I overlooked in the description and that is that I do have name duplicates in the second file because the second file contains the board of directors for a given company for a period of 10 years and therefore in most cases many names will be present for all the years of the given company.
Also, some directors can be present for different companies in the same year.
Thank you
Then a little bit Hash would be handy:
data have;
input
name $ category;
cards;
a 1
a 3
b 2
c 3
c 4
d 4
e 1
f 2
a 2
;
/*otherwise, this could do*/
data _null_;
if _n_=1 then
do;
dcl hash h(ordered:'a');
h.definekey('name');
h.definedata('name', 'category1', 'category2', 'category3', 'category4');
h.definedone();
end;
set have end=last;
array cat(4) category1-category4;
do i=1 to dim(cat);
cat(i)=0;
end;
rc=h.find();
cat(category)=1;
rc=h.replace();
if last then
h.output(dataset:'want');
run;
Haikuo
Also, if your output table is too large to fit into your memory (where Hash table resides), then the following will do the same thing by adding some steps:
data have;
input
name $ category;
cards;
a 1
a 3
b 2
c 3
c 4
d 4
e 1
f 2
a 2
;
data h2;
set have;
array cat(4) category1-category4;
cat(category)=1;
run;
proc sort data=h2 out=h3;
by name;
run;
data want;
update h3(obs=0) h3;
by name;
drop category;
run;
Hi Hai,
I ran the code of #3 but unfortunately got an error message:
31 data _null_;
32 if _n _=1 then
-
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ,
GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.
33 do;
34 dcl hash h(ordered:'a');
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
For some reason, there is an additional blank inserted into the variable name as _n _, where it should be _n_:
if _n_=1 then
Hi Hai,
actually I managed to do the merge based on the very first code that you put because I realized that the categories file from which I had to make the new 4 category columns didn't have any duplicated since I DISTINCTed it in the very beggining, so I guess the first code is the best answer.
Thanks a lot for helping me!!!
proc sql;
create table want as
select name,
category=1 as category1,
category=2 as category2,
category=3 as category3,
category=4 as category4
from have;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.