Solved: Re: "Merging" 2 tables by transferring values of a column of a table i...

ilikesas · Posted 11-30-2014 05:00 PM

Hi,

I have one file with names and categories, the categories ranging from 1 to 4:

name	category
a	1
b	2
c	3
d	4
e	1
f	2

and I want to merge it with another file that contains the names, but here there are 4 category columns each for every category:

name	category1	category2	category3	category4
a	1	0	0	0
b	0	1	0	0
c	0	0	1	0
d	0	0	0	1
e	1	0	0	0
f	0	1	0	0

like this in the second table I get dummy variables and then do a regression analysis. So each category for each name in the first file I put a 1 into the column that corresponds to that category and zero in the other categories, for the same name.

Please note that the number of categories can change and therefore the code should be flexible to accommodate this

Thank you

Haikuo · Posted 11-30-2014 07:55 PM

Then a little bit Hash would be handy:

data have;

input

name $ category;

cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

/*otherwise, this could do*/

data _null_;

if _n_=1 then

do;

dcl hash h(ordered:'a');

h.definekey('name');

h.definedata('name', 'category1', 'category2', 'category3', 'category4');

h.definedone();

end;

set have end=last;

array cat(4) category1-category4;

do i=1 to dim(cat);

cat(i)=0;

end;

rc=h.find();

cat(category)=1;

rc=h.replace();

if last then

h.output(dataset:'want');

run;

Haikuo

View solution in original post

Haikuo · Posted 11-30-2014 05:51 PM

data have;

input

name $ category;

cards;

a 1

b 2

c 3

d 4

e 1

f 2

;

/*as a simple starter, if no duplicates existing for name, try the following*/

data want;

set have;

array cat category1-category4;

do over cat;

if _i_=category then

cat=1;

else cat=0;

end;

run;

ilikesas · Posted 11-30-2014 07:53 PM

Hi Hai and thanks for the quick reply!!!

Actually, you mentioned a very important point that I overlooked in the description and that is that I do have name duplicates in the second file because the second file contains the board of directors for a given company for a period of 10 years and therefore in most cases many names will be present for all the years of the given company.

Also, some directors can be present for different companies in the same year.

Thank you

Haikuo · Posted 11-30-2014 07:55 PM

Then a little bit Hash would be handy:

data have;

input

name $ category;

cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

/*otherwise, this could do*/

data _null_;

if _n_=1 then

do;

dcl hash h(ordered:'a');

h.definekey('name');

h.definedata('name', 'category1', 'category2', 'category3', 'category4');

h.definedone();

end;

set have end=last;

array cat(4) category1-category4;

do i=1 to dim(cat);

cat(i)=0;

end;

rc=h.find();

cat(category)=1;

rc=h.replace();

if last then

h.output(dataset:'want');

run;

Haikuo

Haikuo · Posted 11-30-2014 08:10 PM

Also, if your output table is too large to fit into your memory (where Hash table resides), then the following will do the same thing by adding some steps:

data have;

input

name $ category;

cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

data h2;

set have;

array cat(4) category1-category4;

cat(category)=1;

run;

proc sort data=h2 out=h3;

by name;

run;

data want;

update h3(obs=0) h3;

by name;

drop category;

run;

ilikesas · Posted 12-02-2014 09:29 AM

Hi Hai,

I ran the code of #3 but unfortunately got an error message:

31 data _null_;

32 if _n _=1 then

-

22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ,

GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

33 do;

34 dcl hash h(ordered:'a');

ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.

Haikuo · Posted 12-02-2014 09:47 AM

For some reason, there is an additional blank inserted into the variable name as _n _, where it should be _n_:

if _n_=1 then

ilikesas · Posted 12-02-2014 02:00 PM

Hi Hai,

actually I managed to do the merge based on the very first code that you put because I realized that the categories file from which I had to make the new 4 category columns didn't have any duplicated since I DISTINCTed it in the very beggining, so I guess the first code is the best answer.

Thanks a lot for helping me!!!

Vladislaff · Posted 12-05-2014 07:12 AM

proc sql;

create table want as

select name,

category=1 as category1,

category=2 as category2,

category=3 as category3,

category=4 as category4

from have;

quit;

"Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

Re: "Merging" 2 tables by transferring values of a column of a table into multiple columns of the other table

SAS Innovate 2025: Save the Date