BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ilikesas
Barite | Level 11

Hi,

I have one file with names and categories, the categories ranging from 1 to 4:

name category
a1
b2
c3
d4
e1
f2

and I want to merge it with another file that contains the names, but here there are 4 category columns each for every category:

namecategory1category2category3category4
a1000
b0100
c0010
d0001
e1000
f0100

like this in the second table I get dummy variables and then do a regression analysis. So each category for each name in the first file I put a 1 into the column that corresponds to that category and zero in the other categories, for the same name.

Please note that the number of categories can change and therefore the code should be flexible to accommodate this

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Then a little bit Hash would be handy:

data have;

     input

           name $     category;

     cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

/*otherwise, this could do*/

data _null_;

     if _n_=1 then

           do;

                dcl hash h(ordered:'a');

                h.definekey('name');

                h.definedata('name', 'category1', 'category2', 'category3', 'category4');

                h.definedone();

           end;

     set have end=last;

     array cat(4) category1-category4;

     do i=1 to dim(cat);

           cat(i)=0;

     end;

     rc=h.find();

     cat(category)=1;

     rc=h.replace();

     if last then

           h.output(dataset:'want');

run;


Haikuo

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

data have;

     input

           name $     category;

     cards;

a 1

b 2

c 3

d 4

e 1

f 2

;

/*as a simple starter, if no duplicates existing for name, try the following*/

data want;

     set have;

     array cat category1-category4;

     do over cat;

           if _i_=category then

                cat=1;

           else cat=0;

     end;

run;

ilikesas
Barite | Level 11

Hi Hai and thanks for the quick reply!!!

Actually, you mentioned a very important point that I overlooked in the description and that is that I do have name duplicates in the second file because the second file contains the board of directors for a given company for a period of 10 years and therefore in most cases many names will be present for all the years of the given company.

Also, some directors can be present for different companies in the same year.

Thank you

Haikuo
Onyx | Level 15

Then a little bit Hash would be handy:

data have;

     input

           name $     category;

     cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

/*otherwise, this could do*/

data _null_;

     if _n_=1 then

           do;

                dcl hash h(ordered:'a');

                h.definekey('name');

                h.definedata('name', 'category1', 'category2', 'category3', 'category4');

                h.definedone();

           end;

     set have end=last;

     array cat(4) category1-category4;

     do i=1 to dim(cat);

           cat(i)=0;

     end;

     rc=h.find();

     cat(category)=1;

     rc=h.replace();

     if last then

           h.output(dataset:'want');

run;


Haikuo

Haikuo
Onyx | Level 15

Also, if your output table is too large to fit into your memory (where Hash table resides), then the following will do the same thing by adding some steps:

data have;

     input

           name $     category;

     cards;

a 1

a 3

b 2

c 3

c 4

d 4

e 1

f 2

a 2

;

data h2;

     set have;

     array cat(4) category1-category4;

     cat(category)=1;

run;

proc sort data=h2 out=h3;

     by name;

run;

data want;

     update h3(obs=0) h3;

     by name;

     drop category;

run;

ilikesas
Barite | Level 11

Hi Hai,

I ran the code of #3 but unfortunately got an error message:

31   data _null_;

32        if _n _=1 then

                -

                22

ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, -, /, ;, <, <=, <>, =, >, ><, >=, AND, EQ,

              GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOTIN, OR, [, ^=, {, |, ||, ~=.

33              do;

34                   dcl hash h(ordered:'a');

ERROR: DATA STEP Component Object failure.  Aborted during the COMPILATION phase.

Haikuo
Onyx | Level 15

For some reason, there is an additional blank inserted into the variable name as _n _, where it should be _n_:

if _n_=1 then

ilikesas
Barite | Level 11

Hi Hai,

actually I managed to do the merge based on the very first code that you put because I realized that the categories file from which I had to make the new 4 category columns didn't have any duplicated since I DISTINCTed it in the very beggining, so I guess the first code is the best answer.

Thanks a lot for helping me!!!

Vladislaff
SAS Employee

proc sql;

create table want as

select name,

category=1 as category1,

category=2 as category2,

category=3 as category3,

category=4 as category4

from have;

quit;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1294 views
  • 0 likes
  • 3 in conversation