Hi, I have two datasets
data deal;
compressed_a = 'WYETH'; compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
compressed_a = 'GENENTECH'; compressed_b = 'ABBOTTLABORATORIES'; year = 1990;output;
run;
data patent;
firmname = 'GENENTECH'; pnyear = 1985; IPC = 'B05C'; output;
firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;
table deal has the M&A data of two firms, including firm names (compressed_a, compressed_b) and the year of the deal was conducted.
table patent has the list of ipc codes of a firm has acquired.
my main question is this: How many number of patents does a company has, at a focal year (deal year).
expected output(ipc_a indicating number of patents, accumulative)
compressed_a | compressed_d | year | ipc_a | ipc_b |
WYETH | FORTDODGEANIMALHEALTH | 1945 | 0 | 0 |
ELILILLYANDCOMPANY | HYBRITECHINCORPORATED | 1985 | 1 | 0 |
GENENTECH | ABBOTTLABORATORIES | 1990 | 1 | 0 |
thus i have constructed a code as below:
data size_merge5;
if _n_=1 then do;
if 0 then set patent;
dcl hash H (dataset:'patent',multidata:'y') ;
h.definekey ("firmname") ;
h.definedata ("pnyear","ipc") ;
h.definedone () ;
end;
set deal;
IPC_A=0;
do rc=h.find(key:'compressed_a') by 0 while(rc=0);
if pnyear < year then IPC_A =sum(IPC_A,1);
rc=h.find_next();
end;
IPC_B=0;
do rc=h.find(key:'compressed_b') by 0 while(rc=0);
if pnyear < year then IPC_B =sum(IPC_B,1);
rc=h.find_next();
end;
run;
but unfortunately this code gives me all zeros for ipc_a and ipc_b.
cannot find what's wrong. please help.
First, thank you for provide a complete program, with sample data.
There are two problems - one having to do with hash object method use, and the other with character variable length:
data deal;
length compressed_a compressed_b $24;
compressed_a = 'WYETH'; compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
compressed_a = 'GENENTECH'; compressed_b = 'ABBOTTLABORATORIES'; year = 1990;output;
run;
data patent;
length firmname $24;
firmname = 'GENENTECH'; pnyear = 1985; IPC = 'B05C'; output;
firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;
data size_merge5;
if _n_=1 then do;
if 0 then set patent;
dcl hash H (dataset:'patent',multidata:'y') ;
h.definekey ("firmname") ;
h.definedata ("pnyear","ipc") ;
h.definedone () ;
end;
set deal;
IPC_A=0;
do rc=h.find(key:compressed_a) by 0 while(rc=0);
if pnyear < year then IPC_A =sum(IPC_A,1);
rc=h.find_next();
end;
IPC_B=0;
do rc=h.find(key:compressed_b) by 0 while(rc=0);
if pnyear < year then IPC_B =sum(IPC_B,1);
rc=h.find_next();
end;
run;
Why not just merge on based on compressed_a=firmname? Thats all you seem to want from that snippet? If it needs to match both, then merge twice (same datastep) and just rename the variable?
/* Note both assumes sorted */ data want; merge deal patent (rename=(firmname=compressed_a ipc=ipc1)); by compressed_a; run; data want; merge want patent (rename=(firmname=compress_b ipc=ipc2)); by compressed_b; run;
First, thank you for provide a complete program, with sample data.
There are two problems - one having to do with hash object method use, and the other with character variable length:
data deal;
length compressed_a compressed_b $24;
compressed_a = 'WYETH'; compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
compressed_a = 'GENENTECH'; compressed_b = 'ABBOTTLABORATORIES'; year = 1990;output;
run;
data patent;
length firmname $24;
firmname = 'GENENTECH'; pnyear = 1985; IPC = 'B05C'; output;
firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;
data size_merge5;
if _n_=1 then do;
if 0 then set patent;
dcl hash H (dataset:'patent',multidata:'y') ;
h.definekey ("firmname") ;
h.definedata ("pnyear","ipc") ;
h.definedone () ;
end;
set deal;
IPC_A=0;
do rc=h.find(key:compressed_a) by 0 while(rc=0);
if pnyear < year then IPC_A =sum(IPC_A,1);
rc=h.find_next();
end;
IPC_B=0;
do rc=h.find(key:compressed_b) by 0 while(rc=0);
if pnyear < year then IPC_B =sum(IPC_B,1);
rc=h.find_next();
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.