BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimmychoi
Obsidian | Level 7

Hi, I have two datasets

 

data deal;
  compressed_a = 'WYETH'; compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
  compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
  compressed_a = 'GENENTECH'; compressed_b = 'ABBOTTLABORATORIES'; year = 1990;output;
run;
data patent;
  firmname = 'GENENTECH'; pnyear = 1985; IPC = 'B05C'; output;
  firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;

table deal has the M&A data of two firms, including firm names (compressed_a, compressed_b) and the year of the deal was conducted.

table patent has the list of ipc codes of a firm has acquired.

my main question is this: How many number of patents does a company has, at a focal year (deal year).

expected output(ipc_a indicating number of patents, accumulative)

compressed_acompressed_dyearipc_aipc_b
WYETHFORTDODGEANIMALHEALTH194500
ELILILLYANDCOMPANYHYBRITECHINCORPORATED198510
GENENTECHABBOTTLABORATORIES199010

 

thus i have constructed a code as below:

data size_merge5;
  if _n_=1 then do;
	if 0 then set patent;
	dcl hash H (dataset:'patent',multidata:'y') ;
	h.definekey  ("firmname") ;
	h.definedata ("pnyear","ipc") ;
	h.definedone () ;
  end;

  set deal;  

  IPC_A=0;
  do rc=h.find(key:'compressed_a') by 0 while(rc=0);		
	if pnyear < year then IPC_A =sum(IPC_A,1);	
	rc=h.find_next();
  end;

  IPC_B=0;
  do rc=h.find(key:'compressed_b') by 0 while(rc=0);		
	if pnyear < year then IPC_B =sum(IPC_B,1);	
	rc=h.find_next();
  end;

run;

but unfortunately this code gives me all zeros for ipc_a and ipc_b.

cannot find what's wrong. please help.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

First, thank you for provide a complete program, with sample data. 

 

There are two problems - one having to do with hash object method use, and the other with character variable length:

  1. Character variable length.  Because the first assignment of compressed_a='WYETH', that variable was assigned a length of $5, thereby truncating values of compressed_a in other observations.  Same with firmname (length=$9) in data set patents.  So assign length by means of a LENGTH statement in your sample.

    But probably your actual data does not suffer from this issue.  In this case, you may have inadvertently added to the problem by making the sample data sets via assignment statements instead of INPUT.
  2. hash object usage.  By specifying    h.find(key:'compressed_a') you were looking for a data itemtime with a key VALUE of 'compressed_a'.  You should specify   h.find(key:compressed_a), which takes the value in variable compressed_a.
data deal;
  length compressed_a  compressed_b $24;
  compressed_a = 'WYETH';              compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
  compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
  compressed_a = 'GENENTECH';          compressed_b = 'ABBOTTLABORATORIES';    year = 1990;output;
run;
data patent;
  length firmname $24;
  firmname = 'GENENTECH';          pnyear = 1985; IPC = 'B05C'; output;
  firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;

data size_merge5;
  if _n_=1 then do;
	if 0 then set patent;
	dcl hash H (dataset:'patent',multidata:'y') ;
	h.definekey  ("firmname") ;
	h.definedata ("pnyear","ipc") ;
	h.definedone () ;
  end;

  set deal;  

  IPC_A=0;
  do rc=h.find(key:compressed_a) by 0 while(rc=0);		
	if pnyear < year then IPC_A =sum(IPC_A,1);	
	rc=h.find_next();
  end;

  IPC_B=0;
  do rc=h.find(key:compressed_b) by 0 while(rc=0);		
	if pnyear < year then IPC_B =sum(IPC_B,1);	
	rc=h.find_next();
  end;

run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why not just merge on based on compressed_a=firmname?  Thats all you seem to want from that snippet?  If it needs to match both, then merge twice (same datastep) and just rename the variable?

/* Note both assumes sorted */
data want;
  merge deal
             patent (rename=(firmname=compressed_a ipc=ipc1));
  by compressed_a;
run;
data want;
  merge want
             patent (rename=(firmname=compress_b ipc=ipc2));
  by compressed_b;
run;
jimmychoi
Obsidian | Level 7
i am trying to count the accumulative count of IPC codes
mkeintz
PROC Star

First, thank you for provide a complete program, with sample data. 

 

There are two problems - one having to do with hash object method use, and the other with character variable length:

  1. Character variable length.  Because the first assignment of compressed_a='WYETH', that variable was assigned a length of $5, thereby truncating values of compressed_a in other observations.  Same with firmname (length=$9) in data set patents.  So assign length by means of a LENGTH statement in your sample.

    But probably your actual data does not suffer from this issue.  In this case, you may have inadvertently added to the problem by making the sample data sets via assignment statements instead of INPUT.
  2. hash object usage.  By specifying    h.find(key:'compressed_a') you were looking for a data itemtime with a key VALUE of 'compressed_a'.  You should specify   h.find(key:compressed_a), which takes the value in variable compressed_a.
data deal;
  length compressed_a  compressed_b $24;
  compressed_a = 'WYETH';              compressed_b = 'FORTDODGEANIMALHEALTH'; year = 1945;output;
  compressed_a = 'ELILILLYANDCOMPANY'; compressed_b = 'HYBRITECHINCORPORATED'; year = 1985;output;
  compressed_a = 'GENENTECH';          compressed_b = 'ABBOTTLABORATORIES';    year = 1990;output;
run;
data patent;
  length firmname $24;
  firmname = 'GENENTECH';          pnyear = 1985; IPC = 'B05C'; output;
  firmname = 'ELILILLYANDCOMPANY'; pnyear = 1983; IPC = 'A01N'; output;
run;

data size_merge5;
  if _n_=1 then do;
	if 0 then set patent;
	dcl hash H (dataset:'patent',multidata:'y') ;
	h.definekey  ("firmname") ;
	h.definedata ("pnyear","ipc") ;
	h.definedone () ;
  end;

  set deal;  

  IPC_A=0;
  do rc=h.find(key:compressed_a) by 0 while(rc=0);		
	if pnyear < year then IPC_A =sum(IPC_A,1);	
	rc=h.find_next();
  end;

  IPC_B=0;
  do rc=h.find(key:compressed_b) by 0 while(rc=0);		
	if pnyear < year then IPC_B =sum(IPC_B,1);	
	rc=h.find_next();
  end;

run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jimmychoi
Obsidian | Level 7
mkeintz, my savior!
you're right i wasn't suffering from the issue #1 with my actual data but was suffering because of issue #2!!! just by removing single quotes, it works now. THANKS SO MUCH

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1017 views
  • 0 likes
  • 3 in conversation