BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9
Hi, 
I am trying to compute the percentage of citations (in other technology class) received by a patent in a given year.
Screenshot 2020-10-05 at 10.40.31 pm.pngScreenshot 2020-10-05 at 10.39.11 pm.png
By using table1 as an example,
 
obs
patent class
citation_id
patent_id
year
1
A1
21
1
1990
2
A1
31
1
1990
3
B1
11
1
1990
4
B1
21
1
1990
5
B1
41
1
1990
6
C1
11
1
1990
7
A1
31
2
1991
8
B1
41
2
1991
 
For group1 (i.e., obs1-obs6), it is computed as
Screenshot 2020-10-05 at 10.41.56 pm.png
They belong to the same group because they have same ‘patent_id’ and ‘year’.
The denominator of group 1 is 6, while they have 6 different ‘patent class & citation_id’ observations.
The first numerator is 2 because there are two A1 in group 1 (i.e., obs1, obs2); the second numerator is 3 because there are three B1 in group 1 (i.e., obs3-5); the third numerator is 1 because there are one C1 in group 1 (i.e., obs3-5).
 
For group 2(i.e., obs7-obs8), it is computed as
Screenshot 2020-10-05 at 10.42.01 pm.png
I expect to get table have
patent_id
year
score
1
1990
0.61112
2
1991
0.5

 

Could you please give me some suggestions?

 

data table1;
    infile cards dsd  dlm=",";
    input
    patent_class :$50.  
    patent_id :$50.
    cited_id :$50.
    year :10.
    ;
    cards;
    A1,21,1,1990
    A1,31,1,1990
    B1,11,1,1990
    B1,21,1,1990
    B1,41,1,1990
    C1,11,1,1990
    A1,31,2,1991
    B1,41,2,1991
    ;;;;
run;

 

2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

Consider how the calculation is progressed:

proc sql;
  select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID;

  select *, sum(CLASSK) as NCIT from(
    select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
  ) group by CITED_ID;

  select CITED_ID, 1 - sum( (CLASSK/NCIT)**2 ) from(
    select *, sum(CLASSK) as NCIT from(
      select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
    ) group by CITED_ID
  ) group by CITED_ID;

CITED_ID CLASSK
1 2
1 3
1 1
2 1
2 1

 

CITED_ID CLASSK NCIT
1 2 6
1 3 6
1 1 6
2 1 2
2 1 2

 

CITED_ID  
1 0.611111
2 0.5

 

 

 

ChrisNZ
Tourmaline | Level 20

A much more efficient way that uses the SAS tools:

proc summary data=HAVE chartype;
  class CITED_ID PATENT_CLASS ;
  types CITED_ID CITED_ID*PATENT_CLASS;
  output out=SUM;
run;

data WANT;
  merge SUM(where=(_TYPE_='10') rename=(_FREQ_=NCIT  ))
        SUM(where=(_TYPE_='11') rename=(_FREQ_=CLASSK));
  by CITED_ID;
  SUM + (CLASSK/NCIT)**2;
  if last.CITED_ID then do;
    SCORE=1-SUM;
    output; 
    SUM=0;
  end;
run;

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 436 views
  • 0 likes
  • 2 in conversation