BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9
Hi, 
I am trying to compute the percentage of citations (in other technology class) received by a patent in a given year.
Screenshot 2020-10-05 at 10.40.31 pm.pngScreenshot 2020-10-05 at 10.39.11 pm.png
By using table1 as an example,
 
obs
patent class
citation_id
patent_id
year
1
A1
21
1
1990
2
A1
31
1
1990
3
B1
11
1
1990
4
B1
21
1
1990
5
B1
41
1
1990
6
C1
11
1
1990
7
A1
31
2
1991
8
B1
41
2
1991
 
For group1 (i.e., obs1-obs6), it is computed as
Screenshot 2020-10-05 at 10.41.56 pm.png
They belong to the same group because they have same ‘patent_id’ and ‘year’.
The denominator of group 1 is 6, while they have 6 different ‘patent class & citation_id’ observations.
The first numerator is 2 because there are two A1 in group 1 (i.e., obs1, obs2); the second numerator is 3 because there are three B1 in group 1 (i.e., obs3-5); the third numerator is 1 because there are one C1 in group 1 (i.e., obs3-5).
 
For group 2(i.e., obs7-obs8), it is computed as
Screenshot 2020-10-05 at 10.42.01 pm.png
I expect to get table have
patent_id
year
score
1
1990
0.61112
2
1991
0.5

 

Could you please give me some suggestions?

 

data table1;
    infile cards dsd  dlm=",";
    input
    patent_class :$50.  
    patent_id :$50.
    cited_id :$50.
    year :10.
    ;
    cards;
    A1,21,1,1990
    A1,31,1,1990
    B1,11,1,1990
    B1,21,1,1990
    B1,41,1,1990
    C1,11,1,1990
    A1,31,2,1991
    B1,41,2,1991
    ;;;;
run;

 

2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

Consider how the calculation is progressed:

proc sql;
  select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID;

  select *, sum(CLASSK) as NCIT from(
    select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
  ) group by CITED_ID;

  select CITED_ID, 1 - sum( (CLASSK/NCIT)**2 ) from(
    select *, sum(CLASSK) as NCIT from(
      select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
    ) group by CITED_ID
  ) group by CITED_ID;

CITED_ID CLASSK
1 2
1 3
1 1
2 1
2 1

 

CITED_ID CLASSK NCIT
1 2 6
1 3 6
1 1 6
2 1 2
2 1 2

 

CITED_ID  
1 0.611111
2 0.5

 

 

 

ChrisNZ
Tourmaline | Level 20

A much more efficient way that uses the SAS tools:

proc summary data=HAVE chartype;
  class CITED_ID PATENT_CLASS ;
  types CITED_ID CITED_ID*PATENT_CLASS;
  output out=SUM;
run;

data WANT;
  merge SUM(where=(_TYPE_='10') rename=(_FREQ_=NCIT  ))
        SUM(where=(_TYPE_='11') rename=(_FREQ_=CLASSK));
  by CITED_ID;
  SUM + (CLASSK/NCIT)**2;
  if last.CITED_ID then do;
    SCORE=1-SUM;
    output; 
    SUM=0;
  end;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 661 views
  • 0 likes
  • 2 in conversation