BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9
Hi, 
I am trying to compute the percentage of citations (in other technology class) received by a patent in a given year.
Screenshot 2020-10-05 at 10.40.31 pm.pngScreenshot 2020-10-05 at 10.39.11 pm.png
By using table1 as an example,
 
obs
patent class
citation_id
patent_id
year
1
A1
21
1
1990
2
A1
31
1
1990
3
B1
11
1
1990
4
B1
21
1
1990
5
B1
41
1
1990
6
C1
11
1
1990
7
A1
31
2
1991
8
B1
41
2
1991
 
For group1 (i.e., obs1-obs6), it is computed as
Screenshot 2020-10-05 at 10.41.56 pm.png
They belong to the same group because they have same ‘patent_id’ and ‘year’.
The denominator of group 1 is 6, while they have 6 different ‘patent class & citation_id’ observations.
The first numerator is 2 because there are two A1 in group 1 (i.e., obs1, obs2); the second numerator is 3 because there are three B1 in group 1 (i.e., obs3-5); the third numerator is 1 because there are one C1 in group 1 (i.e., obs3-5).
 
For group 2(i.e., obs7-obs8), it is computed as
Screenshot 2020-10-05 at 10.42.01 pm.png
I expect to get table have
patent_id
year
score
1
1990
0.61112
2
1991
0.5

 

Could you please give me some suggestions?

 

data table1;
    infile cards dsd  dlm=",";
    input
    patent_class :$50.  
    patent_id :$50.
    cited_id :$50.
    year :10.
    ;
    cards;
    A1,21,1,1990
    A1,31,1,1990
    B1,11,1,1990
    B1,21,1,1990
    B1,41,1,1990
    C1,11,1,1990
    A1,31,2,1991
    B1,41,2,1991
    ;;;;
run;

 

2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

Consider how the calculation is progressed:

proc sql;
  select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID;

  select *, sum(CLASSK) as NCIT from(
    select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
  ) group by CITED_ID;

  select CITED_ID, 1 - sum( (CLASSK/NCIT)**2 ) from(
    select *, sum(CLASSK) as NCIT from(
      select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID 
    ) group by CITED_ID
  ) group by CITED_ID;

CITED_ID CLASSK
1 2
1 3
1 1
2 1
2 1

 

CITED_ID CLASSK NCIT
1 2 6
1 3 6
1 1 6
2 1 2
2 1 2

 

CITED_ID  
1 0.611111
2 0.5

 

 

 

ChrisNZ
Tourmaline | Level 20

A much more efficient way that uses the SAS tools:

proc summary data=HAVE chartype;
  class CITED_ID PATENT_CLASS ;
  types CITED_ID CITED_ID*PATENT_CLASS;
  output out=SUM;
run;

data WANT;
  merge SUM(where=(_TYPE_='10') rename=(_FREQ_=NCIT  ))
        SUM(where=(_TYPE_='11') rename=(_FREQ_=CLASSK));
  by CITED_ID;
  SUM + (CLASSK/NCIT)**2;
  if last.CITED_ID then do;
    SCORE=1-SUM;
    output; 
    SUM=0;
  end;
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 388 views
  • 0 likes
  • 2 in conversation