obs | patent class | citation_id | patent_id | year |
1 | A1 | 21 | 1 | 1990 |
2 | A1 | 31 | 1 | 1990 |
3 | B1 | 11 | 1 | 1990 |
4 | B1 | 21 | 1 | 1990 |
5 | B1 | 41 | 1 | 1990 |
6 | C1 | 11 | 1 | 1990 |
7 | A1 | 31 | 2 | 1991 |
8 | B1 | 41 | 2 | 1991 |
patent_id | year | score |
1 | 1990 | 0.61112 |
2 | 1991 | 0.5 |
Could you please give me some suggestions?
data table1; infile cards dsd dlm=","; input patent_class :$50. patent_id :$50. cited_id :$50. year :10. ; cards; A1,21,1,1990 A1,31,1,1990 B1,11,1,1990 B1,21,1,1990 B1,41,1,1990 C1,11,1,1990 A1,31,2,1991 B1,41,2,1991 ;;;; run;
Consider how the calculation is progressed:
proc sql;
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID;
select *, sum(CLASSK) as NCIT from(
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID
) group by CITED_ID;
select CITED_ID, 1 - sum( (CLASSK/NCIT)**2 ) from(
select *, sum(CLASSK) as NCIT from(
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID
) group by CITED_ID
) group by CITED_ID;
CITED_ID | CLASSK |
---|---|
1 | 2 |
1 | 3 |
1 | 1 |
2 | 1 |
2 | 1 |
CITED_ID | CLASSK | NCIT |
---|---|---|
1 | 2 | 6 |
1 | 3 | 6 |
1 | 1 | 6 |
2 | 1 | 2 |
2 | 1 | 2 |
CITED_ID | |
---|---|
1 | 0.611111 |
2 | 0.5 |
A much more efficient way that uses the SAS tools:
proc summary data=HAVE chartype;
class CITED_ID PATENT_CLASS ;
types CITED_ID CITED_ID*PATENT_CLASS;
output out=SUM;
run;
data WANT;
merge SUM(where=(_TYPE_='10') rename=(_FREQ_=NCIT ))
SUM(where=(_TYPE_='11') rename=(_FREQ_=CLASSK));
by CITED_ID;
SUM + (CLASSK/NCIT)**2;
if last.CITED_ID then do;
SCORE=1-SUM;
output;
SUM=0;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.