obs | patent class | citation_id | patent_id | year |
1 | A1 | 21 | 1 | 1990 |
2 | A1 | 31 | 1 | 1990 |
3 | B1 | 11 | 1 | 1990 |
4 | B1 | 21 | 1 | 1990 |
5 | B1 | 41 | 1 | 1990 |
6 | C1 | 11 | 1 | 1990 |
7 | A1 | 31 | 2 | 1991 |
8 | B1 | 41 | 2 | 1991 |
patent_id | year | score |
1 | 1990 | 0.61112 |
2 | 1991 | 0.5 |
Could you please give me some suggestions?
data table1; infile cards dsd dlm=","; input patent_class :$50. patent_id :$50. cited_id :$50. year :10. ; cards; A1,21,1,1990 A1,31,1,1990 B1,11,1,1990 B1,21,1,1990 B1,41,1,1990 C1,11,1,1990 A1,31,2,1991 B1,41,2,1991 ;;;; run;
Consider how the calculation is progressed:
proc sql;
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID;
select *, sum(CLASSK) as NCIT from(
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID
) group by CITED_ID;
select CITED_ID, 1 - sum( (CLASSK/NCIT)**2 ) from(
select *, sum(CLASSK) as NCIT from(
select CITED_ID,count(*) as CLASSK from HAVE group by PATENT_CLASS, CITED_ID
) group by CITED_ID
) group by CITED_ID;
CITED_ID | CLASSK |
---|---|
1 | 2 |
1 | 3 |
1 | 1 |
2 | 1 |
2 | 1 |
CITED_ID | CLASSK | NCIT |
---|---|---|
1 | 2 | 6 |
1 | 3 | 6 |
1 | 1 | 6 |
2 | 1 | 2 |
2 | 1 | 2 |
CITED_ID | |
---|---|
1 | 0.611111 |
2 | 0.5 |
A much more efficient way that uses the SAS tools:
proc summary data=HAVE chartype;
class CITED_ID PATENT_CLASS ;
types CITED_ID CITED_ID*PATENT_CLASS;
output out=SUM;
run;
data WANT;
merge SUM(where=(_TYPE_='10') rename=(_FREQ_=NCIT ))
SUM(where=(_TYPE_='11') rename=(_FREQ_=CLASSK));
by CITED_ID;
SUM + (CLASSK/NCIT)**2;
if last.CITED_ID then do;
SCORE=1-SUM;
output;
SUM=0;
end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.