- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear Everyone,
I have the US patent data from 1926 to 2010. In each year, firms apply for patents in different technology class. I want to count the number of patents a firm applied for in each year from known and unknown technology class. Say, Firm 10001 (Permno- a unique firm identifier) applied for a patent 1000 (patnum- an unique patent identifier) in technology class 80 in year 1990 and then applied for another patent 1500 in the same technology class 80 in year 2000. Then I would assume that the firm filed for a patent in the known technology class in 2000. So the dummy variable identifying known technology class would take value 1 for patent 1500 of firm 10001 in year 2000.
Now assume that Firm 10001 applied for another patent 1600 in the technology class 90 in year 2000 for the first time in its history. So the dummy variable identifying known technology class would take value 0 for patent 1600 of firm 10001 in year 2000.
Could you please help in putting the above stated arguments in a proper SAS code? I would really appreciate your help. I have around 2 million observations in my original file from year 1926 to 2010. A firm may have multiple patent applications in each year and may not have a single application in a year.
All the variables- Year, Patnum, Permno, Class are numeric.
Sample Data:
year | Patnum | Permno | Class |
1990 | 1000 | 10001 | 80 |
1995 | 900 | 10001 | 70 |
2000 | 1500 | 10001 | 80 |
2000 | 1600 | 10001 | 90 |
1926 | 1300 | 10002 | 70 |
2000 | 1400 | 10002 | 80 |
2010 | 1200 | 10002 | 90 |
Expected Outcome:
year | Patnum | Permno | Class | Known_Dummy |
1990 | 1000 | 10001 | 80 | 0 |
1995 | 900 | 10001 | 70 | 0 |
2000 | 1500 | 10001 | 80 | 1 |
2000 | 1600 | 10001 | 90 | 0 |
1926 | 1300 | 10002 | 70 | 0 |
2000 | 1400 | 10002 | 80 | 0 |
2010 | 1200 | 10002 | 90 | 0 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One approach that may work for you:
proc sort data=have; by permno class year; run; data want; set have; by permno class; if first.class then Known_dummy=0; else Known_dummy = 1; run;
Where have is the name of your data set.
When data is used with BY variables then SAS creates special values to indicate whether an observation is the first or the last of its group. You can reference then by use of First.variablename or Last.varaiblename. Notice that the dot here is NOT related to a library just in case that came to mind. The values returned by First. and Last. are 1 for true and 0 for false so they can be used with IF and any other calculations.
For added fun you might want to figure out why: Known_dummy = not(first.class); returns the same values as the IF/THEN/ELSE.
If you want to test this with a subset of your data you could use
proc sort data=have (obs=1000) out=temp; by permno class year; run; data temp; set have; by permno class; if first.class then Known_dummy=0; else Known_dummy = 1; run;
replace the 1000 with any number 1 or greater up to the number of records in the input set to select that many for testing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why do you not show values for "known_dummy" for Permno=10001, Class=80 and year=1990? I would expect that to be 0.
similar question for Permno=10002 class=70 and year=1926.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear Ballardw,
Yes. You are right. That should also be 0.
I am updating the tables.
I appreciate your feedback
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One approach that may work for you:
proc sort data=have; by permno class year; run; data want; set have; by permno class; if first.class then Known_dummy=0; else Known_dummy = 1; run;
Where have is the name of your data set.
When data is used with BY variables then SAS creates special values to indicate whether an observation is the first or the last of its group. You can reference then by use of First.variablename or Last.varaiblename. Notice that the dot here is NOT related to a library just in case that came to mind. The values returned by First. and Last. are 1 for true and 0 for false so they can be used with IF and any other calculations.
For added fun you might want to figure out why: Known_dummy = not(first.class); returns the same values as the IF/THEN/ELSE.
If you want to test this with a subset of your data you could use
proc sort data=have (obs=1000) out=temp; by permno class year; run; data temp; set have; by permno class; if first.class then Known_dummy=0; else Known_dummy = 1; run;
replace the 1000 with any number 1 or greater up to the number of records in the input set to select that many for testing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much. I really appreciate your help