Hello all,
For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.
By using the following data as an example,
Obs, Applicant_name application_date application_id
1,A, 2021-07-28, 101
2,B, 2021-07-28, 101
3,A, 2017-07-28, 102
4,A, 2016-07-27, 103
5,B, 2018-07-28, 104
For observation application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 101, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28. I include application_id 104 for the same reason. I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.
In the end, for application_id 101, I expect to get table like below,
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
1,A, 2021-07-28, 101, 104
2,B, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
For application_id 102, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
3,A, 2017-07-28, 102, 103
For application_id 103, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
4,A, 2016-07-27, 103
So, in the end, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
1,A, 2021-07-28, 101, 104
2,B, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
3,A, 2017-07-28, 102, 103
4,A, 2016-07-27, 103
Could you please give me some advice about it. Many thanks in advance.
data have; infile cards dsd dlm=","; input Obs :8. Applicant_name :$8. application_date: YYMMDD10. application_id :8. ; format application_date :YYMMDD10. ; cards; 1,A,2021-07-28,101 2,B,2021-07-28,101 3,A,2017-07-28,102 4,A,2016-07-27,103 5,B,2018-07-28,104 ;;;; run;
This is very confusing. Why does applicant B match 102 in the output?
Also the code provided does not create the expected data. Please fix it.
Hi @ChrisNZ ,
Many thanks for your advice. I rewrite part of my question, and underline the modified part. Please see below.
For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.
By using the following data as an example,
Obs, Applicant_name application_date application_id
1,A, 2021-07-28, 101
2,B, 2021-07-28, 101
3,A, 2017-07-28, 102
4,A, 2016-07-27, 103
5,B, 2018-07-28, 104
For observation application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 101, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28. I include application_id 104 for the same reason (because 1) B applied for application 101 in 2021-07-28, 2) B also applied application 104 in 2018-07-28, 3) 2018-07-28 within the 5-years before 2021-07-28). I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.
In the end, for application_id 101, I expect to get table like below,
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
For application_id 102, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
3,A, 2017-07-28, 102, 103
For application_id 103, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
4,A, 2016-07-27, 103
So, in the end, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
3,A, 2017-07-28, 102, 103
4,A, 2016-07-27, 103
data have; infile cards dsd dlm=","; input Obs :8. Applicant_name :$8. application_date: YYMMDD10. application_id :8. ; format application_date :YYMMDD10. ; cards; 1,A,2021-07-28,101 2,B,2021-07-28,101 3,A,2017-07-28,102 4,A,2016-07-27,103 5,B,2018-07-28,104 ;;;; run;
So the expected result has now changed?
at the current stage, I am using
%MACRO Explo(year=); PROC SQL; CREATE TABLE want_p5y_&year. AS SELECT DISTINCT citing1.Applicant_name, citing1.application_date, citing1.application_id, citing2.application_id as applied_id FROM have as citing1 left JOIN have AS citing2 ON citing1.Applicant_name = citing2.Applicant_name where citing1.application_id ^= citing2.application_id group by citing1.application_date having citing2.application_date between intnx('year', min(citing1.application_date), -5, 'S') and citing1.application_date /*a 3-year moving window starting at the earliest publication date of a cited-patent family */ ; QUIT; %MEND Explo; %Explo(year=2021)
and I got the result as
Applicant_name | application_date | application_id | applied_id |
A | 2017/7/28 | 102 | 103 |
A | 2021/7/28 | 101 | 102 |
B | 2021/7/28 | 101 | 104 |
The first problem is I only get 3 rows rather than 4 rows. The second problem is this code spend around 5 hours to get the result of one year. Could you please give me some advice to get the result quicker and earlier?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.