Hello all,
For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.
By using the following data as an example,
Obs, Applicant_name application_date application_id
1,A, 2021-07-28, 101
2,B, 2021-07-28, 101
3,A, 2017-07-28, 102
4,A, 2016-07-27, 103
5,B, 2018-07-28, 104
For observation application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 101, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28. I include application_id 104 for the same reason. I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.
In the end, for application_id 101, I expect to get table like below,
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
1,A, 2021-07-28, 101, 104
2,B, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
For application_id 102, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
3,A, 2017-07-28, 102, 103
For application_id 103, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
4,A, 2016-07-27, 103
So, in the end, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
1,A, 2021-07-28, 101, 104
2,B, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
3,A, 2017-07-28, 102, 103
4,A, 2016-07-27, 103
Could you please give me some advice about it. Many thanks in advance.
data have; infile cards dsd dlm=","; input Obs :8. Applicant_name :$8. application_date: YYMMDD10. application_id :8. ; format application_date :YYMMDD10. ; cards; 1,A,2021-07-28,101 2,B,2021-07-28,101 3,A,2017-07-28,102 4,A,2016-07-27,103 5,B,2018-07-28,104 ;;;; run;
This is very confusing. Why does applicant B match 102 in the output?
Also the code provided does not create the expected data. Please fix it.
Hi @ChrisNZ ,
Many thanks for your advice. I rewrite part of my question, and underline the modified part. Please see below.
For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.
By using the following data as an example,
Obs, Applicant_name application_date application_id
1,A, 2021-07-28, 101
2,B, 2021-07-28, 101
3,A, 2017-07-28, 102
4,A, 2016-07-27, 103
5,B, 2018-07-28, 104
For observation application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 101, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28. I include application_id 104 for the same reason (because 1) B applied for application 101 in 2021-07-28, 2) B also applied application 104 in 2018-07-28, 3) 2018-07-28 within the 5-years before 2021-07-28). I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.
In the end, for application_id 101, I expect to get table like below,
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
For application_id 102, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
3,A, 2017-07-28, 102, 103
For application_id 103, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
4,A, 2016-07-27, 103
So, in the end, I expect to have
Obs,Applicant_name,application_date,application_id,applied_id
1,A, 2021-07-28, 101, 102
2,B, 2021-07-28, 101, 104
3,A, 2017-07-28, 102, 103
4,A, 2016-07-27, 103
data have; infile cards dsd dlm=","; input Obs :8. Applicant_name :$8. application_date: YYMMDD10. application_id :8. ; format application_date :YYMMDD10. ; cards; 1,A,2021-07-28,101 2,B,2021-07-28,101 3,A,2017-07-28,102 4,A,2016-07-27,103 5,B,2018-07-28,104 ;;;; run;
So the expected result has now changed?
at the current stage, I am using
%MACRO Explo(year=); PROC SQL; CREATE TABLE want_p5y_&year. AS SELECT DISTINCT citing1.Applicant_name, citing1.application_date, citing1.application_id, citing2.application_id as applied_id FROM have as citing1 left JOIN have AS citing2 ON citing1.Applicant_name = citing2.Applicant_name where citing1.application_id ^= citing2.application_id group by citing1.application_date having citing2.application_date between intnx('year', min(citing1.application_date), -5, 'S') and citing1.application_date /*a 3-year moving window starting at the earliest publication date of a cited-patent family */ ; QUIT; %MEND Explo; %Explo(year=2021)
and I got the result as
Applicant_name | application_date | application_id | applied_id |
A | 2017/7/28 | 102 | 103 |
A | 2021/7/28 | 101 | 102 |
B | 2021/7/28 | 101 | 104 |
The first problem is I only get 3 rows rather than 4 rows. The second problem is this code spend around 5 hours to get the result of one year. Could you please give me some advice to get the result quicker and earlier?
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.