BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9

For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.

 

By using the following data as an example,

data have;
       infile cards dsd  dlm=",";
       input
       Obs :8.
       Applicant_name :$8.
       application_date: YYMMDD10.
       application_id :8.
       ;
       format
       application_date :YYMMDD10.
       ;
       cards;
       1,A,2021-07-28,101
       2,B,2021-07-28,101
       3,A,2017-07-28,102
       4,A,2016-07-27,103
       5,B,2018-07-28,104
       ;;;;

run;

I expect to have the below result.

Obs,Applicant_name,application_date,application_id,applied_id

1,A, 2021-07-28, 101, 102

2,B, 2021-07-28, 101, 104

3,A, 2017-07-28, 102, 103

4,A, 2016-07-27, 103

 

It is because,

For application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 102, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28I include application_id 104 for the same reason (because 1) B applied for application 101 in 2021-07-28, 2) B also applied application 104 in 2018-07-28,  3) 2018-07-28 within the 5-years before 2021-07-28). I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.

 

I am using the code,

%MACRO Explo(year=);
	PROC SQL;
		CREATE TABLE want_p5y_&year. AS
		SELECT DISTINCT
		citing1.Applicant_name,
		citing1.application_date,
		citing1.application_id,
		citing2.application_id as applied_id
		FROM
		have as citing1
		left JOIN have AS citing2 ON citing1.Applicant_name = citing2.Applicant_name
		where citing1.application_id ^= citing2.application_id
		group by citing1.application_date
		having citing2.application_date between intnx('year', min(citing1.application_date), -5, 'S') and citing1.application_date 
		;
	QUIT;
%MEND Explo;
%Explo(year=2021)

and I got the result as 

Applicant_nameapplication_dateapplication_idapplied_id
A2017/7/28102103
A2021/7/28101102
B2021/7/28101104

 

The problem is

1) I only get 3 rows rather than 4 rows.

2) My dataset has 17883455 rows and 3 columns (from 1985 to 2020). It is too big for me. So I split the dataset by year. Even so, when I run the dataset with the above code, I still get the following results

ERROR: Insufficient space in file WORK.'SASTMP-000000010'n.UTILITY.
ERROR: File WORK.'SASTMP-000000010'n.UTILITY is damaged. I/O processing did not complete.
NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL
      statement successfully if you allocate more space to the WORK library.

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 56:41.45
cpu time 2:14.93

Could you please give me some advice to get the expected result? I guess there should be some smart methods to get the result easily and quickly. but I just do not know.

 

Many thanks in advance for any advice.

 

 

 

2 REPLIES 2
Alexxxxxxx
Pyrite | Level 9
yes, they are the same. Many thanks for your advice. I do not know how to merge these two posts. This post summarizes the question of the previous post.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 828 views
  • 0 likes
  • 2 in conversation