BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9

For each application_id, I expect to 1) identify the applicant which applied the application_id and then 2) identify application_id which applied by applicants over the past 5 years.

 

By using the following data as an example,

data have;
       infile cards dsd  dlm=",";
       input
       Obs :8.
       Applicant_name :$8.
       application_date: YYMMDD10.
       application_id :8.
       ;
       format
       application_date :YYMMDD10.
       ;
       cards;
       1,A,2021-07-28,101
       2,B,2021-07-28,101
       3,A,2017-07-28,102
       4,A,2016-07-27,103
       5,B,2018-07-28,104
       ;;;;

run;

I expect to have the below result.

Obs,Applicant_name,application_date,application_id,applied_id

1,A, 2021-07-28, 101, 102

2,B, 2021-07-28, 101, 104

3,A, 2017-07-28, 102, 103

4,A, 2016-07-27, 103

 

It is because,

For application_id 101, it is applied by two applicants, which are A and B. and then I expect to include application_id 102, because it is applied by applicant A on 2017-07-28, which is within 5 years before 2021-07-28I include application_id 104 for the same reason (because 1) B applied for application 101 in 2021-07-28, 2) B also applied application 104 in 2018-07-28,  3) 2018-07-28 within the 5-years before 2021-07-28). I do not include application_id 103, because although it is applied by applicant A, it is not applied within 5 years before 2021-07-28.

 

I am using the code,

%MACRO Explo(year=);
	PROC SQL;
		CREATE TABLE want_p5y_&year. AS
		SELECT DISTINCT
		citing1.Applicant_name,
		citing1.application_date,
		citing1.application_id,
		citing2.application_id as applied_id
		FROM
		have as citing1
		left JOIN have AS citing2 ON citing1.Applicant_name = citing2.Applicant_name
		where citing1.application_id ^= citing2.application_id
		group by citing1.application_date
		having citing2.application_date between intnx('year', min(citing1.application_date), -5, 'S') and citing1.application_date 
		;
	QUIT;
%MEND Explo;
%Explo(year=2021)

and I got the result as 

Applicant_nameapplication_dateapplication_idapplied_id
A2017/7/28102103
A2021/7/28101102
B2021/7/28101104

 

The problem is

1) I only get 3 rows rather than 4 rows.

2) My dataset has 17883455 rows and 3 columns (from 1985 to 2020). It is too big for me. So I split the dataset by year. Even so, when I run the dataset with the above code, I still get the following results

ERROR: Insufficient space in file WORK.'SASTMP-000000010'n.UTILITY.
ERROR: File WORK.'SASTMP-000000010'n.UTILITY is damaged. I/O processing did not complete.
NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL
      statement successfully if you allocate more space to the WORK library.

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 56:41.45
cpu time 2:14.93

Could you please give me some advice to get the expected result? I guess there should be some smart methods to get the result easily and quickly. but I just do not know.

 

Many thanks in advance for any advice.

 

 

 

2 REPLIES 2
Alexxxxxxx
Pyrite | Level 9
yes, they are the same. Many thanks for your advice. I do not know how to merge these two posts. This post summarizes the question of the previous post.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 426 views
  • 0 likes
  • 2 in conversation