BookmarkSubscribeRSS Feed
UshaLatha
Obsidian | Level 7

Hi All,

 

I need some help in optimizing my code.

I have two datasets, sample and history. Where the sample contains randomly selected rows from the history file.

Later I want to inner join the sample with history based on some condition.

 

Sample dataset:

loan_id month var1 var2 var3

1111  3   a1  b1 c1

2222  8   a2  b2 c2

3333  5   a3  b3 c3

4444  2   a4  b4 c4

 

History dataset:

loan_id month var1 var2 var3

1111  3   a1  b1 c1

1111  4   

1111  5  

2222  8   a2  b2 c2

2222  9  

2222  10  

2222  11  

3333  5   a3  b3 c3

3333  6 

3333  7 

3333  8 

3333  9 

4444  2   a4  b4 c4

4444  3  

4444  4   

4444  5  

4444  6  

 

output dataset:

1111  3  a1  b1 c1

1111  4  a1  b1 c1 

1111  5  a1  b1 c1 

2222  8  a2  b2 c2

2222  9  a2  b2 c2

2222  10 a2  b2 c2

2222  11 a2  b2 c2

3333  5  a3  b3 c3

3333  6  a3  b3 c3

3333  7  a3  b3 c3

3333  8  a3  b3 c3

3333  9  a3  b3 c3

4444  2  a4  b4 c4

4444  3  a4  b4 c4

4444  4  a4  b4 c4 

4444  5  a4  b4 c4

4444  6  a4  b4 c4

 

The main aim is to get all the month records from history dataset, but retain the values of var1, var2, var3 from the sample dataset itself. I have achieved this through proc sql. my code is below:

 

proc sql;

create table outds as

select a.loan_id, a.var1, a.var2, a.var3, b.month

from sample a, history b

where a.loan_id = b.loan_id

and a.month <= b.month

quit;

 

I want to optimize this, if there is any alternative way to do this in sas datastep or any other procedure in sas, as I am planning to do further analysis on the sample.

Please help me with this.

 

Thanks

 

 

 

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

 

data Sample;
input loan_id$ month var1$ var2$ var3$;
datalines;
1111 3 a1 b1 c1
2222 8 a2 b2 c2
3333 5 a3 b3 c3
4444 2 a4 b4 c4
;

data History;
input loan_id$ month var1$ var2$ var3$;
infile datalines truncover;
datalines;
1111 3 a1 b1 c1
1111 4
1111 5
2222 8 a2 b2 c2
2222 9 
2222 10
2222 11
3333 5 a3 b3 c3
3333 6
3333 7
3333 8
3333 9
4444 2 a4 b4 c4
4444 3
4444 4
4444 5
4444 6
;

proc sort data=sample; by loan_id; run;
proc sort data=History; by loan_id; run;

data outds;
	merge History(keep=loan_id month) Sample;
	by loan_id;
run;
UshaLatha
Obsidian | Level 7

Hi,

 

I forgot to add scenario where the starting month may not be same in history and sample. That was the reason I was joining on month (a.month <= b.month)

 

Also the history may have many more variables... but I want to retain only var1, var2, var3

Eg:

History dataset:

loan_id month var1 var2 var3 var4 var5 var6

1111  1 l r y u d a f

1111  2 f y e t s g

1111  3   a1  b1 c1 x y z

1111  4   p q r s t u

1111  5   e f g h i j

2222  7   k l m n o p

2222  8   a2  b2 c2 x y z

2222  9  

2222  10  

2222  11  

3333  5   a3  b3 c3 x y z

3333  6 

3333  7 

3333  8 

3333  9 

4444  1

4444  2   a4  b4 c4 x y z

4444  3  

4444  4   

4444  5  

4444  6  

 

The output should be:

loan_id month var1 var2 var3 var4 var5 var6

1111  3   a1  b1 c1 x y z

1111  4   a1  b1 c1 s t u

1111  5   a1  b1 c1 h i j

2222  7   a1  b1 c1 n o p

2222  8   a2  b2 c2 x y z

2222  9   a2  b2 c2

2222  10  a2  b2 c2

2222  11  a2  b2 c2

so on....

 

In your merge statement contains only loan_id. This may not give same results...

Kurt_Bremser
Super User

@PeterClemmensen's data step needs a slight modification:

data outds;
  merge
    History (in=a keep=loan_id month)
    Sample (in=b rename=(month=smonth))
  ;
  by loan_id;
  if a and b;
  if smonth <= month;
  drop smonth;
run;
UshaLatha
Obsidian | Level 7

Hi,

 

Since we are not using month in the by statement, few records are being dropped.

The counts are not matching with proc sql results.

Kurt_Bremser
Super User

Post your exact data in a data step (see @PeterClemmensen's example) and the sql you run against it. Use code posting windows, as described here, so that the code can easily be copy/pasted.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 858 views
  • 1 like
  • 3 in conversation