DATA Step, Macro, Functions and more

An alternative approach using sas datastep or proc

Reply
Contributor
Posts: 28

An alternative approach using sas datastep or proc

Hi All,

 

I need some help in optimizing my code.

I have two datasets, sample and history. Where the sample contains randomly selected rows from the history file.

Later I want to inner join the sample with history based on some condition.

 

Sample dataset:

loan_id month var1 var2 var3

1111  3   a1  b1 c1

2222  8   a2  b2 c2

3333  5   a3  b3 c3

4444  2   a4  b4 c4

 

History dataset:

loan_id month var1 var2 var3

1111  3   a1  b1 c1

1111  4   

1111  5  

2222  8   a2  b2 c2

2222  9  

2222  10  

2222  11  

3333  5   a3  b3 c3

3333  6 

3333  7 

3333  8 

3333  9 

4444  2   a4  b4 c4

4444  3  

4444  4   

4444  5  

4444  6  

 

output dataset:

1111  3  a1  b1 c1

1111  4  a1  b1 c1 

1111  5  a1  b1 c1 

2222  8  a2  b2 c2

2222  9  a2  b2 c2

2222  10 a2  b2 c2

2222  11 a2  b2 c2

3333  5  a3  b3 c3

3333  6  a3  b3 c3

3333  7  a3  b3 c3

3333  8  a3  b3 c3

3333  9  a3  b3 c3

4444  2  a4  b4 c4

4444  3  a4  b4 c4

4444  4  a4  b4 c4 

4444  5  a4  b4 c4

4444  6  a4  b4 c4

 

The main aim is to get all the month records from history dataset, but retain the values of var1, var2, var3 from the sample dataset itself. I have achieved this through proc sql. my code is below:

 

proc sql;

create table outds as

select a.loan_id, a.var1, a.var2, a.var3, b.month

from sample a, history b

where a.loan_id = b.loan_id

and a.month <= b.month

quit;

 

I want to optimize this, if there is any alternative way to do this in sas datastep or any other procedure in sas, as I am planning to do further analysis on the sample.

Please help me with this.

 

Thanks

 

 

 

PROC Star
Posts: 1,215

Re: An alternative approach using sas datastep or proc

[ Edited ]
Posted in reply to UshaLatha

 

data Sample;
input loan_id$ month var1$ var2$ var3$;
datalines;
1111 3 a1 b1 c1
2222 8 a2 b2 c2
3333 5 a3 b3 c3
4444 2 a4 b4 c4
;

data History;
input loan_id$ month var1$ var2$ var3$;
infile datalines truncover;
datalines;
1111 3 a1 b1 c1
1111 4
1111 5
2222 8 a2 b2 c2
2222 9 
2222 10
2222 11
3333 5 a3 b3 c3
3333 6
3333 7
3333 8
3333 9
4444 2 a4 b4 c4
4444 3
4444 4
4444 5
4444 6
;

proc sort data=sample; by loan_id; run;
proc sort data=History; by loan_id; run;

data outds;
	merge History(keep=loan_id month) Sample;
	by loan_id;
run;
Contributor
Posts: 28

Re: An alternative approach using sas datastep or proc

Hi,

 

I forgot to add scenario where the starting month may not be same in history and sample. That was the reason I was joining on month (a.month <= b.month)

 

Also the history may have many more variables... but I want to retain only var1, var2, var3

Eg:

History dataset:

loan_id month var1 var2 var3 var4 var5 var6

1111  1 l r y u d a f

1111  2 f y e t s g

1111  3   a1  b1 c1 x y z

1111  4   p q r s t u

1111  5   e f g h i j

2222  7   k l m n o p

2222  8   a2  b2 c2 x y z

2222  9  

2222  10  

2222  11  

3333  5   a3  b3 c3 x y z

3333  6 

3333  7 

3333  8 

3333  9 

4444  1

4444  2   a4  b4 c4 x y z

4444  3  

4444  4   

4444  5  

4444  6  

 

The output should be:

loan_id month var1 var2 var3 var4 var5 var6

1111  3   a1  b1 c1 x y z

1111  4   a1  b1 c1 s t u

1111  5   a1  b1 c1 h i j

2222  7   a1  b1 c1 n o p

2222  8   a2  b2 c2 x y z

2222  9   a2  b2 c2

2222  10  a2  b2 c2

2222  11  a2  b2 c2

so on....

 

In your merge statement contains only loan_id. This may not give same results...

Super User
Posts: 9,923

Re: An alternative approach using sas datastep or proc

[ Edited ]
Posted in reply to UshaLatha

@draycut's data step needs a slight modification:

data outds;
  merge
    History (in=a keep=loan_id month)
    Sample (in=b rename=(month=smonth))
  ;
  by loan_id;
  if a and b;
  if smonth <= month;
  drop smonth;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 28

Re: An alternative approach using sas datastep or proc

Posted in reply to KurtBremser

Hi,

 

Since we are not using month in the by statement, few records are being dropped.

The counts are not matching with proc sql results.

Super User
Posts: 9,923

Re: An alternative approach using sas datastep or proc

Posted in reply to UshaLatha

Post your exact data in a data step (see @draycut's example) and the sql you run against it. Use code posting windows, as described here, so that the code can easily be copy/pasted.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Ask a Question
Discussion stats
  • 5 replies
  • 182 views
  • 1 like
  • 3 in conversation