Re: An alternative approach using sas datastep or proc

UshaLatha · Posted 11-24-2017 02:54 AM

Hi All,

I need some help in optimizing my code.

I have two datasets, sample and history. Where the sample contains randomly selected rows from the history file.

Later I want to inner join the sample with history based on some condition.

Sample dataset:

loan_id month var1 var2 var3

1111 3 a1 b1 c1

2222 8 a2 b2 c2

3333 5 a3 b3 c3

4444 2 a4 b4 c4

History dataset:

loan_id month var1 var2 var3

1111 3 a1 b1 c1

1111 4

1111 5

2222 8 a2 b2 c2

2222 9

2222 10

2222 11

3333 5 a3 b3 c3

3333 6

3333 7

3333 8

3333 9

4444 2 a4 b4 c4

4444 3

4444 4

4444 5

4444 6

output dataset:

1111 3 a1 b1 c1

1111 4 a1 b1 c1

1111 5 a1 b1 c1

2222 8 a2 b2 c2

2222 9 a2 b2 c2

2222 10 a2 b2 c2

2222 11 a2 b2 c2

3333 5 a3 b3 c3

3333 6 a3 b3 c3

3333 7 a3 b3 c3

3333 8 a3 b3 c3

3333 9 a3 b3 c3

4444 2 a4 b4 c4

4444 3 a4 b4 c4

4444 4 a4 b4 c4

4444 5 a4 b4 c4

4444 6 a4 b4 c4

The main aim is to get all the month records from history dataset, but retain the values of var1, var2, var3 from the sample dataset itself. I have achieved this through proc sql. my code is below:

proc sql;

create table outds as

select a.loan_id, a.var1, a.var2, a.var3, b.month

from sample a, history b

where a.loan_id = b.loan_id

and a.month <= b.month

quit;

I want to optimize this, if there is any alternative way to do this in sas datastep or any other procedure in sas, as I am planning to do further analysis on the sample.

Please help me with this.

Thanks

PeterClemmensen · Posted 11-24-2017 03:13 AM

data Sample;
input loan_id$ month var1$ var2$ var3$;
datalines;
1111 3 a1 b1 c1
2222 8 a2 b2 c2
3333 5 a3 b3 c3
4444 2 a4 b4 c4
;

data History;
input loan_id$ month var1$ var2$ var3$;
infile datalines truncover;
datalines;
1111 3 a1 b1 c1
1111 4
1111 5
2222 8 a2 b2 c2
2222 9 
2222 10
2222 11
3333 5 a3 b3 c3
3333 6
3333 7
3333 8
3333 9
4444 2 a4 b4 c4
4444 3
4444 4
4444 5
4444 6
;

proc sort data=sample; by loan_id; run;
proc sort data=History; by loan_id; run;

data outds;
	merge History(keep=loan_id month) Sample;
	by loan_id;
run;

The DATA to DATA Step Macro
Blog: SASnrd

UshaLatha · Posted 11-24-2017 04:10 AM

Hi,

I forgot to add scenario where the starting month may not be same in history and sample. That was the reason I was joining on month (a.month <= b.month)

Also the history may have many more variables... but I want to retain only var1, var2, var3

Eg:

History dataset:

loan_id month var1 var2 var3 var4 var5 var6

1111 1 l r y u d a f

1111 2 f y e t s g

1111 3 a1 b1 c1 x y z

1111 4 p q r s t u

1111 5 e f g h i j

2222 7 k l m n o p

2222 8 a2 b2 c2 x y z

2222 9

2222 10

2222 11

3333 5 a3 b3 c3 x y z

3333 6

3333 7

3333 8

3333 9

4444 1

4444 2 a4 b4 c4 x y z

4444 3

4444 4

4444 5

4444 6

The output should be:

loan_id month var1 var2 var3 var4 var5 var6

1111 3 a1 b1 c1 x y z

1111 4 a1 b1 c1 s t u

1111 5 a1 b1 c1 h i j

2222 7 a1 b1 c1 n o p

2222 8 a2 b2 c2 x y z

2222 9 a2 b2 c2

2222 10 a2 b2 c2

2222 11 a2 b2 c2

so on....

In your merge statement contains only loan_id. This may not give same results...

Kurt_Bremser · Posted 11-24-2017 04:28 AM

@PeterClemmensen's data step needs a slight modification:

data outds;
  merge
    History (in=a keep=loan_id month)
    Sample (in=b rename=(month=smonth))
  ;
  by loan_id;
  if a and b;
  if smonth <= month;
  drop smonth;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

UshaLatha · Posted 11-28-2017 12:36 PM

Hi,

Since we are not using month in the by statement, few records are being dropped.

The counts are not matching with proc sql results.

Kurt_Bremser · Posted 11-29-2017 02:09 AM

Post your exact data in a data step (see @PeterClemmensen's example) and the sql you run against it. Use code posting windows, as described here, so that the code can easily be copy/pasted.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

An alternative approach using sas datastep or proc