- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I need some help in optimizing my code.
I have two datasets, sample and history. Where the sample contains randomly selected rows from the history file.
Later I want to inner join the sample with history based on some condition.
Sample dataset:
loan_id month var1 var2 var3
1111 3 a1 b1 c1
2222 8 a2 b2 c2
3333 5 a3 b3 c3
4444 2 a4 b4 c4
History dataset:
loan_id month var1 var2 var3
1111 3 a1 b1 c1
1111 4
1111 5
2222 8 a2 b2 c2
2222 9
2222 10
2222 11
3333 5 a3 b3 c3
3333 6
3333 7
3333 8
3333 9
4444 2 a4 b4 c4
4444 3
4444 4
4444 5
4444 6
output dataset:
1111 3 a1 b1 c1
1111 4 a1 b1 c1
1111 5 a1 b1 c1
2222 8 a2 b2 c2
2222 9 a2 b2 c2
2222 10 a2 b2 c2
2222 11 a2 b2 c2
3333 5 a3 b3 c3
3333 6 a3 b3 c3
3333 7 a3 b3 c3
3333 8 a3 b3 c3
3333 9 a3 b3 c3
4444 2 a4 b4 c4
4444 3 a4 b4 c4
4444 4 a4 b4 c4
4444 5 a4 b4 c4
4444 6 a4 b4 c4
The main aim is to get all the month records from history dataset, but retain the values of var1, var2, var3 from the sample dataset itself. I have achieved this through proc sql. my code is below:
proc sql;
create table outds as
select a.loan_id, a.var1, a.var2, a.var3, b.month
from sample a, history b
where a.loan_id = b.loan_id
and a.month <= b.month
quit;
I want to optimize this, if there is any alternative way to do this in sas datastep or any other procedure in sas, as I am planning to do further analysis on the sample.
Please help me with this.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data Sample;
input loan_id$ month var1$ var2$ var3$;
datalines;
1111 3 a1 b1 c1
2222 8 a2 b2 c2
3333 5 a3 b3 c3
4444 2 a4 b4 c4
;
data History;
input loan_id$ month var1$ var2$ var3$;
infile datalines truncover;
datalines;
1111 3 a1 b1 c1
1111 4
1111 5
2222 8 a2 b2 c2
2222 9
2222 10
2222 11
3333 5 a3 b3 c3
3333 6
3333 7
3333 8
3333 9
4444 2 a4 b4 c4
4444 3
4444 4
4444 5
4444 6
;
proc sort data=sample; by loan_id; run;
proc sort data=History; by loan_id; run;
data outds;
merge History(keep=loan_id month) Sample;
by loan_id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I forgot to add scenario where the starting month may not be same in history and sample. That was the reason I was joining on month (a.month <= b.month)
Also the history may have many more variables... but I want to retain only var1, var2, var3
Eg:
History dataset:
loan_id month var1 var2 var3 var4 var5 var6
1111 1 l r y u d a f
1111 2 f y e t s g
1111 3 a1 b1 c1 x y z
1111 4 p q r s t u
1111 5 e f g h i j
2222 7 k l m n o p
2222 8 a2 b2 c2 x y z
2222 9
2222 10
2222 11
3333 5 a3 b3 c3 x y z
3333 6
3333 7
3333 8
3333 9
4444 1
4444 2 a4 b4 c4 x y z
4444 3
4444 4
4444 5
4444 6
The output should be:
loan_id month var1 var2 var3 var4 var5 var6
1111 3 a1 b1 c1 x y z
1111 4 a1 b1 c1 s t u
1111 5 a1 b1 c1 h i j
2222 7 a1 b1 c1 n o p
2222 8 a2 b2 c2 x y z
2222 9 a2 b2 c2
2222 10 a2 b2 c2
2222 11 a2 b2 c2
so on....
In your merge statement contains only loan_id. This may not give same results...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PeterClemmensen's data step needs a slight modification:
data outds;
merge
History (in=a keep=loan_id month)
Sample (in=b rename=(month=smonth))
;
by loan_id;
if a and b;
if smonth <= month;
drop smonth;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Since we are not using month in the by statement, few records are being dropped.
The counts are not matching with proc sql results.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Post your exact data in a data step (see @PeterClemmensen's example) and the sql you run against it. Use code posting windows, as described here, so that the code can easily be copy/pasted.