Hi,
I am trying to merge three data sets. First data have 1 record, second data have 2 records and third data set have 4 records.
While merging records, duplicate values gets copied in left data set.
Please suggest how to avoid duplicate values
Example –
Data1
TransactionId |
Premium |
ICM_DEC_20_718256 |
1000000 |
Data2
TransactionId |
Comm% |
Commission |
ICM_DEC_20_718256 |
10 |
100000 |
ICM_DEC_20_718256 |
7.5 |
75000 |
Data3
TransactionId |
Rwd % |
Reward Commission |
ICM_DEC_20_718256 |
10 |
100000 |
ICM_DEC_20_718256 |
7.5 |
75000 |
ICM_DEC_20_718256 |
5 |
3000 |
ICM_DEC_20_718256 |
2 |
10000 |
Syntax -
Data want;
Set data1 (in=a) data2 (in=b) data3 (in=c);
By TransanctionId;
If a;
Run;
output -
TransactionId |
Premium |
Comm % |
Commission |
Rwd % |
Reward |
ICM_DEC_20_718256 |
1000000.00 |
10.00 |
100000.00 |
10 |
100000 |
ICM_DEC_20_718256 |
1000000.00 |
7.50 |
75000.00 |
7.5 |
75000 |
ICM_DEC_20_718256 |
1000000.00 |
7.50 |
75000.00 |
5 |
3000 |
ICM_DEC_20_718256 |
1000000.00 |
7.50 |
75000.00 |
2 |
10000 |
I don't want duplicate values in premium, comm % and commission variables.
data data1;
input TransactionId :$20. Premium;
datalines;
ICM_DEC_20_718256 1000000
;
data data2;
input TransactionId :$20. Comm_percent Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
;
data data3;
input TransactionId :$20. Rwd_percent Reward_Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
ICM_DEC_20_718256 5 3000
ICM_DEC_20_718256 2 10000
;
data want;
if 0 then set data1 data2 data3;
call missing(of _all_);
merge
data1
data2
data3
;
by TransactionId;
run;
Please post data as data steps with datalines, so we do not have to write code to read it into datasets.
PROC SQL;
create table want as
SELECT
a.transactionid
,c.premium
,b.Comm%
,b.Commision
,a.rwd%
,a.reward
FROM
data3 a
left join data2 b on a.transactionid = b.transactionid and a.[reward commision] = b.commision
left join data1 c on a.transactionid = c.transactionid and a.[reward commision] = c.premium
;
quit;
data data1;
input TransactionId :$20. Premium;
datalines;
ICM_DEC_20_718256 1000000
;
data data2;
input TransactionId :$20. Comm_percent Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
;
data data3;
input TransactionId :$20. Rwd_percent Reward_Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
ICM_DEC_20_718256 5 3000
ICM_DEC_20_718256 2 10000
;
data want;
if 0 then set data1 data2 data3;
call missing(of _all_);
merge
data1
data2
data3
;
by TransactionId;
run;
Please post data as data steps with datalines, so we do not have to write code to read it into datasets.
I'm always outdone by Kurt, but my two cents. This code might be use to someone learning lag and DO Whitlock loops.
data have;
infile datalines dlm="09"x;
input TransactionId $17. Premium Comm Commission Rwd Reward;
datalines;
ICMlag_DEClag_20lag_718256 1000000 10 100000 10 100000
ICMlag_DEClag_20lag_718256 1000000 7.5 75000 7.5 75000
ICMlag_DEClag_20lag_718256 1000000 7.5 75000 5 3000
ICMlag_DEClag_20lag_718256 1000000 7.5 75000 2 10000
;
run;
data want;
do until (last.TransactionId);
set have;
by TransactionId;
/*lags*/
lag_Premium=lag(Premium);
lag_Comm =lag(Comm);
lag_Commission=lag(Commission);
if not first.TransactionId then do;
if lag_Premium=(Premium) then Premium=.;
if lag_Comm =(Comm) then Comm=.;
if lag_Commission=(Commission) then Commission=.;
end;
output;
end;
drop lag_:;
stop;
run;
options missing=" ";
proc print noObs;
var _all_;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.