BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sanjaymane7
Obsidian | Level 7

Hi,

 

I am trying to merge three data sets. First data have 1 record, second data have 2 records and third data set have 4 records.

While merging records, duplicate values gets copied in left data set.

Please suggest how to avoid duplicate values

Example –

Data1

TransactionId

Premium

ICM_DEC_20_718256

1000000

 

Data2

TransactionId

Comm%

Commission

ICM_DEC_20_718256

10

100000

ICM_DEC_20_718256

7.5

75000

 

Data3

TransactionId

Rwd %

Reward Commission

ICM_DEC_20_718256

10

100000

ICM_DEC_20_718256

7.5

75000

ICM_DEC_20_718256

5

3000

ICM_DEC_20_718256

2

10000

 

Syntax -

Data want;

Set data1 (in=a) data2 (in=b) data3 (in=c);

By TransanctionId;

If a;

Run;

 

output -

TransactionId

Premium

Comm %

Commission

Rwd %

Reward

ICM_DEC_20_718256

1000000.00

10.00

100000.00

10

100000

ICM_DEC_20_718256

1000000.00

7.50

75000.00

7.5

75000

ICM_DEC_20_718256

1000000.00

7.50

75000.00

5

3000

ICM_DEC_20_718256

1000000.00

7.50

75000.00

2

10000

 

I don't want duplicate values in premium, comm % and commission variables.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
data data1;
input TransactionId :$20. Premium;
datalines;
ICM_DEC_20_718256 1000000
;

data data2;
input TransactionId :$20. Comm_percent Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
;

data data3;
input TransactionId :$20. Rwd_percent Reward_Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
ICM_DEC_20_718256 5 3000
ICM_DEC_20_718256 2 10000
;

data want;
if 0 then set data1 data2 data3;
call missing(of _all_);
merge
  data1
  data2
  data3
;
by TransactionId;
run;

Please post data as data steps with datalines, so we do not have to write code to read it into datasets.

View solution in original post

3 REPLIES 3
utrocketeng
Quartz | Level 8

PROC SQL;
create table want as
SELECT
a.transactionid
,c.premium
,b.Comm%
,b.Commision
,a.rwd%
,a.reward
FROM
data3 a
left join data2 b on a.transactionid = b.transactionid and a.[reward commision] = b.commision
left join data1 c on a.transactionid = c.transactionid and a.[reward commision] = c.premium
;
quit;

Kurt_Bremser
Super User
data data1;
input TransactionId :$20. Premium;
datalines;
ICM_DEC_20_718256 1000000
;

data data2;
input TransactionId :$20. Comm_percent Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
;

data data3;
input TransactionId :$20. Rwd_percent Reward_Commission;
datalines;
ICM_DEC_20_718256 10 100000
ICM_DEC_20_718256 7.5 75000
ICM_DEC_20_718256 5 3000
ICM_DEC_20_718256 2 10000
;

data want;
if 0 then set data1 data2 data3;
call missing(of _all_);
merge
  data1
  data2
  data3
;
by TransactionId;
run;

Please post data as data steps with datalines, so we do not have to write code to read it into datasets.

PhilC
Rhodochrosite | Level 12

I'm always outdone by Kurt, but my two cents.  This code might be use to someone learning lag and DO Whitlock loops.


data have;
  infile datalines dlm="09"x;
  input TransactionId	$17. Premium	Comm Commission	Rwd	Reward;

datalines;
ICMlag_DEClag_20lag_718256	1000000	10	100000	10	100000
ICMlag_DEClag_20lag_718256	1000000	7.5	75000	7.5	75000
ICMlag_DEClag_20lag_718256	1000000	7.5	75000	5	3000
ICMlag_DEClag_20lag_718256	1000000	7.5	75000	2	10000
;
run;

data want;
  do until (last.TransactionId);
    set have;
      by TransactionId;
    /*lags*/
      lag_Premium=lag(Premium);
      lag_Comm =lag(Comm);
      lag_Commission=lag(Commission);
    if not first.TransactionId then do;
      if lag_Premium=(Premium)       then Premium=.;
      if lag_Comm =(Comm)            then Comm=.;
      if lag_Commission=(Commission) then Commission=.;
    end;
    output;
  end;
  drop lag_:;
  stop;
run;
options missing=" ";
proc print noObs;
  var _all_;
run;