Re: Getting rid of duplicate values after using left join

hiteshchauhan1 · Posted 02-17-2020 06:32 AM

I have a dataset (lets call it VADB) which have duplicate values for the variable "Polnum" and then i am joining another table (let's call it CLF) to this table using left join. Now after joining both tables my records are getting increased by twice the duplicates that were in the VADB dataset. i want to remove these extra records which are getting added but since my origional data (VADB) already has duplicates, doing this is getting a little difficult. Ineed something that can remove the extra rows after joining and can keep only those rows which are there in my left table (VADB).

Using proc sort nodupkey is of no use as it eliminates all the duplicates (even those which were in the VADB) but i want those duplicates which are in my left data (VADB).

So basically, i want to join CLF data to VADB data and the no. of rows in this new merged data should be same as were in the VADB data (VADB data is my left data table and i am using left join).

Is there a way to achieve what i want?

Thanks.

ed_sas_member · Posted 02-17-2020 06:37 AM

Hi @hiteshchauhan1

Could you please share some sample data as well as the code you used?

Is the code similar to the following one?

proc sql;
	create table want as
	select coalesce(a.Polnum, b.Polnum) as Polnum, /*other variables*/
	from VADB as a left join CLF as b 
	on a.Polnum = b.Polnum;
run;

Best,

mkeintz · Posted 02-17-2020 08:42 AM

I think "getting rid of duplicate values" is not sufficiently specific for me to understand your request. Yes, LEFT JOIN filters out cases for which POLNUM is not present in A. But any time you have duplicate POLNUM values in B will increase the number of output observations. So which duplicates would you want to drop in order to meet your constraint to have N of rows=N(A)?

Consider the A and B below. POLNUM's 3 and 4 have, respectively a 1-to-many and many-to-many match on POLNUM. Which cases (i.e. which combinations of AREC,BREC) for those POLNUM's do you want to keep?

data a;
  do polnum=1 to 4;
    arec+1;  output;
    if mod(polnum,2)=0 then do; arec+1; output; end;
  end;
run;
data b;
  do polnum=1 to 4; 
    brec+1;   output;
    if polnum>2 then do;         brec+1; output; end;
  end;
run;

proc sql;
  create table want as
    select * 
    from A left join B 
    on a.Polnum = b.Polnum
    order by arec,brec;
quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Getting rid of duplicate values after using left join

Re: Getting rid of duplicate values after using left join

Re: Getting rid of duplicate values after using left join

Registration is open