BookmarkSubscribeRSS Feed
hiteshchauhan1
Obsidian | Level 7

I have a dataset (lets call it VADB) which have duplicate values for the variable "Polnum" and then i am joining another table (let's call it CLF) to this table using left join. Now after joining both tables my records are getting increased by twice the duplicates that were in the VADB dataset. i want to remove these extra records which are getting added but since my origional data (VADB) already has duplicates, doing this is getting a little difficult. Ineed something that can remove the extra rows after joining and can keep only those rows which are there in my left table (VADB).

 

Using proc sort nodupkey is of no use as it eliminates all the duplicates (even those which were in the VADB) but i want those duplicates which are in my left data (VADB).

 

So basically, i want to join CLF data to VADB data and the no. of rows in this new merged data should be same as were in the VADB data (VADB data is my left data table and i am using left join).

 

Is there a way to achieve what i want?

 

Thanks.

2 REPLIES 2
ed_sas_member
Meteorite | Level 14

Hi @hiteshchauhan1 

 

Could you please share some sample data as well as the code you used?

Is the code similar to the following one?

proc sql;
	create table want as
	select coalesce(a.Polnum, b.Polnum) as Polnum, /*other variables*/
	from VADB as a left join CLF as b 
	on a.Polnum = b.Polnum;
run;

 

Best,

mkeintz
PROC Star

I think "getting rid of duplicate values" is not sufficiently specific for me to understand your request. Yes, LEFT JOIN filters out cases for which POLNUM is not present in A.  But any time you have duplicate POLNUM values in B will increase the number of output observations.  So which duplicates would you want to drop in order to meet your constraint to have N of rows=N(A)?

 

Consider the A and B below.  POLNUM's 3 and 4 have, respectively a 1-to-many and many-to-many match on POLNUM.  Which cases (i.e. which combinations of AREC,BREC) for those POLNUM's do you want to keep?

 

data a;
  do polnum=1 to 4;
    arec+1;  output;
    if mod(polnum,2)=0 then do; arec+1; output; end;
  end;
run;
data b;
  do polnum=1 to 4; 
    brec+1;   output;
    if polnum>2 then do;         brec+1; output; end;
  end;
run;

proc sql;
  create table want as
    select * 
    from A left join B 
    on a.Polnum = b.Polnum
    order by arec,brec;
quit;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1089 views
  • 0 likes
  • 3 in conversation