topic Re: Help with Removing Duplicate Rows from Proc SQL Inner Join in SAS Programming

Help with Removing Duplicate Rows from Proc SQL Inner Join

wbaker0621 — Fri, 10 Apr 2020 01:07:26 GMT

Greetings All,

I am a new SAS user and am working on code to perform multiple inner joins using proc sql (Enterprise Guide 8.1). My output table contains multiple rows for each column (code below). I would love advise on how I can remove duplicate rows, keeping only 1 of each variable 'a.TX_ID'.

proc sql;
	create table TX_HR as 
	select distinct a.TX_ID, a.*, b.*, c.* 
		from SRTR.tx_hr as a left join SRTR.immuno as b 
			on a.TRR_ID = b.TRR_ID
		left join SRTR.rec_histo as c
			on a.TX_ID = c.REC_HISTO_TX_ID;
quit;

Thank you so much for your time.

Bill

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Tom — Fri, 10 Apr 2020 01:31:40 GMT

In general you cannot limit to one observation per value of TX_ID using SQL syntax. That is just the nature of how the SQL language works.

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

wbaker0621 — Fri, 10 Apr 2020 02:21:31 GMT

This is great, thank you so much for the quick and detailed reply.