Solved: Help with Removing Duplicate Rows from Proc SQL Inner Join

wbaker0621 · Posted 04-09-2020 09:07 PM

Greetings All,

I am a new SAS user and am working on code to perform multiple inner joins using proc sql (Enterprise Guide 8.1). My output table contains multiple rows for each column (code below). I would love advise on how I can remove duplicate rows, keeping only 1 of each variable 'a.TX_ID'.

proc sql;
	create table TX_HR as 
	select distinct a.TX_ID, a.*, b.*, c.* 
		from SRTR.tx_hr as a left join SRTR.immuno as b 
			on a.TRR_ID = b.TRR_ID
		left join SRTR.rec_histo as c
			on a.TX_ID = c.REC_HISTO_TX_ID;
quit;

Thank you so much for your time.

Bill

Tom · Posted 04-09-2020 09:31 PM

In general you cannot limit to one observation per value of TX_ID using SQL syntax. That is just the nature of how the SQL language works.

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

View solution in original post

Tom · Posted 04-09-2020 09:31 PM

In general you cannot limit to one observation per value of TX_ID using SQL syntax. That is just the nature of how the SQL language works.

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

wbaker0621 · Posted 04-09-2020 10:21 PM

This is great, thank you so much for the quick and detailed reply.

Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Re: Help with Removing Duplicate Rows from Proc SQL Inner Join

Click image to register for webinar

Classroom Training Available!