BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wbaker0621
Fluorite | Level 6

Greetings All,

 

I am a new SAS user and am working on code to perform multiple inner joins using proc sql (Enterprise Guide 8.1). My output table contains multiple rows for each column (code below). I would love advise on how I can remove duplicate rows, keeping only 1 of each variable 'a.TX_ID'. 

 

proc sql;
	create table TX_HR as 
	select distinct a.TX_ID, a.*, b.*, c.* 
		from SRTR.tx_hr as a left join SRTR.immuno as b 
			on a.TRR_ID = b.TRR_ID
		left join SRTR.rec_histo as c
			on a.TX_ID = c.REC_HISTO_TX_ID;
quit;

Thank you so much for your time.

Bill

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

In general you cannot limit to one observation per value of TX_ID using SQL syntax.  That is just the nature of how the SQL language works.

 

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

 

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

 

 

 

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

In general you cannot limit to one observation per value of TX_ID using SQL syntax.  That is just the nature of how the SQL language works.

 

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

 

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

 

 

 

wbaker0621
Fluorite | Level 6
This is great, thank you so much for the quick and detailed reply.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 2657 views
  • 0 likes
  • 2 in conversation