BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wbaker0621
Fluorite | Level 6

Greetings All,

 

I am a new SAS user and am working on code to perform multiple inner joins using proc sql (Enterprise Guide 8.1). My output table contains multiple rows for each column (code below). I would love advise on how I can remove duplicate rows, keeping only 1 of each variable 'a.TX_ID'. 

 

proc sql;
	create table TX_HR as 
	select distinct a.TX_ID, a.*, b.*, c.* 
		from SRTR.tx_hr as a left join SRTR.immuno as b 
			on a.TRR_ID = b.TRR_ID
		left join SRTR.rec_histo as c
			on a.TX_ID = c.REC_HISTO_TX_ID;
quit;

Thank you so much for your time.

Bill

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

In general you cannot limit to one observation per value of TX_ID using SQL syntax.  That is just the nature of how the SQL language works.

 

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

 

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

 

 

 

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

In general you cannot limit to one observation per value of TX_ID using SQL syntax.  That is just the nature of how the SQL language works.

 

You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.

select distinct ....

Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.

 

You could use PROC SORT with the NODUPKEY option.

proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
  by tx_id;
run;

If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.

data tx_hr_subset;
  set tx_hr;
  by tx_id;
  if first.tx_id;
run;

 

 

 

wbaker0621
Fluorite | Level 6
This is great, thank you so much for the quick and detailed reply.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 2395 views
  • 0 likes
  • 2 in conversation