- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Greetings All,
I am a new SAS user and am working on code to perform multiple inner joins using proc sql (Enterprise Guide 8.1). My output table contains multiple rows for each column (code below). I would love advise on how I can remove duplicate rows, keeping only 1 of each variable 'a.TX_ID'.
proc sql;
create table TX_HR as
select distinct a.TX_ID, a.*, b.*, c.*
from SRTR.tx_hr as a left join SRTR.immuno as b
on a.TRR_ID = b.TRR_ID
left join SRTR.rec_histo as c
on a.TX_ID = c.REC_HISTO_TX_ID;
quit;
Thank you so much for your time.
Bill
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In general you cannot limit to one observation per value of TX_ID using SQL syntax. That is just the nature of how the SQL language works.
You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.
select distinct ....
Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.
You could use PROC SORT with the NODUPKEY option.
proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
by tx_id;
run;
If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.
data tx_hr_subset;
set tx_hr;
by tx_id;
if first.tx_id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In general you cannot limit to one observation per value of TX_ID using SQL syntax. That is just the nature of how the SQL language works.
You could add the DISTINCT keyword to your SELECT and eliminate duplicate observations.
select distinct ....
Otherwise to get down to just one observation per TX_ID you will need to use some SAS code instead of SQL code.
You could use PROC SORT with the NODUPKEY option.
proc sort data=TX_HR out=TX_HR_SUBSET nodupkey ;
by tx_id;
run;
If it matter which observation you keep you could order the observations within the values ot TX_ID by some other variables that will insure the one you want is at the top and then use data step with BY group processing.
data tx_hr_subset;
set tx_hr;
by tx_id;
if first.tx_id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content