BookmarkSubscribeRSS Feed
Marco_park
Calcite | Level 5
 
Hi,

I am trying to create a query in sas from postgresql which eliminates duplicate rows based on the value on a subset of columns. Let's assume that my database is as follows:

name var1 var2 var3
   a    1   2   10   
a
1 2 26
b
3 56 47
c
4 78 50

For my purposes I would like to drop the second row (or the first one, it doesn't really matter) in order to have the table as:

name var1 var2 var3
   a    1   2   10   
b
3 56 47
c
4 78 50

I have tried withe the following:

SELECT DISTINCT on (name, var1, var2)
name, var1, var2, var3 FROM table;

but without any success, can anyone help?

In Postgresq this would perform the distinct operation considering only the variables name, var1, var2 but keeping name,var1,var2,var3

 

Thanks

10 REPLIES 10
novinosrin
Tourmaline | Level 20

I do not know what is Postgresq

 

In SAS, you can achieve with a proc sort or data step

 

proc sort data=your_dataset nodupkey;

by name;

run;

 

Marco_park
Calcite | Level 5

Can I do that into a proc sql? I need it to be done in proc sql?

 

Thank you

novinosrin
Tourmaline | Level 20

Something like this?

 

data have;
input name $ var1 var2 var3;
cards;
   a    1   2   10   
   a    1   2   26   
   b    3   56  47   
   c    4   78  50
   ;


proc sql;
create table want as
select distinct name, var1, var2, var3
from have
group by name
having var3=min(var3);
quit;
tomcmacdonald
Quartz | Level 8

I'm used to that functionality in PostgreSQL as well and had to resort to using PROC SORT.  However it does look like it works with PROC FEDSQL:

 

PROC FEDSQL;
	DROP TABLE example FORCE;
	CREATE TABLE example (
		id INTEGER,
		col1 INTEGER
	);
	INSERT INTO example(id, col1) VALUES (1, 2);
	INSERT INTO example(id, col1) VALUES (1, 3);
	INSERT INTO example(id, col1) VALUES (1, 4);
	SELECT DISTINCT ON (id) *
	FROM example;
QUIT;

 

Result:

 

1  4

Marco_park
Calcite | Level 5

Basically I have to translate this code

 

proc sort data=my_db nodupkey;
	by var1 var2 var3 var4;
run;

in a proc sql, however the database has more then only 4 variables.

How can I do it?

MadhuKorni
Quartz | Level 8

This might work only if there are distinct values in Var3 for each group in name variable because if the values of Var3 are 10 and 10 against the data values "a" in name variable then you will get both the records.

Marco_park
Calcite | Level 5

Yes the entries in var1, var2 and var3 are all different. This is not the case in the var 4-10 but I don't care which ones will be dropped.

Is it possible to do it in a proc sql?

 

Thanks

novinosrin
Tourmaline | Level 20

@MadhuKorni No, you obviously haven't tested my code. Your thought is handled by my having clause. Please test 

MadhuKorni
Quartz | Level 8

If the first three variables of a by group are same then your code works.

What If the value in var2 of the 2nd row is changed to 1. Try executing your code with the below dataset.

data have;
input name $ var1 var2 var3;
cards;
a 1 2 10
a 1 1 10
b 3 56 47
c 4 78 50
;

novinosrin
Tourmaline | Level 20

@MadhuKorni  Fair point, that leads me to now think statistically,

 

Please test with some intuitive combinations and let me know

 

data have;
input name $ var1 var2 var3;
cards;
a 1 2 10 
a 1 1 10 
b 3 56 47 
c 4 78 50
;

proc sql;
create table want(drop=std) as
select  distinct name, var1, var2, var3,std(var1,var2,var3) as std
from have
group by name
having std=min(std)  ;
quit;

 

OK, The above may still cause a problem for values like

a 1 2 1
a 2 1 1

 

This can be fixed by weigting on of the variables multiplied with a constant. For example,

 

proc sql;
create table want(drop=std) as
select  distinct name, var1, var2, var3,std(var1*2,var2,var3) as std
from have
group by name
having min(std)=std;
quit;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3816 views
  • 0 likes
  • 4 in conversation