DATA Step, Macro, Functions and more

PROC SQL SELECT DISTINCT

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

PROC SQL SELECT DISTINCT

I'm attempting to use distinct for FLT.vehicle_no, but get the following from line 20: 

 

16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct(FLT.vehicle_no),
21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: No CMP or C functions found in library work.userfuncs.
ERROR: Function DISTINCT could not be located.


Accepted Solutions
Solution
Thursday
Super User
Super User
Posts: 6,495

Re: PROC SQL SELECT DISTINCT

If you only want one observation per VEHICLE_NO then tell SQL that.

proc sql;
create table InstallDate as
  SELECT FLT.vehicle_no,
       , min(FLT.occur_date) as sw_install_date format dtdate9
  FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
  WHERE FLT.vehicle_header = 'NS'
    and datepart(FLT.occur_date) between '01Sep2016'd and today()
  group by FLT.vehicle_no
;
quit;

You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.

In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.

View solution in original post


All Replies
PROC Star
Posts: 162

Re: PROC SQL SELECT DISTINCT

remove the parenthesis around distinct. It's not a function

Contributor
Posts: 27

Re: PROC SQL SELECT DISTINCT

novinosrin,

Thanks for the quick response. I tried your suggestion and got the following:

 

16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct FLT.vehicle_no,
___
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
33 quit;

Trusted Advisor
Posts: 1,353

Re: PROC SQL SELECT DISTINCT

Distinct is used without parethasis, like:

select distinct FLT.vehicle_header,  FLT.vehicle_no, ...

PROC Star
Posts: 162

Re: PROC SQL SELECT DISTINCT

[ Edited ]

Like @Shmuel mentioned Distinct is a keyword that follows select statement "select clause" immediately. It picks unique combinations of columns that write in the select clause. 

 

For example- it should flow like

Select distinct var1, var2, var3..............

and not select var1 distinct..............

Contributor
Posts: 27

Re: PROC SQL SELECT DISTINCT

Thanks for the comment. It doesn't appear to work in this case. The indented code is below:

 

proc sql;
	create table InstallDate as
		SELECT
			FLT.vehicle_header,
			distinct FLT.vehicle_no,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and datetime()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;
Trusted Advisor
Posts: 1,353

Re: PROC SQL SELECT DISTINCT

Does next code creates results you wnat ? If not please post an example of your data with the desired output.

proc sql;
	create table InstallDate as
		SELECT  DISTINCT
			FLT.vehicle_header,
		        FLT.vehicle_no,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and today()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;

PAY ATTENTION:

1) to the usage of DISTINCT

2) in the WHERE you should check same type data:

     not:  datepart(FLT.occur_date) between '01Sep2016'd and datetime()

I have changed the datetime() to  today()

Contributor
Posts: 27

Re: PROC SQL SELECT DISTINCT

Thanks for your comments, however, the code below generates nearly 10M rows of data where the unique vehicle_no should be ~40. 'distinct' doesn't appear to be operating on vehicle_no. 

 

proc sql;
	create table InstallDate as
		SELECT distinct FLT.vehicle_no,
			FLT.vehicle_header,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and datetime()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;
Solution
Thursday
Super User
Super User
Posts: 6,495

Re: PROC SQL SELECT DISTINCT

If you only want one observation per VEHICLE_NO then tell SQL that.

proc sql;
create table InstallDate as
  SELECT FLT.vehicle_no,
       , min(FLT.occur_date) as sw_install_date format dtdate9
  FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
  WHERE FLT.vehicle_header = 'NS'
    and datepart(FLT.occur_date) between '01Sep2016'd and today()
  group by FLT.vehicle_no
;
quit;

You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.

In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.

Contributor
Posts: 27

Re: PROC SQL SELECT DISTINCT

Thanks Tom, this is very helpful and may be a partial solution.
capam
Super User
Super User
Posts: 6,495

Re: PROC SQL SELECT DISTINCT

[ Edited ]

SELECT DISTINCT will select distinct rows. It is not something you can apply only to some of the variables in the row.

select distinct a,b,c,d from have ;

Perhaps you are confusing with the ability to COUNT(distinct ...)?

select id
     , count(distinct date) as n_dates
from have
group by 1
;

 Or perhaps you wanted to GROUP BY that variable?

create table max_per_day as
  select id, date, max(amount) as max_amount
  from have
  group by 1,2
;
Super User
Posts: 10,461

Re: PROC SQL SELECT DISTINCT

Example data and desired output for that example starting data.

Contributor
Posts: 27

Re: PROC SQL SELECT DISTINCT

Hi Ballardw,

 

Excellent comment.

 

Have table with var1, var2, var3, var4, var5.

 

var1 & var3 have limits and var3 has the minimum value output.

 

grouped by var2.

 

var2 is unique/distinct so that the minimum value of var3 is in reference to each var2.

 

Hope that is more clear.

 

Thanks again.

Super User
Posts: 10,461

Re: PROC SQL SELECT DISTINCT

An example data set works best. Without values to examine and results to see if logic is correct it is very hard to diagnose or suggest data manipulation techniques.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

You only need to provide enough variables and values to demonstrate the cases of operations you need. Best is to then provide a data step with the desired result.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 154 views
  • 0 likes
  • 5 in conversation