BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
capam
Pyrite | Level 9

I'm attempting to use distinct for FLT.vehicle_no, but get the following from line 20: 

 

16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct(FLT.vehicle_no),
21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: No CMP or C functions found in library work.userfuncs.
ERROR: Function DISTINCT could not be located.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you only want one observation per VEHICLE_NO then tell SQL that.

proc sql;
create table InstallDate as
  SELECT FLT.vehicle_no,
       , min(FLT.occur_date) as sw_install_date format dtdate9
  FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
  WHERE FLT.vehicle_header = 'NS'
    and datepart(FLT.occur_date) between '01Sep2016'd and today()
  group by FLT.vehicle_no
;
quit;

You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.

In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.

View solution in original post

13 REPLIES 13
novinosrin
Tourmaline | Level 20

remove the parenthesis around distinct. It's not a function

capam
Pyrite | Level 9

novinosrin,

Thanks for the quick response. I tried your suggestion and got the following:

 

16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct FLT.vehicle_no,
___
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
33 quit;

Shmuel
Garnet | Level 18

Distinct is used without parethasis, like:

select distinct FLT.vehicle_header,  FLT.vehicle_no, ...

novinosrin
Tourmaline | Level 20

Like @Shmuel mentioned Distinct is a keyword that follows select statement "select clause" immediately. It picks unique combinations of columns that write in the select clause. 

 

For example- it should flow like

Select distinct var1, var2, var3..............

and not select var1 distinct..............

capam
Pyrite | Level 9

Thanks for the comment. It doesn't appear to work in this case. The indented code is below:

 

proc sql;
	create table InstallDate as
		SELECT
			FLT.vehicle_header,
			distinct FLT.vehicle_no,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and datetime()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;
Shmuel
Garnet | Level 18

Does next code creates results you wnat ? If not please post an example of your data with the desired output.

proc sql;
	create table InstallDate as
		SELECT  DISTINCT
			FLT.vehicle_header,
		        FLT.vehicle_no,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and today()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;

PAY ATTENTION:

1) to the usage of DISTINCT

2) in the WHERE you should check same type data:

     not:  datepart(FLT.occur_date) between '01Sep2016'd and datetime()

I have changed the datetime() to  today()

capam
Pyrite | Level 9

Thanks for your comments, however, the code below generates nearly 10M rows of data where the unique vehicle_no should be ~40. 'distinct' doesn't appear to be operating on vehicle_no. 

 

proc sql;
	create table InstallDate as
		SELECT distinct FLT.vehicle_no,
			FLT.vehicle_header,
			FLT.occur_date,
			FLT.FAULT_CODE,
			FLT.FAULT_DESCRIPTION,
			datepart(min(FLT.occur_date)) as sw_install_date format date9.
			/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
		FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
			WHERE 
				FLT.vehicle_header = 'NS' and
	datepart(FLT.occur_date) between '01Sep2016'd and datetime()
	/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
	group by FLT.vehicle_no
	;
quit;
Tom
Super User Tom
Super User

If you only want one observation per VEHICLE_NO then tell SQL that.

proc sql;
create table InstallDate as
  SELECT FLT.vehicle_no,
       , min(FLT.occur_date) as sw_install_date format dtdate9
  FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
  WHERE FLT.vehicle_header = 'NS'
    and datepart(FLT.occur_date) between '01Sep2016'd and today()
  group by FLT.vehicle_no
;
quit;

You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.

In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.

capam
Pyrite | Level 9
Thanks Tom, this is very helpful and may be a partial solution.
capam
Tom
Super User Tom
Super User

SELECT DISTINCT will select distinct rows. It is not something you can apply only to some of the variables in the row.

select distinct a,b,c,d from have ;

Perhaps you are confusing with the ability to COUNT(distinct ...)?

select id
     , count(distinct date) as n_dates
from have
group by 1
;

 Or perhaps you wanted to GROUP BY that variable?

create table max_per_day as
  select id, date, max(amount) as max_amount
  from have
  group by 1,2
;
ballardw
Super User

Example data and desired output for that example starting data.

capam
Pyrite | Level 9

Hi Ballardw,

 

Excellent comment.

 

Have table with var1, var2, var3, var4, var5.

 

var1 & var3 have limits and var3 has the minimum value output.

 

grouped by var2.

 

var2 is unique/distinct so that the minimum value of var3 is in reference to each var2.

 

Hope that is more clear.

 

Thanks again.

ballardw
Super User

An example data set works best. Without values to examine and results to see if logic is correct it is very hard to diagnose or suggest data manipulation techniques.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

You only need to provide enough variables and values to demonstrate the cases of operations you need. Best is to then provide a data step with the desired result.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 91410 views
  • 1 like
  • 5 in conversation