I'm attempting to use distinct for FLT.vehicle_no, but get the following from line 20:
16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct(FLT.vehicle_no),
21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: No CMP or C functions found in library work.userfuncs.
ERROR: Function DISTINCT could not be located.
If you only want one observation per VEHICLE_NO then tell SQL that.
proc sql;
create table InstallDate as
SELECT FLT.vehicle_no,
, min(FLT.occur_date) as sw_install_date format dtdate9
FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
WHERE FLT.vehicle_header = 'NS'
and datepart(FLT.occur_date) between '01Sep2016'd and today()
group by FLT.vehicle_no
;
quit;
You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.
In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.
remove the parenthesis around distinct. It's not a function
novinosrin,
Thanks for the quick response. I tried your suggestion and got the following:
16 proc sql;
17 create table InstallDate as
18 SELECT
19 FLT.vehicle_header,
20 distinct FLT.vehicle_no,
___
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
21 FLT.occur_date,
22 FLT.FAULT_CODE,
23 FLT.FAULT_DESCRIPTION,
24 datepart(min(FLT.occur_date)) as sw_install_date format date9.
25 /*min(FLT.occur_date) as sw_install_date format dtdate9.*/
26 FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
27 WHERE
28 FLT.vehicle_header = 'NS' and
29 datepart(FLT.occur_date) between '01Sep2016'd and datetime()
30 /*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
31 group by FLT.vehicle_no
32 ;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
33 quit;
Distinct is used without parethasis, like:
select distinct FLT.vehicle_header, FLT.vehicle_no, ...
Like @Shmuel mentioned Distinct is a keyword that follows select statement "select clause" immediately. It picks unique combinations of columns that write in the select clause.
For example- it should flow like
Select distinct var1, var2, var3..............
and not select var1 distinct..............
Thanks for the comment. It doesn't appear to work in this case. The indented code is below:
proc sql;
create table InstallDate as
SELECT
FLT.vehicle_header,
distinct FLT.vehicle_no,
FLT.occur_date,
FLT.FAULT_CODE,
FLT.FAULT_DESCRIPTION,
datepart(min(FLT.occur_date)) as sw_install_date format date9.
/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
WHERE
FLT.vehicle_header = 'NS' and
datepart(FLT.occur_date) between '01Sep2016'd and datetime()
/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
group by FLT.vehicle_no
;
quit;
Does next code creates results you wnat ? If not please post an example of your data with the desired output.
proc sql;
create table InstallDate as
SELECT DISTINCT
FLT.vehicle_header,
FLT.vehicle_no,
FLT.occur_date,
FLT.FAULT_CODE,
FLT.FAULT_DESCRIPTION,
datepart(min(FLT.occur_date)) as sw_install_date format date9.
/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
WHERE
FLT.vehicle_header = 'NS' and
datepart(FLT.occur_date) between '01Sep2016'd and today()
/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
group by FLT.vehicle_no
;
quit;
PAY ATTENTION:
1) to the usage of DISTINCT
2) in the WHERE you should check same type data:
not: datepart(FLT.occur_date) between '01Sep2016'd and datetime()
I have changed the datetime() to today()
Thanks for your comments, however, the code below generates nearly 10M rows of data where the unique vehicle_no should be ~40. 'distinct' doesn't appear to be operating on vehicle_no.
proc sql;
create table InstallDate as
SELECT distinct FLT.vehicle_no,
FLT.vehicle_header,
FLT.occur_date,
FLT.FAULT_CODE,
FLT.FAULT_DESCRIPTION,
datepart(min(FLT.occur_date)) as sw_install_date format date9.
/*min(FLT.occur_date) as sw_install_date format dtdate9.*/
FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
WHERE
FLT.vehicle_header = 'NS' and
datepart(FLT.occur_date) between '01Sep2016'd and datetime()
/*FLT.occur_date between '25Jul2017:00:00'dt and datetime()*/
group by FLT.vehicle_no
;
quit;
If you only want one observation per VEHICLE_NO then tell SQL that.
proc sql;
create table InstallDate as
SELECT FLT.vehicle_no,
, min(FLT.occur_date) as sw_install_date format dtdate9
FROM RMDEOAP.GETS_DW_EOA_FAULTS FLT
WHERE FLT.vehicle_header = 'NS'
and datepart(FLT.occur_date) between '01Sep2016'd and today()
group by FLT.vehicle_no
;
quit;
You can add other variables, but if they are not 1 to 1 with VEHICLE_NO then they will cause you to get more observations.
In particular if you add variables that are neither listed in the GROUP BY clause or are aggregate functions like MIN() then SAS will happlily remerge your aggregate variables back onto all of the input data and return that.
SELECT DISTINCT will select distinct rows. It is not something you can apply only to some of the variables in the row.
select distinct a,b,c,d from have ;
Perhaps you are confusing with the ability to COUNT(distinct ...)?
select id
, count(distinct date) as n_dates
from have
group by 1
;
Or perhaps you wanted to GROUP BY that variable?
create table max_per_day as
select id, date, max(amount) as max_amount
from have
group by 1,2
;
Example data and desired output for that example starting data.
Hi Ballardw,
Excellent comment.
Have table with var1, var2, var3, var4, var5.
var1 & var3 have limits and var3 has the minimum value output.
grouped by var2.
var2 is unique/distinct so that the minimum value of var3 is in reference to each var2.
Hope that is more clear.
Thanks again.
An example data set works best. Without values to examine and results to see if logic is correct it is very hard to diagnose or suggest data manipulation techniques.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
You only need to provide enough variables and values to demonstrate the cases of operations you need. Best is to then provide a data step with the desired result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.