Hi all SAS Users,
Today I faced a problem that where clause did not work in my datastep
My code is
data filter;
set var_cal;
where n(ajexdi,prccd_abs_,trfd)=3 or n(prccd_abs_,prchd,prcld)=3
and
prccd_abs_ > 0
and
tpci='0'
and
prcstd=3 or (prcstd=4 and LOC='CAN') or prcstd=10
and
input(SIC, 4.) not in (4900:4949, 6000:6999)
and
raw_return<2
;
run;
I have a look on the dataset and see that in the dataset filter, return >2 still exists, could you please let me know what is wrong with my code? This is the proc means for variable "raw_return" in two files.
Warmest regards.
This:
(INPUT(SIC, 4.) not = INT(INPUT(SIC, 4.)))
is to exclude non-integers from possible list of values.
B.
Because of the precedence of AND before OR, your condition is equivalent to:
where
n(ajexdi,prccd_abs_,trfd)=3
or (
n(prccd_abs_,prchd,prcld)=3
and
prccd_abs_ > 0
and
tpci='0'
and
prcstd=3
)
or (
prcstd=4
and
LOC='CAN'
)
or (
prcstd=10
and
input(SIC, 4.) not in (4900:4949, 6000:6999)
and
raw_return<2
)
;
Check if this is what you intended.
How the brackets should be set in the where clause?? You have bot OR and AND operator there so the condition my be evaluated different that you expect it to be.
Look into the log and see how SAS resolved the condition from the where clause and if it is what you expect it to be.
For example:
data filter;
set var_cal;
where n(ajexdi,prccd_abs_,trfd)=3 or n(prccd_abs_,prchd,prcld)=3
and
prccd_abs_ > 0
and
tpci='0'
and
( prcstd=3 or (prcstd=4 and LOC='CAN') or prcstd=10 ) /* brackets setup 1*/
and
input(SIC, 4.) not in (4900:4949, 6000:6999)
and
raw_return<2
;
run;
data filter;
set var_cal;
where ( n(ajexdi,prccd_abs_,trfd)=3 or n(prccd_abs_,prchd,prcld)=3
and
prccd_abs_ > 0
and
tpci='0'
and
prcstd=3) or ( (prcstd=4 and LOC='CAN') ) or ( prcstd=10 /* brackets setup 2*/
and
input(SIC, 4.) not in (4900:4949, 6000:6999)
and
raw_return<2)
;
run;
All the best
Bart
Hi @KurtBremser and @yabwon
Thank you for your explanation, it is my fault that I forget to set the brackets reasonably. My new code is as below, it means that the new code need to satisfy all rows from "where" clause.
data filter;
set var_cal;
where (n(ajexdi,prccd_abs_,trfd)=3 or n(prccd_abs_,prchd,prcld)=3)
and
prccd_abs_ > 0
and
tpci='0'
and
(prcstd=3 or (prcstd=4 and LOC='CAN') or prcstd=10)
and
(input(SIC, 4.) not in (4900:4949, 6000:6999))
and
raw_return<2
;
run;
However, the log is quite strange to me regarding the code
(input(SIC, 4.) not in (4900:4949, 6000:6999))
The log is
NOTE: There were 8368199 observations read from the data set WORK.VAR_CAL. WHERE ((N(ajexdi, prccd_abs_, trfd)=3) or (N(prccd_abs_, prchd, prcld)=3)) and (prccd_abs_>0) and (tpci='0') and (prcstd in (3, 10) or ((prcstd=4) and (LOC='CAN'))) and (raw_return<200) and ((INPUT(SIC, 4.) not = INT(INPUT(SIC, 4.))) or not ((INPUT(SIC, 4.)>=4900 and INPUT(SIC, 4.)<=4949) or (INPUT(SIC, 4.)>=6000 and INPUT(SIC, 4.)<=6999)));
What I want to achieve for such line of code is to exclude all firms having SIC (4-digit number under character type) in (4900:4949, 6000:6999)
I do not know why the log shows this one (I did not set this condition in my filter)
(INPUT(SIC, 4.) not = INT(INPUT(SIC, 4.))) or
Can you please help me to sort it out?
Warm regards.
This:
(INPUT(SIC, 4.) not = INT(INPUT(SIC, 4.)))
is to exclude non-integers from possible list of values.
B.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.