Hi
I am relatively new to the world of SQL and am now using SAS Studio as part of my role, so please forgive me if this is an obvious question.
I am trying to write a SQl statement that will tell me how many values in one list don't match those in another list, in this instance Cost Centres, but when i run the report it returns hundreds of thousands of rows rather than an expected 30 or 40. Can anyone signpost me as to where I am going wrong?
proc sql;
create table DQ As
select case when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d;
quit;
Thanks in advance for any help
You can make this easier by doing a subquery, for example if we create a data set of records from SASHELP.CLASS of those aged under 15 like this:
data under_15;
set sashelp.class(where=(age<15));
run;
We can then get a count of all records in SASHELP.CLASS where age is 15 or over with a subquery:
proc sql;
select count(age)
from sashelp.class
where age not in(select age from under_15);
quit;
If you want to create a data set with the over 15's in then do this:
proc sql;
create table over_15
as select *
from sashelp.class
where age not in(select age from under_15);
quit;
To get good answers we need to see what your working with, so post test data in the form of a datastep.
This is just a guess:
proc sql; create table WANT as select count(COST_CENTER) from (select distinct COST_CENTER from HAVE1 except select distinct COST_CENTER from HAVE2) group by COST_CENTER; quit;
This may have been answered adequately already but I didn't see any closure...
The most direct answer to your question is that you forgot the where clause after the join.
proc sql;
create table DQ As
select
case
when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d
where r.apples=d.apples AND r.oranges=d.oranges;
quit;
If you really just want the rows where WrongCostCentre =1 then add to the where clause, or add the having clause:
proc sql;
create table DQ As
select
case
when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d
where r.apples=d.apples AND r.oranges=d.oranges
having WrongCostCentre=1; /* Some would consider this an abuse of the having clause */
quit;
Of course, if you only want WrongCostCentre=1 rows, you don't need the CASE statement in the query. It would just be another condition in the where clause.
I believe you are doing a cartesian comparison of the Nr cases in r vs the Nd cases in d. Even if Nr=Nd (=N) and every cost center was present in both , you'd still have N*N - N observations with a value of wrongcentre =1.
What I really think you wnat is a list of centre's present in only one of the two data sets. If so, I'd suggest this:
proc sql;
create table DQ as
select rcostcentre, costcentredq from
data.resources as r full outer join valibla.dq_res as d
on r.costcentre=d.costcentredq
except
select rcostcentre, costcentredq from
data.resources as r inner join valibla.dq_res as d
on r.costcentre=d.costcentredq;
quit;
I did a test of this with sashelp.names:
data class1 (rename=(name=name1)) class2 (rename=(name=name2));
set sashelp.class (keep=name) end=eoc;
output class1;
if name='William' then name='XXXXX';
output class2;
run;
proc sql;
create table test as
select name1, name2 from
class1 FULL JOIN class2 on name1=name2
EXCEPT
select name1, name2 from
class1 INNER JOIN class2 on name1=name2;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.