Hi
I am relatively new to the world of SQL and am now using SAS Studio as part of my role, so please forgive me if this is an obvious question.
I am trying to write a SQl statement that will tell me how many values in one list don't match those in another list, in this instance Cost Centres, but when i run the report it returns hundreds of thousands of rows rather than an expected 30 or 40. Can anyone signpost me as to where I am going wrong?
proc sql;
create table DQ As
select case when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d;
quit;
Thanks in advance for any help
You can make this easier by doing a subquery, for example if we create a data set of records from SASHELP.CLASS of those aged under 15 like this:
data under_15;
set sashelp.class(where=(age<15));
run;
We can then get a count of all records in SASHELP.CLASS where age is 15 or over with a subquery:
proc sql;
select count(age)
from sashelp.class
where age not in(select age from under_15);
quit;
If you want to create a data set with the over 15's in then do this:
proc sql;
create table over_15
as select *
from sashelp.class
where age not in(select age from under_15);
quit;
To get good answers we need to see what your working with, so post test data in the form of a datastep.
This is just a guess:
proc sql; create table WANT as select count(COST_CENTER) from (select distinct COST_CENTER from HAVE1 except select distinct COST_CENTER from HAVE2) group by COST_CENTER; quit;
This may have been answered adequately already but I didn't see any closure...
The most direct answer to your question is that you forgot the where clause after the join.
proc sql;
create table DQ As
select
case
when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d
where r.apples=d.apples AND r.oranges=d.oranges;
quit;
If you really just want the rows where WrongCostCentre =1 then add to the where clause, or add the having clause:
proc sql;
create table DQ As
select
case
when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre
from data.resources r, valibla.dq_res d
where r.apples=d.apples AND r.oranges=d.oranges
having WrongCostCentre=1; /* Some would consider this an abuse of the having clause */
quit;
Of course, if you only want WrongCostCentre=1 rows, you don't need the CASE statement in the query. It would just be another condition in the where clause.
I believe you are doing a cartesian comparison of the Nr cases in r vs the Nd cases in d. Even if Nr=Nd (=N) and every cost center was present in both , you'd still have N*N - N observations with a value of wrongcentre =1.
What I really think you wnat is a list of centre's present in only one of the two data sets. If so, I'd suggest this:
proc sql;
create table DQ as
select rcostcentre, costcentredq from
data.resources as r full outer join valibla.dq_res as d
on r.costcentre=d.costcentredq
except
select rcostcentre, costcentredq from
data.resources as r inner join valibla.dq_res as d
on r.costcentre=d.costcentredq;
quit;
I did a test of this with sashelp.names:
data class1 (rename=(name=name1)) class2 (rename=(name=name2));
set sashelp.class (keep=name) end=eoc;
output class1;
if name='William' then name='XXXXX';
output class2;
run;
proc sql;
create table test as
select name1, name2 from
class1 FULL JOIN class2 on name1=name2
EXCEPT
select name1, name2 from
class1 INNER JOIN class2 on name1=name2;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.