BookmarkSubscribeRSS Feed
andrew928
Calcite | Level 5

Hi

 

I am relatively new to the world of SQL and am now using SAS Studio as part of my role, so please forgive me if this is an obvious question.

 

I am trying to write a SQl statement that will tell me how many values in one list don't match those in another list, in this instance Cost Centres, but when i run the report it returns hundreds of thousands of rows rather than an expected 30 or 40. Can anyone signpost me as to where I am going wrong?

 

proc sql;

 

create table DQ As

select case when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre

from data.resources r, valibla.dq_res d;

 

quit;

 

Thanks in advance for any help

4 REPLIES 4
ChrisBrooks
Ammonite | Level 13

You can make this easier by doing a subquery, for example if we create a data set of records from SASHELP.CLASS of those aged under 15 like this:

 

data under_15;
	set sashelp.class(where=(age<15));
run;

We can then get a count of all records in SASHELP.CLASS where age is 15 or over with a subquery:

 

proc sql;
	select count(age)
	from sashelp.class
	where age not in(select age from under_15);
quit;

If you want to create a data set with the over 15's in then do this:

 

proc sql;
	create table over_15
	as select *
	from sashelp.class
	where age not in(select age from under_15);
quit;
	
RW9
Diamond | Level 26 RW9
Diamond | Level 26

To get good answers we need to see what your working with, so post test data in the form of a datastep.

This is just a guess:

proc sql;
  create table WANT as
  select  count(COST_CENTER)
  from    (select distinct COST_CENTER from HAVE1 except select distinct COST_CENTER from HAVE2)
  group by  COST_CENTER;
quit;
  
optimist55116
Calcite | Level 5

This may have been answered adequately already but I didn't see any closure... 

 

The most direct answer to your question is that you forgot the where clause after the join.   

 

proc sql;

    create table DQ As

      select  

           case  

              when r.CostCentre ^= d.CostCentredq then 1
              else 0
            end as WrongCostCentre

      from data.resources r,  valibla.dq_res d

      where r.apples=d.apples AND r.oranges=d.oranges;

quit;  

 

If you really just want the rows where WrongCostCentre  =1 then add to the where clause, or add the having clause:

 

proc sql;

    create table DQ As

      select  

           case  

              when r.CostCentre ^= d.CostCentredq then 1
              else 0
            end as WrongCostCentre

      from data.resources r,  valibla.dq_res d

      where r.apples=d.apples AND r.oranges=d.oranges

      having WrongCostCentre=1;  /* Some would consider this an abuse of the having clause */  

quit;  

 

Of course, if you only want WrongCostCentre=1 rows, you don't need the CASE statement in the query. It would just be another condition in the where clause. 

 

mkeintz
PROC Star

I believe you are doing a cartesian comparison of the Nr cases in r vs the Nd cases in d.  Even if Nr=Nd (=N) and every cost center was present in both , you'd still have N*N - N observations with a value of wrongcentre =1.

 

What I really think you wnat is a list of centre's present in only one of the two data sets.  If so, I'd suggest this:

proc sql;
  create table DQ as
    select rcostcentre, costcentredq from
      data.resources as r full outer join valibla.dq_res as d
      on r.costcentre=d.costcentredq 
    except  
    select rcostcentre, costcentredq from
      data.resources as r inner join valibla.dq_res as d
      on r.costcentre=d.costcentredq;
quit;

 

I did a test of this with sashelp.names:

data class1 (rename=(name=name1)) class2 (rename=(name=name2));
  set sashelp.class (keep=name) end=eoc;
  output class1;
  if name='William' then name='XXXXX';
  output class2;
run;

proc sql;
  create table test as
    select name1, name2 from
       class1 FULL JOIN class2 on name1=name2
    EXCEPT
    select name1, name2 from
       class1 INNER JOIN class2 on name1=name2;
quit;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Develop Code with SAS Studio

Get started using SAS Studio to write, run and debug your SAS programs.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2414 views
  • 0 likes
  • 5 in conversation