BookmarkSubscribeRSS Feed
andrew928
Calcite | Level 5

Hi

 

I am relatively new to the world of SQL and am now using SAS Studio as part of my role, so please forgive me if this is an obvious question.

 

I am trying to write a SQl statement that will tell me how many values in one list don't match those in another list, in this instance Cost Centres, but when i run the report it returns hundreds of thousands of rows rather than an expected 30 or 40. Can anyone signpost me as to where I am going wrong?

 

proc sql;

 

create table DQ As

select case when r.CostCentre ^= d.CostCentredq then 1
else 0
end as WrongCostCentre

from data.resources r, valibla.dq_res d;

 

quit;

 

Thanks in advance for any help

4 REPLIES 4
ChrisBrooks
Ammonite | Level 13

You can make this easier by doing a subquery, for example if we create a data set of records from SASHELP.CLASS of those aged under 15 like this:

 

data under_15;
	set sashelp.class(where=(age<15));
run;

We can then get a count of all records in SASHELP.CLASS where age is 15 or over with a subquery:

 

proc sql;
	select count(age)
	from sashelp.class
	where age not in(select age from under_15);
quit;

If you want to create a data set with the over 15's in then do this:

 

proc sql;
	create table over_15
	as select *
	from sashelp.class
	where age not in(select age from under_15);
quit;
	
RW9
Diamond | Level 26 RW9
Diamond | Level 26

To get good answers we need to see what your working with, so post test data in the form of a datastep.

This is just a guess:

proc sql;
  create table WANT as
  select  count(COST_CENTER)
  from    (select distinct COST_CENTER from HAVE1 except select distinct COST_CENTER from HAVE2)
  group by  COST_CENTER;
quit;
  
optimist55116
Calcite | Level 5

This may have been answered adequately already but I didn't see any closure... 

 

The most direct answer to your question is that you forgot the where clause after the join.   

 

proc sql;

    create table DQ As

      select  

           case  

              when r.CostCentre ^= d.CostCentredq then 1
              else 0
            end as WrongCostCentre

      from data.resources r,  valibla.dq_res d

      where r.apples=d.apples AND r.oranges=d.oranges;

quit;  

 

If you really just want the rows where WrongCostCentre  =1 then add to the where clause, or add the having clause:

 

proc sql;

    create table DQ As

      select  

           case  

              when r.CostCentre ^= d.CostCentredq then 1
              else 0
            end as WrongCostCentre

      from data.resources r,  valibla.dq_res d

      where r.apples=d.apples AND r.oranges=d.oranges

      having WrongCostCentre=1;  /* Some would consider this an abuse of the having clause */  

quit;  

 

Of course, if you only want WrongCostCentre=1 rows, you don't need the CASE statement in the query. It would just be another condition in the where clause. 

 

mkeintz
PROC Star

I believe you are doing a cartesian comparison of the Nr cases in r vs the Nd cases in d.  Even if Nr=Nd (=N) and every cost center was present in both , you'd still have N*N - N observations with a value of wrongcentre =1.

 

What I really think you wnat is a list of centre's present in only one of the two data sets.  If so, I'd suggest this:

proc sql;
  create table DQ as
    select rcostcentre, costcentredq from
      data.resources as r full outer join valibla.dq_res as d
      on r.costcentre=d.costcentredq 
    except  
    select rcostcentre, costcentredq from
      data.resources as r inner join valibla.dq_res as d
      on r.costcentre=d.costcentredq;
quit;

 

I did a test of this with sashelp.names:

data class1 (rename=(name=name1)) class2 (rename=(name=name2));
  set sashelp.class (keep=name) end=eoc;
  output class1;
  if name='William' then name='XXXXX';
  output class2;
run;

proc sql;
  create table test as
    select name1, name2 from
       class1 FULL JOIN class2 on name1=name2
    EXCEPT
    select name1, name2 from
       class1 INNER JOIN class2 on name1=name2;
quit;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1571 views
  • 0 likes
  • 5 in conversation