PROC SQL sum where

Reply
Contributor
Posts: 63

PROC SQL sum where

I want to calculate failures for past 7 years for a company. Following is the example set

 

Data Have 

Year  Company     failure     

1991   A                 1

1992   A                 1

1993   A                 1

2000   A                 1

 

Data Want

Year   Company    failure    count for past 7 years

1991   A                 1            0

1992   A                 1            1

1993   A                 1            2

2000  A                   1           1

 

I have written following code

 

proc sql;
update want as a
set failurecount = (select coalesce(sum(failure),0) from have where (Company = a.Company) and Year between a.Year-8 and a.Year-1);

quit; 

 

The ouptput is confusing. Somr records are populated with correct values while others are 0. 



Grand Advisor
Posts: 17,333

Re: PROC SQL sum where

[ Edited ]

Your code appears to work correctly for me. Can you illustrate the issue with it?

You may want to verify your boundary conditions match your word statement, ie If year=2000 then year-1 and year-8 would be from 1999 to 1992 for your sample data which means you would have a failure count of 2, not 1 as indicated.

Grand Advisor
Posts: 10,210

Re: PROC SQL sum where

You may need to provide more input, the output for that input and more details as to what is incorrect.

Without more details I would suspect the issue may be the coded year range and the actual values for year.

Super User
Super User
Posts: 6,326

Re: PROC SQL sum where

That does not look lke the type of query that you can do with an UPDATE statement.

Just make a normal query.

 

data have ;
  input year company $ failure expect @@;
cards;
1991 A 1 0 1992 A 1 1 1993 A 1 2 2000 A 1 1
;;;;

proc sql ;
 create table want as
   select a.*
        , coalesce(sum(b.failure),0) as count
   from have a left join have b
     on a.company = b.company
     and b.year between a.year - 7 and a.year - 1
   group by a.company, a.year
   order by a.company, a.year
 ;
quit;
proc print; run;
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: PROC SQL sum where

The output is correct per your logic?

Contributor hbi
Contributor
Posts: 66

Re: PROC SQL sum where

[ Edited ]

This solution is very similar to Tom's SQL solution, but it uses a subselect within a select. 

 

DATA have;
length Year 8 Company $8 Failure 8;
input Year Company $ Failure;
datalines;
1991   A   1
1992   A   1
1993   A   1
2000   A   1
;
RUN;

PROC SQL;
  CREATE TABLE want AS 
  SELECT Year
       , Company
       , Failure
       , (SELECT COALESCE(SUM(Failure), 0)
          FROM have AS B
          WHERE A.Company = B.Company
            AND A.Year - B.Year BETWEEN 1 AND 7) AS Count_In_7_Years
  FROM have AS A;
QUIT;

 

A lookup within an update is possible, but you need to add your new field first using an ALTER TABLE statement. 

 

If you run the update statement as-is, you will get an error message that I have pasted below. You would have to uncomment the highlighted line to suppress the error. 

  

PROC SQL;
  ALTER TABLE have
  ADD Count_In_7_Years num;
QUIT;

/* uncomment the next line to suppress the error */ 
/* OPTIONS SQLUNDOPOLICY=NONE; */ 
PROC SQL;
  UPDATE have AS A
  SET Count_In_7_Years = (SELECT COALESCE(SUM(Failure), 0)
                          FROM have AS B
                          WHERE A.Company = B.Company
                            AND A.Year - B.Year BETWEEN 1 AND 7);
QUIT;

 

WARNING: A value expression of the SET clause references the data set being updated.
ERROR: You cannot reopen WORK.HAVE.DATA for update access with member-level control because WORK.HAVE.DATA is in use by you in
resource environment SQL (2).
ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set.
This statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect.

 

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: PROC SQL sum where

Two things to note there:

- It is called a SubQuery.

- The warning is there for a reason.  This is not an SQL system, SAS likes to have sole access to a dataset for operation.  Opening the dataset within a SubQuery in this manner will mean you have that dataset open twice, once for reading, once for writing, both in memory.  Which will ruin memory usage if your data is large.  Better to create a new dataset.

Contributor hbi
Contributor
Posts: 66

Re: PROC SQL sum where

Yes RW9, very true. 

 

I posted on a different thread ( https://communities.sas.com/t5/Base-SAS-Programming/Problem-with-PROC-SQL-UPDATE/m-p/225920#U225920 ) that had a related problem, but the problem appeared to be magnified because the same dataset was being referenced in both the "set" clause and the "where" clause. 

 

-- hbi Robot Happy

Super User
Super User
Posts: 6,326

Re: PROC SQL sum where

I would never consider using PROC SQL to do this type of thing.  I would consider just treating SQL as a query tool, it wasn't called Structured Query Language by mistake.  If you want to do data management activities in SAS then use things like the MODIFY command. Other system that have been built using on SQL as the users programming language might have made optimizations and enhancements to SQL to allow you to use it for Data Management, but SQL is not the base language of SAS.

Contributor
Posts: 63

Re: PROC SQL sum where

My code was write just that input date had some issues. Now it works.  

 

But I agree with Tom that SQL may not be the best way to modify. For this simple updation, my program takes around 20 to 25 minutes.

There are many more calculations lined up and I can not imagine spending a day to get the results.

 

Let me try MODIFY.

 

Thanks all.  

Valued Guide
Posts: 854

Re: PROC SQL sum where

Completely different solution but I think it works for your issue:

 

data have;
infile cards dsd;
input year company$ failure;
cards;
1991,A,1
1992,A,1
1993,A,1
2000,A,1
;

data want;
set have;
by company;
    %macro lag;
    %do i = 1 %to 7;
    _lyear&i = lag&i(year);
        if _lyear&i < year - 7 then _lyear&i = .;
        else _lyear&i = 1;
    %end;
    %mend;
    %lag;
count = sum(of _lyear1 - _lyear7);
drop _:;
run;

Contributor
Posts: 74

Re: PROC SQL sum where

[ Edited ]

Hello,

I'm not as well-versed in SQL as some of you and I needed to be able to visualize it better to get the results, so I used a combination of data step programming and SQL.  I believe it produces the desired results.  

 

/* reference table */
DATA have(rename = (year = ref_year
                                    failure = ref_failure));
    length Year 8 Company $8 Failure 8;
    input Year Company $ Failure;
datalines;
1991 A 1
1992 A 1
1993 A 1
2000 A 1
;
RUN;

 

/* comparison table */
data have2;
    set have (rename = (ref_year = comp_year
                                     company = comp_company
                                     ref_failure = comp_failure));
run;

 

/* full join to visualize reference year and comparison years */
proc sql;
    create table temp1 as
    select a.ref_year, a.company, a.ref_failure, b.comp_year, b.comp_failure
    from have as a full join have2 as b
    on a.company = b.comp_company
    order by company, ref_year, comp_year;
quit;

/* do comparison, increment count, output last in group */
data want (drop = comp_year comp_failure);
    set temp1;
    by company ref_year comp_year;
    retain countf 0;
    if first.company or first.ref_year then countf = 0;
    if (ref_year-7) le comp_year le (ref_year-1) then countf = countf + comp_failure;
    if last.ref_year then output;
run;

 

 

Ask a Question
Discussion stats
  • 11 replies
  • 563 views
  • 2 likes
  • 8 in conversation