Re: PROC SQL sum where

buckeyefisher · Posted 09-21-2015 06:18 PM

I want to calculate failures for past 7 years for a company. Following is the example set

Data Have

Year Company failure

1991 A 1

1992 A 1

1993 A 1

2000 A 1

Data Want

Year Company failure count for past 7 years

1991 A 1 0

1992 A 1 1

1993 A 1 2

2000 A 1 1

I have written following code

proc sql;
update want as a
set failurecount = (select coalesce(sum(failure),0) from have where (Company = a.Company) and Year between a.Year-8 and a.Year-1);

quit;

The ouptput is confusing. Somr records are populated with correct values while others are 0.

Reeza · Posted 09-21-2015 06:24 PM

Your code appears to work correctly for me. Can you illustrate the issue with it?

You may want to verify your boundary conditions match your word statement, ie If year=2000 then year-1 and year-8 would be from 1999 to 1992 for your sample data which means you would have a failure count of 2, not 1 as indicated.

ballardw · Posted 09-21-2015 06:42 PM

You may need to provide more input, the output for that input and more details as to what is incorrect.

Without more details I would suspect the issue may be the coded year range and the actual values for year.

Tom · Posted 09-22-2015 01:58 AM

That does not look lke the type of query that you can do with an UPDATE statement.

Just make a normal query.

data have ;
  input year company $ failure expect @@;
cards;
1991 A 1 0 1992 A 1 1 1993 A 1 2 2000 A 1 1
;;;;

proc sql ;
 create table want as
   select a.*
        , coalesce(sum(b.failure),0) as count
   from have a left join have b
     on a.company = b.company
     and b.year between a.year - 7 and a.year - 1
   group by a.company, a.year
   order by a.company, a.year
 ;
quit;
proc print; run;

RW9 · Posted 09-22-2015 04:58 AM

The output is correct per your logic?

hbi · Posted 09-22-2015 06:04 PM

This solution is very similar to Tom's SQL solution, but it uses a subselect within a select.

DATA have;
length Year 8 Company $8 Failure 8;
input Year Company $ Failure;
datalines;
1991   A   1
1992   A   1
1993   A   1
2000   A   1
;
RUN;

PROC SQL;
  CREATE TABLE want AS 
  SELECT Year
       , Company
       , Failure
       , (SELECT COALESCE(SUM(Failure), 0)
          FROM have AS B
          WHERE A.Company = B.Company
            AND A.Year - B.Year BETWEEN 1 AND 7) AS Count_In_7_Years
  FROM have AS A;
QUIT;

A lookup within an update is possible, but you need to add your new field first using an ALTER TABLE statement.

If you run the update statement as-is, you will get an error message that I have pasted below. You would have to uncomment the highlighted line to suppress the error.

PROC SQL;
  ALTER TABLE have
  ADD Count_In_7_Years num;
QUIT;

/* uncomment the next line to suppress the error */ 
/* OPTIONS SQLUNDOPOLICY=NONE; */ 
PROC SQL;
  UPDATE have AS A
  SET Count_In_7_Years = (SELECT COALESCE(SUM(Failure), 0)
                          FROM have AS B
                          WHERE A.Company = B.Company
                            AND A.Year - B.Year BETWEEN 1 AND 7);
QUIT;

WARNING: A value expression of the SET clause references the data set being updated.
ERROR: You cannot reopen WORK.HAVE.DATA for update access with member-level control because WORK.HAVE.DATA is in use by you in
resource environment SQL (2).
ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set.
This statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect.

RW9 · Posted 09-23-2015 04:05 AM

Two things to note there:

- It is called a SubQuery.

- The warning is there for a reason. This is not an SQL system, SAS likes to have sole access to a dataset for operation. Opening the dataset within a SubQuery in this manner will mean you have that dataset open twice, once for reading, once for writing, both in memory. Which will ruin memory usage if your data is large. Better to create a new dataset.

hbi · Posted 09-23-2015 05:11 PM

Yes RW9, very true.

I posted on a different thread ( https://communities.sas.com/t5/Base-SAS-Programming/Problem-with-PROC-SQL-UPDATE/m-p/225920#U225920 ) that had a related problem, but the problem appeared to be magnified because the same dataset was being referenced in both the "set" clause and the "where" clause.

-- hbi

Tom · Posted 09-23-2015 06:21 PM

I would never consider using PROC SQL to do this type of thing. I would consider just treating SQL as a query tool, it wasn't called Structured Query Language by mistake. If you want to do data management activities in SAS then use things like the MODIFY command. Other system that have been built using on SQL as the users programming language might have made optimizations and enhancements to SQL to allow you to use it for Data Management, but SQL is not the base language of SAS.

buckeyefisher · Posted 10-05-2015 10:14 AM

My code was write just that input date had some issues. Now it works.

But I agree with Tom that SQL may not be the best way to modify. For this simple updation, my program takes around 20 to 25 minutes.

There are many more calculations lined up and I can not imagine spending a day to get the results.

Let me try MODIFY.

Thanks all.

Steelers_In_DC · Posted 09-23-2015 12:01 PM

Completely different solution but I think it works for your issue:

data have;
infile cards dsd;
input year company$ failure;
cards;
1991,A,1
1992,A,1
1993,A,1
2000,A,1
;

data want;
set have;
by company;
   %macro lag;
   %do i = 1 %to 7;
   _lyear&i = lag&i(year);
       if _lyear&i < year - 7 then _lyear&i = .;
       else _lyear&i = 1;
   %end;
   %mend;
   %lag;
count = sum(of _lyear1 - _lyear7);
drop _:;
run;

evp000 · Posted 09-24-2015 01:52 PM

Hello,

I'm not as well-versed in SQL as some of you and I needed to be able to visualize it better to get the results, so I used a combination of data step programming and SQL. I believe it produces the desired results.

/* reference table */
DATA have(rename = (year = ref_year
failure = ref_failure));
length Year 8 Company $8 Failure 8;
input Year Company $ Failure;
datalines;
1991 A 1
1992 A 1
1993 A 1
2000 A 1
;
RUN;

/* comparison table */
data have2;
set have (rename = (ref_year = comp_year
company = comp_company
ref_failure = comp_failure));
run;

/* full join to visualize reference year and comparison years */
proc sql;
create table temp1 as
select a.ref_year, a.company, a.ref_failure, b.comp_year, b.comp_failure
from have as a full join have2 as b
on a.company = b.comp_company
order by company, ref_year, comp_year;
quit;

/* do comparison, increment count, output last in group */
data want (drop = comp_year comp_failure);
set temp1;
by company ref_year comp_year;
retain countf 0;
if first.company or first.ref_year then countf = 0;
if (ref_year-7) le comp_year le (ref_year-1) then countf = countf + comp_failure;
if last.ref_year then output;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away