I want to calculate failures for past 7 years for a company. Following is the example set
Data Have
Year Company failure
1991 A 1
1992 A 1
1993 A 1
2000 A 1
Data Want
Year Company failure count for past 7 years
1991 A 1 0
1992 A 1 1
1993 A 1 2
2000 A 1 1
I have written following code
proc sql;
update want as a
set failurecount = (select coalesce(sum(failure),0) from have where (Company = a.Company) and Year between a.Year-8 and a.Year-1);
quit;
The ouptput is confusing. Somr records are populated with correct values while others are 0.
Your code appears to work correctly for me. Can you illustrate the issue with it?
You may want to verify your boundary conditions match your word statement, ie If year=2000 then year-1 and year-8 would be from 1999 to 1992 for your sample data which means you would have a failure count of 2, not 1 as indicated.
You may need to provide more input, the output for that input and more details as to what is incorrect.
Without more details I would suspect the issue may be the coded year range and the actual values for year.
That does not look lke the type of query that you can do with an UPDATE statement.
Just make a normal query.
data have ;
input year company $ failure expect @@;
cards;
1991 A 1 0 1992 A 1 1 1993 A 1 2 2000 A 1 1
;;;;
proc sql ;
create table want as
select a.*
, coalesce(sum(b.failure),0) as count
from have a left join have b
on a.company = b.company
and b.year between a.year - 7 and a.year - 1
group by a.company, a.year
order by a.company, a.year
;
quit;
proc print; run;
The output is correct per your logic?
This solution is very similar to Tom's SQL solution, but it uses a subselect within a select.
DATA have;
length Year 8 Company $8 Failure 8;
input Year Company $ Failure;
datalines;
1991 A 1
1992 A 1
1993 A 1
2000 A 1
;
RUN;
PROC SQL;
CREATE TABLE want AS
SELECT Year
, Company
, Failure
, (SELECT COALESCE(SUM(Failure), 0)
FROM have AS B
WHERE A.Company = B.Company
AND A.Year - B.Year BETWEEN 1 AND 7) AS Count_In_7_Years
FROM have AS A;
QUIT;
A lookup within an update is possible, but you need to add your new field first using an ALTER TABLE statement.
If you run the update statement as-is, you will get an error message that I have pasted below. You would have to uncomment the highlighted line to suppress the error.
PROC SQL;
ALTER TABLE have
ADD Count_In_7_Years num;
QUIT;
/* uncomment the next line to suppress the error */
/* OPTIONS SQLUNDOPOLICY=NONE; */
PROC SQL;
UPDATE have AS A
SET Count_In_7_Years = (SELECT COALESCE(SUM(Failure), 0)
FROM have AS B
WHERE A.Company = B.Company
AND A.Year - B.Year BETWEEN 1 AND 7);
QUIT;
WARNING: A value expression of the SET clause references the data set being updated.
ERROR: You cannot reopen WORK.HAVE.DATA for update access with member-level control because WORK.HAVE.DATA is in use by you in
resource environment SQL (2).
ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set.
This statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect.
Two things to note there:
- It is called a SubQuery.
- The warning is there for a reason. This is not an SQL system, SAS likes to have sole access to a dataset for operation. Opening the dataset within a SubQuery in this manner will mean you have that dataset open twice, once for reading, once for writing, both in memory. Which will ruin memory usage if your data is large. Better to create a new dataset.
Yes RW9, very true.
I posted on a different thread ( https://communities.sas.com/t5/Base-SAS-Programming/Problem-with-PROC-SQL-UPDATE/m-p/225920#U225920 ) that had a related problem, but the problem appeared to be magnified because the same dataset was being referenced in both the "set" clause and the "where" clause.
-- hbi
I would never consider using PROC SQL to do this type of thing. I would consider just treating SQL as a query tool, it wasn't called Structured Query Language by mistake. If you want to do data management activities in SAS then use things like the MODIFY command. Other system that have been built using on SQL as the users programming language might have made optimizations and enhancements to SQL to allow you to use it for Data Management, but SQL is not the base language of SAS.
My code was write just that input date had some issues. Now it works.
But I agree with Tom that SQL may not be the best way to modify. For this simple updation, my program takes around 20 to 25 minutes.
There are many more calculations lined up and I can not imagine spending a day to get the results.
Let me try MODIFY.
Thanks all.
Completely different solution but I think it works for your issue:
data have;
infile cards dsd;
input year company$ failure;
cards;
1991,A,1
1992,A,1
1993,A,1
2000,A,1
;
data want;
set have;
by company;
%macro lag;
%do i = 1 %to 7;
_lyear&i = lag&i(year);
if _lyear&i < year - 7 then _lyear&i = .;
else _lyear&i = 1;
%end;
%mend;
%lag;
count = sum(of _lyear1 - _lyear7);
drop _:;
run;
Hello,
I'm not as well-versed in SQL as some of you and I needed to be able to visualize it better to get the results, so I used a combination of data step programming and SQL. I believe it produces the desired results.
/* reference table */
DATA have(rename = (year = ref_year
failure = ref_failure));
length Year 8 Company $8 Failure 8;
input Year Company $ Failure;
datalines;
1991 A 1
1992 A 1
1993 A 1
2000 A 1
;
RUN;
/* comparison table */
data have2;
set have (rename = (ref_year = comp_year
company = comp_company
ref_failure = comp_failure));
run;
/* full join to visualize reference year and comparison years */
proc sql;
create table temp1 as
select a.ref_year, a.company, a.ref_failure, b.comp_year, b.comp_failure
from have as a full join have2 as b
on a.company = b.comp_company
order by company, ref_year, comp_year;
quit;
/* do comparison, increment count, output last in group */
data want (drop = comp_year comp_failure);
set temp1;
by company ref_year comp_year;
retain countf 0;
if first.company or first.ref_year then countf = 0;
if (ref_year-7) le comp_year le (ref_year-1) then countf = countf + comp_failure;
if last.ref_year then output;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.