Note: My code above does not meet your newly refined requirements.
If you need ternary logic, three value logic, then I think you'll need to maintain two counters, one that counts number of negative value to trigger the Two/Three mask to a value of one, but you would likely need to count the number of non-missing sequential values to trigger the zero value. I think I have the program logic correct (maybe auto-correction will work 😉
I'll assume you are ok without the SQL code. I might take the approach of creating something like a three year rolling total table for each CoID/year before 2000 similar to SQL originally purposed. However instead of a total, you would track the number of non-missing profits and the number of negative profits. You would then need to reprocess that rolling table to determine whether the company would have a value of missing, zero, or one. A case expression that would be something like If max(count)=3 then '1' else max(count_nm)=3 then '0' otherwise '.' And that's quite a bit of overhead compared to the data step.
data have;
infile cards dsd;
input CoID$ Year Profit a ;
cards;
123,1995,-100,1
123,1996,-100,2
123,1998,-100,3
123,1999,-100,4
123,2000,-150,5
123,2001,200,6
223,1995,-100,1
223,1996,-100,2
223,1998,0,3
223,1999,-100,4
223,2000,-150,5
223,2001,200,6
456,1996,-10,1
456,1997,-10,2
456,1998,.,3
456,1999,-10,4
456,2000,-10,5
456,2001,.,6
456,2002,.,7
789,1999,-100,1
789,2000,-100,2
789,2001,50,3
789,2002,100,4
;
data need;
set have(keep=coid year profit) ; *first pass check for runs;
by coid;
if first.coid then do;
count=0;
count_nm=0;
call missing( TwoInRow, ThreeInRow);
retain TwoInRow ThreeInRow;
end;
if year<=2000 then do; *exclude if check for all years;
if profit=. then do; count=0; count_nm=0; end;
else do;
count_nm+1; *count all non-missing;
if profit<0 then count+1; *count continuous neg;
else count=0; *if ge 0 reset;
end;
*Ternary logic, missing zero and one ;
if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;
if count=2 then TwoInRow=1; * *Trigger one value;
else if count=3 then ThreeInRow=1;
drop count: ;
end;
if last.coid;
do until(last.coid); *if last, re-read the BY group;
set have;
by coid; *reset first./last. ;
output;
end;
run;
Hi Roger,
Thanks for all your help. With the example you put in the previous post, I edited some parts to make the answer the questions that I have. Here is what I have and what I need.
CoID | Year | Profit | TA | TwoinaRow | ThreeinaRow | MorethanTwo |
123 | 1995 | -100 | 1 | 1 | 1 | 1 |
123 | 1996 | -100 | 2 | 1 | 1 | 1 |
123 | 1998 | -100 | 3 | 1 | 1 | 1 |
123 | 1999 | -100 | 4 | 1 | 1 | 1 |
123 | 2000 | -150 | 5 | 1 | 1 | 1 |
123 | 2001 | 200 | 6 | 1 | 1 | 1 |
223 | 1995 | -100 | 1 | 1 | 0 | 1 |
223 | 1996 | -100 | 2 | 1 | 0 | 1 |
223 | 1998 | 10 | 3 | 1 | 0 | 1 |
223 | 1999 | -100 | 4 | 1 | 0 | 1 |
223 | 2000 | -150 | 5 | 1 | 0 | 1 |
223 | 2001 | 200 | 6 | 1 | 0 | 1 |
456 | 1996 | -10 | 1 | 1 | . | 1 |
456 | 1997 | -10 | 2 | 1 | . | 1 |
456 | 1998 | . | 3 | 1 | . | 1 |
456 | 1999 | -10 | 4 | 1 | . | 1 |
456 | 2000 | -10 | 5 | 1 | . | 1 |
456 | 2001 | . | 6 | 1 | . | 1 |
789 | 1997 | -100 | 0 | 0 | . | 1 |
789 | 1998 | . | 1 | 0 | . | 1 |
789 | 1999 | 100 | 2 | 0 | . | 1 |
789 | 2000 | -100 | 3 | 0 | . | 1 |
789 | 2001 | . | 4 | 0 | . | 1 |
789 | 2002 | . | 5 | 0 | . | 1 |
890 | 1998 | . | 1 | . | . | . |
890 | 1999 | . | 2 | . | . | . |
890 | 2000 | -100 | 3 | . | . | . |
890 | 2001 | -100 | 4 | . | . | . |
890 | 2002 | 100 | 5 | . | . | . |
890 | 2003 | 200 | 6 | . | . | . |
890 | 2004 | 300 | 7 | . | . | . |
891 | 1997 | 100 | 1 | 1 | 0 | 1 |
891 | 1998 | -100 | 2 | 1 | 0 | 1 |
891 | 1999 | -200 | 3 | 1 | 0 | 1 |
891 | 2000 | 200 | 4 | 1 | 0 | 1 |
891 | 2001 | 200 | 5 | 1 | 0 | 1 |
891 | 2002 | 100 | 6 | 1 | 0 | 1 |
892 | 1998 | -100 | 1 | . | . | 0 |
892 | 1999 | . | 2 | . | . | 0 |
892 | 2000 | 100 | 3 | . | . | 0 |
892 | 2001 | 100 | 4 | . | . | 0 |
So the following code is what I wrote down:
data have;
input coid year profit ta;
datalines;
123 1995 -100 1
123 1996 -100 2
123 1998 -100 3
123 1999 -100 4
123 2000 -150 5
123 2001 200 6
223 1995 -100 1
223 1996 -100 2
223 1998 10 3
223 1999 -100 4
223 2000 -150 5
223 2001 200 6
456 1996 -10 1
456 1997 -10 2
456 1998 . 3
456 1999 -10 4
456 2000 -10 5
456 2001 . 6
789 1997 -100 0
789 1998 . 1
789 1999 100 2
789 2000 -100 3
789 2001 . 4
789 2002 . 5
890 1998 . 1
890 1999 . 2
890 2000 -100 3
890 2001 -100 4
890 2002 100 5
890 2003 200 6
890 2004 300 7
891 1997 100 1
891 1998 -100 2
891 1999 -200 3
891 2000 200 4
891 2001 200 5
891 2002 100 6
892 1998 -100 1
892 1999 . 2
892 2000 100 3
892 2001 100 4
;
run;
proc sort data=have;
by coid year;
run;
proc sql;
create table want as
select *,
case
when (select count(*) from have where coid=a.coid and year le 2000 and profit lt 0) gt 1 then 1
else 0
end as MorethanTwo
from have as a
group by coid ;
quit;
data need;
set want(keep=coid year profit) ; *first pass check for runs;
by coid;
if first.coid then do;
count=0;
count_nm=0;
call missing( TwoInRow, ThreeInRow);
retain TwoInRow ThreeInRow;
end;
if year<=2000 then do; *exclude if check for all years;
if profit=. then do; count=0; count_nm=0; end;
else do;
count_nm+1; *count all non-missing;
if profit<0 then count+1; *count continuous neg;
else count=0; *if ge 0 reset;
end;
*Ternary logic, missing zero and one ;
if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;
if count=2 then TwoInRow=1; * *Trigger one value;
else if count=3 then ThreeInRow=1;
drop count: ;
end;
if last.coid;
do until(last.coid); *if last, re-read the BY group;
set want;
by coid; *reset first./last. ;
output;
end;
run;
proc sort data=need;
by coid year;
run;
The result is where is gets interesting.
Your code works perfectly. I get the result that I requested. What I have problem is with the proc sql code. Somebody told in the first page that the sql code that I wrote in my original post counts 2 years or more with negative profit as 1 otherwise it counts as 0.
The second code in my code runs the proc sql code but some reason it does not do what it is supposed to do.
Can you please help me with the code that will give a variable called "MorethanTwo" if there are two years (not consecutive) prior to 2001 that will give me 1 if the profit is negative, 0 if there is at least two years of information but they are not negative, and missing (.) if I don't have at least two years of information on profit prior to 2001?
Thanks so much,
The Rock,
Given this data...
data have;
infile datalines dlm='09'x truncover ;
input coid year profit ta;
datalines;
123 1995 -100 1
123 1996 -100 2
123 1997 -100 3
123 1998 -100 4
123 1999 -100 5
123 2000 -150 6
123 2001 200 7
223 1995 -100 1
223 1996 -100 2
223 1998 10 3
223 1999 -100 4
223 2000 -150 5
223 2001 200 6
456 1996 -10 1
456 1997 -10 2
456 1998 . 3
456 1999 -10 4
456 2000 -10 5
456 2001 . 6
789 1997 -100 0
789 1998 . 1
789 1999 100 2
789 2000 -100 3
789 2001 . 4
789 2002 . 5
890 1998 . 1
890 1999 . 2
890 2000 -100 3
890 2001 -100 4
890 2002 100 5
890 2003 200 6
890 2004 300 7
891 1997 100 1
891 1998 -100 2
891 1999 -200 3
891 2000 200 4
891 2001 200 5
891 2002 100 6
892 1998 -100 1
892 1999 . 2
892 2000 100 3
892 2001 100 4
;
run;
You would create rolling totals number of negative and number of non-missing for each year prior to 2000, the threeYear and twoYEar tables below. As they would represent multiple years prior to 2000, you would then need to summary the annual rolling count to determin the ternary logic. So if there are three negative years, then "1", else if there are three non-missing value, then "0", otherwise missing. You can review the rolling totals in the tables, and the pre-ternary summary results from the select clauses.
proc sql;
create table threeYear as
select a.coid, a.year, sum(b.profit<0 and b.profit is not missing) as ThreeBadYears,
sum(b.profit is not missing) as c3
from have as a, have as b
where a.coid=b.coid and (a.year between b.year-0 and b.year-2 and
a.year<=1998)
group by a.coid, a.year;
select coid, max(threeBadYears) as Bad3,
min(c3) as NM3
from threeyear
group by coid;
*if Bad3=3 and NM3=3 then 1
if Bad2<3 and NM3=3 then 0
if NM3<3 then . ;
create table twoYear as
select a.coid, a.year, sum(b.profit<=0 and b.profit is not missing ) as TwoBadYears,
sum(b.profit=.) as C2,
from have as a, have as b
where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and
a.year<=1999)
group by a.coid, a.year;
select coid, max(TwoBadYears) as Bad2,
max(C2) as NM2
from TwoYear
group by coid;
*if Bad2=2 then 1
if NM=2 then 0
if NM<2 then . ;
quit;
You can combine those queries as inline views. Instead of include the maximum value, you would use the maximum values in the CASE expression to determine the ternary value.
As not all company might appears in the Two and Three years rolling total tables, you would need to use a LEFT JOIN so as not to remove company not represented in these two table. The ternary values would be returned for companies with pre-2000 data and the LEFT JOIN would set companies without data to missing.
proc sql;
select have.*, y2.TwoInRow, y3.ThreeInRow
from
(have
left join
(select coid,
case when (max(TwoBadYears)=2) then 1
when (max(C2)=2) then 0
else .
end as TwoInRow
from (
select a.coid, a.year,
sum(b.profit<0 and b.profit is not missing) as TwoBadYears,
sum(b.profit is not missing) as C2
from have as a, have as b
where a.coid=b.coid and (a.year between b.year-1 and b.year+0 and
a.year<=1999)
group by a.coid, a.year
)
group by coid) as Y2
on have.coid = y2.coid )
left join
(select coid,
case when (max(ThreeBadYears)=3) then 1
when (max(C3)=3) then 0
end as ThreeInRow
from (
select c.coid, c.year,
sum(d.profit<0 and d.profit is not missing) as ThreeBadYears,
sum(d.profit is not missing) as C3
from have as c, have as d
where c.coid=d.coid and (c.year between d.year-2 and d.year+0 and
c.year<=1998)
group by c.coid, c.year
)
group by coid) as Y3
on have.coid=y3.coid ;
quit;
The SQL code is highly recursive as SQL does not handle sequential processing as efficiently as the data step. The HAVE table is ready 5 times, and 4 intermediate summary tables are re-read as inline views (albeit they are narrow tables and if HAVE is small the OS might have cached the file). The DATA step is likely to read the table just once with a smaller memory/cpu footprint.
Looks like there was an errant comma...
create table twoYear as
select a.coid, a.year,
sum(b.profit<=0 and b.profit is not missing ) as TwoBadYears,
sum(b.profit=.) as C2
from have as a, have as b
where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and
a.year<=1999)
group by a.coid, a.year;
... before the FROM clause.
I suppose that you realize that was intermediate explanatory code to explain the 40 or so lines of SQLgetti that followed (which did not have the errant comma). Good luck
I wasn't sure of your ternary logic (it seems a little contradictory)...
*MoreThanTwo |
Three Years |
||
Two Years |
. |
0 |
1 |
. |
. |
d.n.e |
d.n.e |
0 |
1 |
0 |
d.n.e |
1 |
1 |
1 |
1 |
It appears that if TwoYears=0, you want MoreThanTwo to be '1'? Is that correct("456")?
select have.*, y2.TwoInRow, y3.ThreeInRow case when y2.TwoInRow=1 then 1 when y2.TwoInRow=0 annd y3.ThreeInRow=0 then 0 otherwise . end as MoreThanTwo from ... ;
In the table you show situations where both TwoInRow and ThreeInRow are both missing ("890" and "892") but they have different ternary results both missing as well as 0? The program logic for MoreThanTwo applies to the results table that is shown above. Do you have more logic to consider? In other words, what makes MoreThanTwo different than TwoInRow?
Oh, that.... LT 0 includes missing and you only code boolean logic, not ternary logic...
Which is why I used..
sum(b.profit<0 and b.profit is not missing)
Additionally, you don't need a GROUP BY, as you are running a correlated query. But, as you now now need three value logic based on a different set of logic...I believe you'll need to run two correlated queries to SAS_STATS original code, as you have three value logic...
proc sql;
create table want as
select *,
case
when (select count(*) from have
where coid=a.coid and year le 2000
and (profit lt 0 and profit is not missing))
gt 1 then 1
when (select count(*) from have
where coid=a.coid and year le 2000
and (profit is not missing ))
gt 1 then 0
else . end as MorethanTwo
from have as a
order by coid, year ;
quit;
Sorry, as this might come off as lecturing, but this is SAScommunities, and not meant to be SQLcommunities.
I was just hoping to promote the efficiency of SAS code even if it is SQL... however, new lesson, old story, MASK variables.
The summary query could be written as to capture Found2 (negative) and Any2 (reported). However, two columns returned would not make it suitable for the correlated query in the CASE expression, which can only return one column. So I was thinking, we could improve SAS_STAT code by MASKing the results from the two columns...
proc sql;
select coid, sum(profit lt 0 and profit is not mising) gt 1 as Found2,
sum(profit is not missing ) gt 1 as Any2,
from have as mtr
where mtr.year le 2000
group by coid ;
*Combine the columns into one mask variable... ;
select coid,
((sum(profit lt 0 and profit is not mising) gt 1) * 10 ) +
(sum(mtr.profit is not missing) gt 1)
as Mask
from have as MTR
where mtr.year le 2000
group by mtr.coid ;
quit;
So, let's go back to your original question and SAS_STAT's code and insert the MASK query, so instead possibly two subqueries for each row, you could have just one ONE correlated subquery query for every row...
*So the correlated query might look like this... ;
proc sql;
create table want as
select *,
case
( select
((sum(profit lt 0 and profit is not missing) gt 1) * 10 ) +
(sum(profit is not missing) gt 1) as Mask
from have where coid=a.coid and year le 2000
)
when (11) then 1
when (01) then 0
when (00) then .
else .
end as MoreThanTwo
from have as a
order by coid, year ;
quit;
Typically, I would say that you should avoid the correlated query when you can create a summary table. The ternary logic... Found2 1, AnyTwo , 0, less than 1 found (00), missing. Thus adding a third left join might look a little like this...
proc sql;
select have.*, y2.TwoInRow, y3.ThreeInRow,
case (y4.mask)
when (11) then 1
when (01) then 0
when (00) then .
else .
end as MoreThanTwo
from
(have
....
) as y3
left join
( ... MASK query here or Found2/Any2 query... ) as y4
on y1.coid = y4.coid ;
The alternative SAS code with likely one pass of the data...
data need;
set have(keep=coid year profit) ; *first pass check for runs;
by coid;
if first.coid then do;
count=0;
count_nm=0;
call missing( TwoInRow, ThreeInRow, Found2, Any2);
retain TwoInRow ThreeInRow Found2 Any2;
end;
if year<=2000 then do; *exclude if check for all years;
if ( first.coid ne 1 and year-1 ne lag(year)) or profit=. then do;
count=0; count_nm=0;
end; *reset counters, non-neg and any sequential year counters;
else do; *positive and negative profits, missing excluded;
count_nm+1; *count all non-missing;
Any2+1;
if profit<0 then do;
count+1; *count continuous neg;
Found2+1; *any neg count;
end;
else do; *non-negative profit;
count=0; *reset continuous neg;
end;
end;
*Ternary logic -- zero, one, default missing ;
if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;
if count=2 then TwoInRow=1; * *Trigger one value;
else if count=3 then ThreeInRow=1;
drop count: ;
end;
if last.coid;
if Found2>1 then MoreThan2=1;
else if Any2>1 then MoreThan2=0;
*drop any2 found2;
do until(last.coid); *if last, re-read the BY group;
set have;
by coid; *reset first./last. ;
output;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.