BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PGStats
Opal | Level 21

Note: My code above does not meet your newly refined requirements. 

PG
RogerSpeas
SAS Employee

If you need ternary logic, three value logic, then I think you'll need to maintain two counters, one that counts number of negative value to trigger the Two/Three mask to a value of one, but you would likely need to count the number of non-missing sequential values to trigger the zero value.   I think I have the program logic correct (maybe auto-correction will work 😉

 

I'll assume you are ok without the SQL code.  I might take the approach of creating something like a three year rolling total table for each CoID/year before 2000 similar to SQL originally purposed.   However instead of a total, you would track the number of non-missing profits and the number of negative profits.  You would then need to reprocess that rolling table to determine whether the company would have a value of missing, zero, or one.  A case expression that would be something like If max(count)=3 then '1' else max(count_nm)=3 then '0' otherwise '.'   And that's quite a bit of overhead compared to the data step. 

 

data have;

infile cards dsd;

input CoID$ Year Profit a ;

cards;

123,1995,-100,1

123,1996,-100,2

123,1998,-100,3

123,1999,-100,4

123,2000,-150,5

123,2001,200,6

223,1995,-100,1

223,1996,-100,2

223,1998,0,3

223,1999,-100,4

223,2000,-150,5

223,2001,200,6

456,1996,-10,1

456,1997,-10,2

456,1998,.,3

456,1999,-10,4

456,2000,-10,5

456,2001,.,6

456,2002,.,7

789,1999,-100,1

789,2000,-100,2

789,2001,50,3

789,2002,100,4

;

data need;

   set have(keep=coid year profit) ; *first pass check for runs;

   by coid;

 

   if first.coid then do;

      count=0;

      count_nm=0;

      call missing( TwoInRow, ThreeInRow);

      retain TwoInRow ThreeInRow;

      end;

 

   if year<=2000 then do; *exclude if check for all years;

      if profit=. then do; count=0; count_nm=0; end;

       else do;

         count_nm+1; *count all non-missing;

         if profit<0 then count+1; *count continuous neg;

         else count=0; *if ge 0 reset;

         end;

 

      *Ternary logic, missing zero and one ;

      if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;

      else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;

 

      if count=2 then TwoInRow=1; * *Trigger one value;

      else if count=3 then ThreeInRow=1;

      drop count: ;

      end;

 

   if last.coid;

   do until(last.coid); *if last, re-read the BY group;

      set have;

      by coid; *reset first./last. ;

      output;

      end;

run;

 

 

therock
Calcite | Level 5

Hi Roger,

 

Thanks for all your help. With the example you put in the previous post, I edited some parts to make the answer the questions that I have. Here is what I have and what I need. 

CoIDYearProfitTATwoinaRowThreeinaRowMorethanTwo
1231995-1001111
1231996-1002111
1231998-1003111
1231999-1004111
1232000-1505111
12320012006111
2231995-1001101
2231996-1002101
2231998103101
2231999-1004101
2232000-1505101
22320012006101
4561996-1011.1
4561997-1021.1
4561998.31.1
4561999-1041.1
4562000-1051.1
4562001.61.1
7891997-10000.1
7891998.10.1
789199910020.1
7892000-10030.1
7892001.40.1
7892002.50.1
8901998.1...
8901999.2...
8902000-1003...
8902001-1004...
89020021005...
89020032006...
89020043007...
89119971001101
8911998-1002101
8911999-2003101
89120002004101
89120012005101
89120021006101
8921998-1001..0
8921999.2..0
89220001003..0
89220011004..0


So the following code is what I wrote down: 

 

data have;
input coid year profit ta;
datalines;
123	1995	-100	1
123	1996	-100	2
123	1998	-100	3
123	1999	-100	4
123	2000	-150	5
123	2001	200	6
223	1995	-100	1
223	1996	-100	2
223	1998	10	3
223	1999	-100	4
223	2000	-150	5
223	2001	200	6
456	1996	-10	1
456	1997	-10	2
456	1998	.	3
456	1999	-10	4
456	2000	-10	5
456	2001	.	6
789	1997	-100	0
789	1998	.	1
789	1999	100	2
789	2000	-100	3
789	2001	.	4
789	2002	.	5
890	1998	.	1
890	1999	.	2
890	2000	-100	3
890	2001	-100	4
890	2002	100	5
890	2003	200	6
890	2004	300	7
891	1997	100	1
891	1998	-100	2
891	1999	-200	3
891	2000	200	4
891	2001	200	5
891	2002	100	6
892	1998	-100	1
892	1999	.	2
892	2000	100	3
892	2001	100	4

;
run;
proc sort data=have;
by coid year;
run;

proc sql;
create table want as 
 select *,
   case 
   when (select count(*) from have where coid=a.coid and year le 2000 and profit lt 0) gt 1 then 1
   else 0
   end as MorethanTwo
   from have as a
   group by coid ;
quit;
data need;
   set want(keep=coid year profit) ; *first pass check for runs;
   by coid;
 
   if first.coid then do;
      count=0;
      count_nm=0;
      call missing( TwoInRow, ThreeInRow);
      retain TwoInRow ThreeInRow;
      end;
 
   if year<=2000 then do; *exclude if check for all years;
      if profit=. then do; count=0; count_nm=0; end;
       else do;
         count_nm+1; *count all non-missing;
         if profit<0 then count+1; *count continuous neg;
         else count=0; *if ge 0 reset;
         end;
 
      *Ternary logic, missing zero and one ;
      if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
      else if count_nm=3 and ThreeInRow=. then ThreeInRow=0;
 
      if count=2 then TwoInRow=1; * *Trigger one value;
      else if count=3 then ThreeInRow=1;
      drop count: ;
      end;
 
   if last.coid;
   do until(last.coid); *if last, re-read the BY group;
      set want;
      by coid; *reset first./last. ;
      output;
      end;
run;
proc sort data=need;
by coid year;
run;

The result is where is gets interesting.

 

Your code works perfectly. I get the result that I requested. What I have problem is with the proc sql code. Somebody told in the first page that the sql code that I wrote in my original post counts 2 years or more with negative profit as 1 otherwise it counts as 0. 

 

The second code in my code runs the proc sql code but some reason it does not do what it is supposed to do.

 

Can you please help me with the code that will give a variable called "MorethanTwo" if there are two years (not consecutive) prior to 2001 that will give me 1 if the profit is negative, 0 if there is at least two years of information but they are not negative, and missing (.) if I don't have at least two years of information on profit prior to 2001?

 

Thanks so much,

The Rock,

 

 

 

 

therock
Calcite | Level 5
Anybody else who can help? Please?
RogerSpeas
SAS Employee

Given this data...

 

data have;

infile datalines dlm='09'x truncover ;

input coid year profit ta;

datalines;

123 1995 -100 1

123 1996 -100 2

123 1997 -100 3

123 1998 -100 4

123 1999 -100 5

123 2000 -150 6

123 2001 200 7

223 1995 -100 1

223 1996 -100 2

223 1998 10 3

223 1999 -100 4

223 2000 -150 5

223 2001 200 6

456 1996 -10 1

456 1997 -10 2

456 1998 . 3

456 1999 -10 4

456 2000 -10 5

456 2001 . 6

789 1997 -100 0

789 1998 . 1

789 1999 100 2

789 2000 -100 3

789 2001 . 4

789 2002 . 5

890 1998 . 1

890 1999 . 2

890 2000 -100 3

890 2001 -100 4

890 2002 100 5

890 2003 200 6

890 2004 300 7

891 1997 100 1

891 1998 -100 2

891 1999 -200 3

891 2000 200 4

891 2001 200 5

891 2002 100 6

892 1998 -100 1

892 1999 . 2

892 2000 100 3

892 2001 100 4

;

run;

 

You would create rolling totals number of negative and number of non-missing for each year prior to 2000, the threeYear and twoYEar tables below.   As they would represent multiple years prior to 2000, you would then need to summary the annual rolling count to determin the ternary logic.  So if there are three negative years, then "1", else if there are three non-missing value, then "0", otherwise missing.  You can review the rolling totals in the tables, and the pre-ternary summary results from the select clauses.

 

proc sql;

create table threeYear as

   select a.coid, a.year, sum(b.profit<0 and b.profit is not missing) as ThreeBadYears,

      sum(b.profit is not missing) as c3

      from have as a, have as b

      where a.coid=b.coid and (a.year between b.year-0 and b.year-2 and

         a.year<=1998)

      group by a.coid, a.year;

 

select coid, max(threeBadYears) as Bad3,

       min(c3) as NM3

    from threeyear

    group by coid;

      *if Bad3=3 and NM3=3 then 1

            if Bad2<3 and NM3=3 then 0

                  if NM3<3 then . ;

 

create table twoYear as

   select a.coid, a.year, sum(b.profit<=0 and b.profit is not missing ) as TwoBadYears,

          sum(b.profit=.) as C2,

       from have as a, have as b

       where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and

             a.year<=1999)

        group by a.coid, a.year;

 

 select coid, max(TwoBadYears) as Bad2,

        max(C2) as NM2

      from TwoYear

      group by coid;

      *if Bad2=2 then 1

                 if NM=2 then 0

             if NM<2 then . ;

 

quit;

 

You can combine those queries as inline views.   Instead of include the maximum value, you would use the maximum values in the CASE expression to determine the ternary value. 

 

 As not all company might appears in the Two and Three years rolling total tables, you would need to use a LEFT JOIN so as not to remove company not represented in these two table.  The ternary values would be returned for companies with pre-2000 data and the LEFT JOIN would set companies without data to missing.

 

proc sql;

   select have.*, y2.TwoInRow, y3.ThreeInRow

      from

      (have

         left join

      (select coid,

            case when (max(TwoBadYears)=2) then 1

                when (max(C2)=2) then 0

                else .

                end as TwoInRow

            from (

                  select a.coid, a.year,

                       sum(b.profit<0 and b.profit is not missing) as TwoBadYears,

                       sum(b.profit is not missing) as C2

                     from have as a, have as b

                     where a.coid=b.coid and (a.year between b.year-1 and b.year+0 and

                           a.year<=1999)

                     group by a.coid, a.year

                  )

            group by coid) as Y2

 

            on have.coid = y2.coid )

 

            left join

 

 

            (select coid,

                  case when (max(ThreeBadYears)=3) then 1

                  when (max(C3)=3) then 0

                  end as ThreeInRow

                  from (

                     select c.coid, c.year,

                         sum(d.profit<0 and d.profit is not missing) as ThreeBadYears,

                            sum(d.profit is not missing) as C3

                       from have as c, have as d

                       where c.coid=d.coid and (c.year between d.year-2 and d.year+0 and

                             c.year<=1998)

                       group by c.coid, c.year

                       )

                  group by coid) as Y3

 

            on have.coid=y3.coid ;

quit;

 

 

The SQL code is highly recursive as SQL does not handle sequential processing as efficiently as the data step.  The HAVE table is ready 5 times, and 4 intermediate summary tables are re-read as inline views (albeit they are narrow tables and if HAVE is small the OS might have cached the file).  The DATA step is likely to read the table just once with a smaller memory/cpu footprint.

therock
Calcite | Level 5
Hi Roger,

The code gives me error when I try to put it in. In the first proc sql code there is an error. Please help me with that.

For what I need, I don't care whether it is data step or proc sql. However, I liked the previous data step procedure, because I was able to follow you. In the proc sql step, I am losing you because I am not familiar with it.

If you can help me with fixing this code or create a code in the original data step that you created to create a "MorethanTwo" variable, I would very much appreciate it.

Thanks,
Therock
RogerSpeas
SAS Employee

Looks like there was an errant comma...

 

 

create table twoYear as
   select a.coid, a.year, 
          sum(b.profit<=0 and b.profit is not missing ) as TwoBadYears,
          sum(b.profit=.) as C2
       from have as a, have as b
       where a.coid=b.coid and (a.year between b.year-0 and b.year-1 and
             a.year<=1999)
        group by a.coid, a.year;

 

 

... before the FROM clause.  

 

I suppose that you realize that was intermediate explanatory code to explain the 40 or so lines of SQLgetti that followed (which did not have the errant comma). Good luck

therock
Calcite | Level 5
Thanks Roger so much for your help.

Although the solution worked and I thank you for all your help, I still need the column with the title "MorethanTwo" which is not created by either of the codes.

If you can, that will be much appreciated. Thanks and have a great holiday season!
RogerSpeas
SAS Employee

I wasn't sure of your ternary logic (it seems a little contradictory)...

 

  *MoreThanTwo

Three Years

 

Two Years

.

0

1

.

.

d.n.e

d.n.e

0

1

0

d.n.e

1

1

1

1

 

 

It appears that if TwoYears=0, you want MoreThanTwo to be '1'?  Is that correct("456")?

 

   select have.*, y2.TwoInRow, y3.ThreeInRow
         case when y2.TwoInRow=1 then 1
              when y2.TwoInRow=0  annd y3.ThreeInRow=0 then 0
	      otherwise . end as MoreThanTwo
      from ...
      ;

 

 

In the table you show situations where both TwoInRow and ThreeInRow are both missing ("890" and "892") but they have different ternary results both missing as well as 0?   The program logic for MoreThanTwo applies to the results table that is shown above.   Do you have more logic to consider?   In other words, what makes MoreThanTwo different than TwoInRow?

therock
Calcite | Level 5
The logic is the following:

I might have company where profit information is given like this from 1998-2001 (COid 892) : -, ., +, +. If the company has any two years of negative profit prior to 2001 (not necessarily consecutive), then MorethanTwo will be 1. In Company 892 case, there is two years of profit information is given prior to 2001 but they are not consecutive and they are not both negative. Therefore MorethanTwo will 0, while TwoinRow and ThreeinRow will be missing.

Can you please help me with the code that will give a variable called "MorethanTwo" if there are two years (not consecutive) prior to 2001 that will give me 1 if the profit is negative, 0 if there is at least two years of information but they are not negative, and missing (.) if I don't have at least two years of information on profit prior to 2001?

Hope this clarifies the confusion. I thought that my original proc sql code does that. But for some reason, it does not. Here is the code:

proc sql;
create table want as
select *,
case
when (select count(*) from have where coid=a.coid and year le 2000 and profit lt 0) gt 1 then 1
else 0
end as MorethanTwo
from have as a
group by coid ;
quit;

Thanks,
Therock
RogerSpeas
SAS Employee

Oh, that.... LT 0 includes missing and you only code boolean logic, not ternary logic... 

 

Which is why I used..

 

sum(b.profit<0 and b.profit is not missing)

 

Additionally, you don't need a GROUP BY,  as you are running a correlated query.  But, as you now now need three value logic based on a different set of logic...I believe you'll need to run two correlated queries to SAS_STATS original code, as you have three value logic...

 

proc sql;
create table want as
select *,
   case 
          when (select count(*) from have 
                  where coid=a.coid and year le 2000 
                  and (profit lt 0 and profit is not missing)) 
				                          gt 1 then 1 
          when (select count(*) from have 
                  where coid=a.coid and year le 2000 
                  and (profit is not missing )) 
				                          gt 1 then 0
       	  else . end as MorethanTwo
      from have as a
      order by coid, year ;
quit;

 

 

 

 

 

 

 

 

Sorry, as this might come off as lecturing, but this is SAScommunities, and not meant to be SQLcommunities.  

 

I was just hoping to promote the efficiency of SAS code even if it is SQL... however, new lesson, old story, MASK variables.

 

The summary query could be written as to capture Found2 (negative) and Any2 (reported).   However, two columns returned would not make it suitable for the correlated query in the CASE expression, which can only return one column.   So I was thinking, we could improve SAS_STAT code by MASKing the results from the two columns...

 

proc sql;
select coid,  sum(profit lt 0 and profit is not mising) gt 1 as Found2,
              sum(profit is not missing ) gt 1 as Any2,        
      from have as mtr
	  where mtr.year le 2000
      group by coid ;

*Combine the columns into one mask variable... ;
select coid, ((sum(profit lt 0 and profit is not mising) gt 1) * 10 ) + (sum(mtr.profit is not missing) gt 1) as Mask from have as MTR where mtr.year le 2000 group by mtr.coid ; quit;

 

 

So, let's go back to your original question and SAS_STAT's code and insert the MASK query, so instead possibly two subqueries for each row, you could have just one ONE correlated subquery query for every row...

 

 

*So the correlated query might look like this... ;

proc sql;
create table want as
    select *,
      case
             ( select 
                  ((sum(profit lt 0 and profit is not missing) gt 1) * 10 ) +
                   (sum(profit is not missing) gt 1) as Mask
                 from have where coid=a.coid and year le 2000
             ) 
         when (11)  then 1
         when (01)  then 0
         when (00)  then .
         else . 
       end as MoreThanTwo 
   from have as a
   order by coid, year ;
quit;

 

 

 

Typically, I would say that you should avoid the correlated query when you can create a summary table.  The ternary logic... Found2  1,  AnyTwo , 0, less than 1 found (00), missing.  Thus adding a third left join might look a little like this...

 

proc sql;

   select have.*, y2.TwoInRow, y3.ThreeInRow,
      case (y4.mask)
	           when (11)  then 1
		   when (01)  then 0
		   when (00)  then .
	           else . 
       end as MoreThanTwo 
      from
       (have       
	   ....
	         ) as y3 
	   left join 

	       ( ... MASK query here or Found2/Any2 query... )  as y4
		   on y1.coid = y4.coid ;

 

 

 

 

The alternative SAS code with likely one pass of the data...

 

 data need; 
   set have(keep=coid year profit) ; *first pass check for runs;
   by coid; 
    if first.coid then do; 
      count=0;
      count_nm=0; 
      call missing( TwoInRow, ThreeInRow, Found2, Any2); 
      retain TwoInRow ThreeInRow Found2 Any2; 
      end;
   if year<=2000 then do; *exclude if check for all years;
      if ( first.coid ne 1 and year-1 ne lag(year)) or profit=. then do; 
          count=0; count_nm=0; 
          end;  *reset counters, non-neg and any sequential year counters; 
      else do;  *positive and negative profits, missing excluded; 
         count_nm+1; *count all non-missing;
         Any2+1;
         if profit<0 then do;  
              count+1;  *count continuous neg;
              Found2+1; *any neg count; 
	      end; 
         else do;  *non-negative profit; 
	      count=0;  *reset continuous neg; 
              end;
         end; 
 
      *Ternary logic --  zero, one, default missing ;
      if count_nm=2 and TwoInRow=. then TwoInRow=0; *Trigger zero value;
      else if count_nm=3 and ThreeInRow=. then ThreeInRow=0; 
 
      if count=2 then TwoInRow=1; * *Trigger one value;
      else if count=3 then ThreeInRow=1; 

      drop count: ;

      end; 
 
   if last.coid;
      if Found2>1 then MoreThan2=1;
	  else if Any2>1  then MoreThan2=0;
      *drop any2 found2;  

   do until(last.coid); *if last, re-read the BY group;
      set have;
      by coid; *reset first./last. ;
      output;
      end; 
run; 

 

 


 

 

 

 

 

 

 

 

 

 

 

 

 

   

 

therock
Calcite | Level 5
Hi Roger,

Thanks so much for help. It worked perfectly. Now I can use this and modify on my data with over 30000 data fields and 100 variables! I have to do at least 100 analysis.

Have a wonderful holiday season!

Thanks,
The rock

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 26 replies
  • 2691 views
  • 2 likes
  • 5 in conversation