SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Flagging rows based on certain criteria

Reply
Frequent Contributor
Posts: 89

Flagging rows based on certain criteria

Hello,

I'd appreciate some advice on flagging certain rows.

Imagine that you have the following input table:

Deal_NoDeal_Status

454

1
45476
45476
33411
33412
334176
42652
426577
51021
51022
51022
510277

The Deal_Status values 1 and 2 indicate active deals, while 76 and 77 indicate inactive deals.

I need to write user written code that sets a flag (for example called Active_Deal_Count) to "1" on the last row where Deal_Status is either 1 or 2, per Deal_No.

The output should look like this:

Deal_NoDeal_StatusActive_Deal_Count
45411
45476
45476
33411
334121
334176
426521
426577
51021
51022
510221
510277

This should preferably be done in a data step.

Thanks for your time and any advice you might provide. Smiley Happy

Frequent Contributor
Posts: 89

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

Here's what I've got so far (adjusted to fit the simplified example data). I only need help filling out the red part of the code. Should be easy, but I must be getting rusty... can't seem to find the correct logic.

proc sort data = &_input1;

    by deal_no deal_status;

run;

data &_output1;

    set &_input1;

    by deal_no;

    retain deal_is_counted;

    if first.deal_no then do;

        retain deal_is_counted = 0;

    end;

    if deal_is_counted = 0 & <the max deal_status that isn't 76 or 77 (per deal_no)> then do

        active_deal_count = 1;

    end;

run;

Respected Advisor
Posts: 3,799

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

If you had a way to have status 1 and 2 be equal BY levels you could use LAST.DEAL_STATUS.  And there is.

data deal;
   input no status;
   cards;
  454   1   
  454  76   
  454  76   
3341  1   
3341  2   
3341  76   
4265  2   
4265  77   
5102  1   
5102  2   
5102  2   
5102  77
;;;;
   run;
proc format;
  
value deal 1,2='Active' other='Inactive';
  
run;
data deal2;
   set deal;
   format status deal.;
  
by no status groupformat notsorted;
  
if strip(vvalue(status)) eq 'Active' and last.status then flag=1;
  
run;
proc print;
  
format _all_;
   run;
Super User
Posts: 5,431

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

If you are quite sure that these codes stable, perhaps a hard  coding technique could be acceptable.

I assume that there is some kind of date column involved as well?

Can we assume that once a deal is inactive, it can never be active again (by looking at your attached example)?

If so, first sort your data according to deal and status, and create a status group columns (1,2 = Active, 76,77 = Inactive).

Then use data step with BY and if last. logic on that result (if last.status_group and status_group = 'Active' then active_deal_count = 1).

Data never sleeps
Frequent Contributor
Posts: 89

Re: Flagging rows based on certain criteria

Thanks for the input so far, data_null & LinusH!

I've added a field called "ActiveFlg" to the input table, so that it looks like this:

Deal_NoDeal_StatusActiveFlg
45411
454760
454760
334111
334121
3341760
426521
4265770
510211
510221
510221
5102770

I'm also using this simplified code:

proc sort data = &_input1;

    by deal_no deal_status

run;

data &_output1;

    set &_input1;

    by deal_no deal_status

    if last.deal_status & activeflg ="1" then do

        active_deal_count = 1;

    end;

run;

This works for all deals, except the ones where the last deal_status is a cancelled one. How do I improve the red line so that it selects the last deal_status that has an activeflg, instead of selecting the last deal_status then excluding those without activelfg?

PS: It is correct that once a deal is inactive, it can never be active again.

Super User
Posts: 5,431

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

I don't understand your reservation about your red line. What is the problem? The & means that the if statement will evaluate both conditions at the same time....?!

Data never sleeps
Frequent Contributor
Posts: 89

Re: Flagging rows based on certain criteria

If I run the above code based on this input:

Deal_NoDeal_StatusActiveFlg
12311
12321
12321
45611
456760

I get this output:

Deal_NoDeal_StatusActive_Deal_Count
1231
1232
12321
4561
45676

So as you can see it correctly sets Active_Deal_Count to "1" for the last row of deal 123 (an active row), but it doesn't set a value for deal 456 at all. All the deals that end with an inactive row don't get Active_Deal_Count = "1" at all, even though they do have rows with active statuses. In the table above, the 4th row should have gotten Active_Deal_Count = "1" if the logic was correct.

Do you have any idea what might be wrong?

PROC Star
Posts: 7,474

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

: I agree with everyone else that DataNull's suggested code is the most elegant approach and appears to accomplish what you want.  Your code doesn't work because, without the format, your use of last.deal_status is in effect when you don't want it to be.

If you wanted to accomplish the task without a using a format, you could do it with a double DOW.  e.g.:

data want (drop=recnum target);

  do until (last.Deal_No);

    set have;

    by Deal_No ;

    retain target;

    if first.Deal_No then do;

      recnum=1;

      target=0;

    end;

    else recnum +1;

    if Deal_Status in (1,2) then target=recnum;

  end;

  do until (last.Deal_No);

    set have;

    by Deal_No ;

    if first.Deal_No then recnum=1;

    else recnum +1;

    if target eq recnum then Active_Deal_Count=1;

    else call missing(Active_Deal_Count);

    output;

  end;

run;

Frequent Contributor
Posts: 89

Re: Flagging rows based on certain criteria

Thanks for all the feedback. I'm trying to adapt data_null's solution for my dataset (significantly larger and more complex than the example), but am having some trouble. I take it the cards are to be substituted with my actual input data set, so that I start with the proc format section, but how would I adjust the code below in order to reference both the data set (&_input1) and the column with status values (deal_status)?

proc format;
  
value deal 1,2='Active' other='Inactive';
run;


I work in DI Studio 99% of the time, and definitely lack experience working with things such as cards.


Perhaps I'll see things clearer on Monday... I've been staring at this problem for the entire day, so my mind is rather muddy.


Thanks again for your time and patience.

Respected Advisor
Posts: 3,799

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

you don't need cards;

Can you create the format that groups your active and inactive codes?  Do you know the data?

Frequent Contributor
Posts: 89

Re: Flagging rows based on certain criteria

Posted in reply to data_null__

Hello again,

I've adapted your code for my data set data_null_, but there's one problem. When a deal has numerous statuses, and the last status is inactive, your code correctly sets the flag on the last row with an active status. However, when the deal has numerous statuses, and several of them are active, your code sets the flag on all the rows with active statuses. Each deal should only have one flag, which should be on the last row with an active status.

Then again I may have adjusted your code incorrectly... Smiley Happy

Respected Advisor
Posts: 3,799

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

Give example of that data.

Respected Advisor
Posts: 3,156

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

You don't really need add a field or sort it if the sole purpose of it is for the flagging, Data _null_'s approach is sufficient and elegant, it gave you the exact results you asked for.  If you don't want the format attached, just remove it after the process,

proc datasets lib=work memtype=data;

  modify deal2;

  attrib status format=;

run;

Respected Advisor
Posts: 3,799

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

You just need to add an inactive flag.

data deal;
   input no status;
   cards;
  454   1   
  454  76   
  454  76   
3341  1   
3341  2   
3341  76   
4265  2   
4265  77   
4265  1
5102  1   
5102  2   
5102  2   
5102  77
;;;;
   run;
proc format;
  
value deal 1,2='Active' other='Inactive';
  
run;
data deal2;
   set deal;
   format status deal.;
  
by no status groupformat notsorted;
  
if first.no then iflag=0;
   _status = first(left(vvalue(status)));
  
if _status eq 'I' then iflag+1;
  
if not iflag and _status eq 'A' and last.status then flag=1;
  
run;
proc print;
  
format _all_;
   run;
Contributor
Posts: 44

Re: Flagging rows based on certain criteria

Posted in reply to TurnTheBacon

With Proc SQL it's quite easy:

proc sql;

create table Active as

select deal.no,

         deal.Status,

         sub.Active_deal_count

from   deal left join (select distinct

                                      no,

                                      status,

                                      1 as Active_deal_count

                            from deal

                            where status not in (76,77)

                            group by no

                            having status = max(status) ) sub

          on deal.no = sub.no

         and deal.status = sub.status;

quit;


Ask a Question
Discussion stats
  • 14 replies
  • 744 views
  • 7 likes
  • 6 in conversation