Hello,
I'd appreciate some advice on flagging certain rows.
Imagine that you have the following input table:
Deal_No | Deal_Status |
---|---|
454 | 1 |
454 | 76 |
454 | 76 |
3341 | 1 |
3341 | 2 |
3341 | 76 |
4265 | 2 |
4265 | 77 |
5102 | 1 |
5102 | 2 |
5102 | 2 |
5102 | 77 |
The Deal_Status values 1 and 2 indicate active deals, while 76 and 77 indicate inactive deals.
I need to write user written code that sets a flag (for example called Active_Deal_Count) to "1" on the last row where Deal_Status is either 1 or 2, per Deal_No.
The output should look like this:
Deal_No | Deal_Status | Active_Deal_Count |
---|---|---|
454 | 1 | 1 |
454 | 76 | |
454 | 76 | |
3341 | 1 | |
3341 | 2 | 1 |
3341 | 76 | |
4265 | 2 | 1 |
4265 | 77 | |
5102 | 1 | |
5102 | 2 | |
5102 | 2 | 1 |
5102 | 77 |
This should preferably be done in a data step.
Thanks for your time and any advice you might provide.
Here's what I've got so far (adjusted to fit the simplified example data). I only need help filling out the red part of the code. Should be easy, but I must be getting rusty... can't seem to find the correct logic.
proc sort data = &_input1;
by deal_no deal_status;
run;
data &_output1;
set &_input1;
by deal_no;
retain deal_is_counted;
if first.deal_no then do;
retain deal_is_counted = 0;
end;
if deal_is_counted = 0 & <the max deal_status that isn't 76 or 77 (per deal_no)> then do
active_deal_count = 1;
end;
run;
If you had a way to have status 1 and 2 be equal BY levels you could use LAST.DEAL_STATUS. And there is.
If you are quite sure that these codes stable, perhaps a hard coding technique could be acceptable.
I assume that there is some kind of date column involved as well?
Can we assume that once a deal is inactive, it can never be active again (by looking at your attached example)?
If so, first sort your data according to deal and status, and create a status group columns (1,2 = Active, 76,77 = Inactive).
Then use data step with BY and if last. logic on that result (if last.status_group and status_group = 'Active' then active_deal_count = 1).
Thanks for the input so far, data_null & LinusH!
I've added a field called "ActiveFlg" to the input table, so that it looks like this:
Deal_No | Deal_Status | ActiveFlg |
---|---|---|
454 | 1 | 1 |
454 | 76 | 0 |
454 | 76 | 0 |
3341 | 1 | 1 |
3341 | 2 | 1 |
3341 | 76 | 0 |
4265 | 2 | 1 |
4265 | 77 | 0 |
5102 | 1 | 1 |
5102 | 2 | 1 |
5102 | 2 | 1 |
5102 | 77 | 0 |
I'm also using this simplified code:
proc sort data = &_input1;
by deal_no deal_status
run;
data &_output1;
set &_input1;
by deal_no deal_status
if last.deal_status & activeflg ="1" then do
active_deal_count = 1;
end;
run;
This works for all deals, except the ones where the last deal_status is a cancelled one. How do I improve the red line so that it selects the last deal_status that has an activeflg, instead of selecting the last deal_status then excluding those without activelfg?
PS: It is correct that once a deal is inactive, it can never be active again.
I don't understand your reservation about your red line. What is the problem? The & means that the if statement will evaluate both conditions at the same time....?!
If I run the above code based on this input:
Deal_No | Deal_Status | ActiveFlg |
---|---|---|
123 | 1 | 1 |
123 | 2 | 1 |
123 | 2 | 1 |
456 | 1 | 1 |
456 | 76 | 0 |
I get this output:
Deal_No | Deal_Status | Active_Deal_Count |
---|---|---|
123 | 1 | |
123 | 2 | |
123 | 2 | 1 |
456 | 1 | |
456 | 76 |
So as you can see it correctly sets Active_Deal_Count to "1" for the last row of deal 123 (an active row), but it doesn't set a value for deal 456 at all. All the deals that end with an inactive row don't get Active_Deal_Count = "1" at all, even though they do have rows with active statuses. In the table above, the 4th row should have gotten Active_Deal_Count = "1" if the logic was correct.
Do you have any idea what might be wrong?
: I agree with everyone else that DataNull's suggested code is the most elegant approach and appears to accomplish what you want. Your code doesn't work because, without the format, your use of last.deal_status is in effect when you don't want it to be.
If you wanted to accomplish the task without a using a format, you could do it with a double DOW. e.g.:
data want (drop=recnum target);
do until (last.Deal_No);
set have;
by Deal_No ;
retain target;
if first.Deal_No then do;
recnum=1;
target=0;
end;
else recnum +1;
if Deal_Status in (1,2) then target=recnum;
end;
do until (last.Deal_No);
set have;
by Deal_No ;
if first.Deal_No then recnum=1;
else recnum +1;
if target eq recnum then Active_Deal_Count=1;
else call missing(Active_Deal_Count);
output;
end;
run;
Thanks for all the feedback. I'm trying to adapt data_null's solution for my dataset (significantly larger and more complex than the example), but am having some trouble. I take it the cards are to be substituted with my actual input data set, so that I start with the proc format section, but how would I adjust the code below in order to reference both the data set (&_input1) and the column with status values (deal_status)?
proc format;
value deal 1,2='Active' other='Inactive';
run;
I work in DI Studio 99% of the time, and definitely lack experience working with things such as cards.
Perhaps I'll see things clearer on Monday... I've been staring at this problem for the entire day, so my mind is rather muddy.
Thanks again for your time and patience.
you don't need cards;
Can you create the format that groups your active and inactive codes? Do you know the data?
Hello again,
I've adapted your code for my data set data_null_, but there's one problem. When a deal has numerous statuses, and the last status is inactive, your code correctly sets the flag on the last row with an active status. However, when the deal has numerous statuses, and several of them are active, your code sets the flag on all the rows with active statuses. Each deal should only have one flag, which should be on the last row with an active status.
Then again I may have adjusted your code incorrectly...
Give example of that data.
You don't really need add a field or sort it if the sole purpose of it is for the flagging, Data _null_'s approach is sufficient and elegant, it gave you the exact results you asked for. If you don't want the format attached, just remove it after the process,
proc datasets lib=work memtype=data;
modify deal2;
attrib status format=;
run;
You just need to add an inactive flag.
With Proc SQL it's quite easy:
proc sql;
create table Active as
select deal.no,
deal.Status,
sub.Active_deal_count
from deal left join (select distinct
no,
status,
1 as Active_deal_count
from deal
where status not in (76,77)
group by no
having status = max(status) ) sub
on deal.no = sub.no
and deal.status = sub.status;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.