Hello everyone,
I have this table without Def_flag colum. I want create a new column Def_flag with value "1" or "0" when the value of column "RAT" is equal to R then 12 months before in colum Def_flag goes value "1" else "0". And for each ID must by separated. Result you can see below.
ID | DATE | RAT | Def_flag |
9 | 201705 | N | 0 |
9 | 201704 | N | 0 |
9 | 201703 | N | 0 |
9 | 201702 | N | 0 |
9 | 201701 | N | 0 |
9 | 201612 | N | 0 |
9 | 201611 | N | 0 |
9 | 201610 | N | 0 |
9 | 201609 | N | 0 |
9 | 201608 | N | 0 |
9 | 201607 | N | 0 |
9 | 201606 | N | 0 |
9 | 201605 | N | 0 |
9 | 201604 | R | 1 |
9 | 201603 | N | 1 |
9 | 201602 | N | 1 |
9 | 201601 | N | 1 |
9 | 201512 | N | 1 |
9 | 201511 | N | 1 |
9 | 201510 | N | 1 |
9 | 201509 | N | 1 |
9 | 201508 | N | 1 |
9 | 201507 | N | 1 |
9 | 201506 | N | 1 |
9 | 201505 | N | 1 |
9 | 201504 | N | 0 |
9 | 201503 | N | 0 |
9 | 201502 | N | 0 |
9 | 201501 | N | 0 |
9 | 201412 | N | 0 |
11 | 201705 | N | 0 |
11 | 201704 | N | 0 |
11 | 201703 | N | 0 |
11 | 201702 | N | 0 |
11 | 201701 | N | 0 |
11 | 201612 | N | 0 |
11 | 201611 | N | 0 |
11 | 201610 | N | 0 |
11 | 201609 | N | 0 |
11 | 201608 | N | 0 |
11 | 201607 | N | 0 |
11 | 201606 | N | 0 |
11 | 201605 | N | 0 |
11 | 201604 | N | 0 |
11 | 201603 | R | 1 |
11 | 201602 | N | 1 |
11 | 201601 | N | 1 |
11 | 201512 | N | 1 |
11 | 201511 | N | 1 |
11 | 201510 | N | 1 |
11 | 201509 | N | 1 |
11 | 201508 | N | 1 |
11 | 201507 | N | 1 |
11 | 201506 | N | 1 |
11 | 201505 | N | 1 |
11 | 201504 | N | 1 |
11 | 201503 | N | 0 |
11 | 201502 | N | 0 |
11 | 201501 | N | 0 |
11 | 201412 | N | 0 |
Can someone give me a help to create a data step?
Thanks!!!!
Sort the data in reverse, then when you hit R keep a counter and set the other varible, untested code - post test data as a datastep in future:
proc sort data=have out=want; by id descending date; run; data want; set want; retain cnt def_flag; by id; if first.id or cnt > 12 then do; cnt=0; chk=0; def_flag=0; end; else if chk then do; cnt=cnt+1; def_flag=1; end; else if rat="R" then do; chk=1; cnt=1; def_flag=1; end; run;
I posted excel data and there is result colum how it should look like. The code is working but value "1" goes to 17 rows insted 12. I made screen of my data.
Here is the screen as result of your code.
Solution will depend upon whether date is a date field, a number that looks like a date, or a character field that looks like a numeric date.
Here is a method of a number that looks like a date:
data want; do until (last.id); set have; by id; if rat eq 'R' then check=date; end; do until (last.id); set have; by id; if intnx('month',input(put(check,6.),yymmn6.),-11) le input(put(date,6.),yymmn6.) le intnx('month',input(put(check,6.),yymmn6.),11) then Def_flag=1; else Def_flag=0; output; end; run;
Art, CEO, AnalystFinder.com
It is numeric field that looks like a date.
Can the same ID have more than one "R" observation?
Yes it can have more the one "R" observation.
Here's an approach that can handle multiple "R" records within the same ID. First, create a map that shows every conceivable record that should be flagged:
data flag_me;
set have (keep=id rat);
if rat='R' ;
do obsno = _N_ - 12 to _N_ + 12;
start = catx('+', id, obsno);
output;
end;
keep start;
run;
This list can include some duplicates, and can include some combinations that won't actually appear in the data. Those situations won't cause a problem. Next, dedup:
proc sort data=flag_me nodupkey;
by start;
run;
Then add the rest of the information needed to convert ID + _N_ into a format:
data flag_me;
set flag_me end=done;
retain fmtname '$def_flag';
label='1' ;
output;
if done;
hlo='O' ;
label='0' ;
output;
run;
proc format cntlin=flag_me;
run;
Finally, apply the format:
data want;
set have;
def_flag = put( catx('+', id, _n_), $def_flag.);
run;
This is untested at the moment, I'm hoping I can get to that later. And it creates DEF_FLAG as a character field. It can be converted to numeric applying the INPUT function, or alternatively the program could create an INFORMAT instead of a FORMAT.
Sorry, disregard this. It does what I intended it to do, but I was solving the wrong problem. This flags up to 25 records per R: the 12 before the "R", the 12 after the "R", and the "R" itself. I will suggest something else as a separate response.
Based on your revised requirements, here is a revised version of my suggested code:
data want (drop=check checkdate); set have; retain check checkdate; by id; if first.id then check=0; if rat eq 'R' then do; check=1; checkdate=date; end; Def_flag=0; if check eq 1 then do; if intnx('month',input(put(checkdate,6.),yymmn6.),-11) le input(put(date,6.),yymmn6.) le input(put(checkdate,6.),yymmn6.) then Def_flag=1; end; run;
Art, CEO, AnalystFinder.com
Here's an alternate solution that doesn't involve the dates. For it to work, the dates have to be complete (no months skipped in the middle) and the observations have to be in order. It just flags "R" observations and up to 12 observations following each "R" observation.
data want;
set have;
by id;
if first.id then r_count=20;
if rat='R' then r_count=0;
r_count + 1;
if r_count <= 13 then def_flag=1;
else def_flag=0;
drop r_count;
run;
It wasn't 100% clear whether you wanted to flag the "R" record plus 12 additional records (which this does) or only 11 additional records (which your example illustrates). But it's easy enough to change the "13" to "12" in this program if need be.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.