I have the following dataset.
What I would like to do is
| SubjectID | final_stop_date_time | start_date_time_1 | stop_date_time_1 | group |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:15:30:00 | 11FEB2020:16:08:00 | 2 |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:16:08:00 | 12FEB2020:10:59:00 | 2 |
| 11 | 12FEB2020:10:59:00 | 12FEB2020:21:40:00 | . | 2 |
Want:
| SubjectID | final_stop_date_time | start_date_time_1 | stop_date_time_1 | group | keep |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:15:30:00 | 11FEB2020:16:08:00 | 2 | 1 |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:16:08:00 | 12FEB2020:10:59:00 | 2 | 1 |
| 11 | 12FEB2020:10:59:00 | 12FEB2020:21:40:00 | . | 2 | 0 |
I tried the following and doesn't work
data want; set have;
if group = '2' and
(stop_date_time_1 <= final_stop_date_time) then keep = 1 ;
else if
(group = '2' and ((start_date_time_1 > final_stop_date_time ) and missing(stop_date_time_1)) OR
(group = '2' and (stop_date_time_2 > final_stop_date_time))
then keep = 0;
run;
Doesn't work is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the <> to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.
I'm going to go out on a limb and guess that part of the problem is you get Keep=1 for missing values of the Stop_date_time.
if group = '2' and not missing(stop_date_time) and (stop_date_time_1 <= final_stop_date_time) then keep = 1 ;
Missing values are always less than any given value. So you need to explicitly limit that part.
Alternate could be to provide a lower limit instead of the missing function:
if group = '2' and ( 0 <= stop_date_time_1 <= final_stop_date_time) then keep = 1 ;
but that approach may not be appropriate if you do not know what your minimum actual value of the stop_date_time should be.
I am assuming that your datetime values are SAS datetimes valued. If they are character then any comparison with < or > is almost always a very bad idea. 'January' comes after 'April' in terms of inequalities.
There is no error but this how final dataset looks. The last record should keep = 0 because stop_date_time_2 is null and start_date_time_1 occurs after final_stop_date_time
| SubjectID | final_stop_date_time | start_date_time_1 | stop_date_time_2 | group | keep |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:15:30:00 | 11FEB2020:16:08:00 | 2 | 1 |
| 11 | 12FEB2020:10:59:00 | 11FEB2020:16:08:00 | 12FEB2020:10:59:00 | 2 | 1 |
| 11 | 12FEB2020:10:59:00 | 12FEB2020:21:40:00 | . | 2 | 1 |
Count the ( and ) in this:
(group = '2' and ((start_date_time_1 > final_stop_date_time ) and missing(stop_date_time_1)) OR (group = '2' and (stop_date_time_2 > final_stop_date_time))
you have an unbalanced () somewhere.
Not nice to include variables in code without example data.
Since every single one of your conditions involves group='2' a good style choice is to factor out the common element:
If group='2' then do; <condition one> <next condition> <etc> end;
which will make it easier to read and understand that this block all involves group='2'. And sometimes you can introduce logic errors with minor typos for repeated code.
data new;
informat final_stop_date_time start_date_time_1 stop_date_time_1 datetime19.;
input SubjectID final_stop_date_time start_date_time_1 stop_date_time_1 group $;
format final_stop_date_time start_date_time_1 stop_date_time_1 datetime19.;
if group = '2' and
(stop_date_time_1 <= final_stop_date_time) then keep = 1 ;
if
(group = '2' and ((start_date_time_1 >= final_stop_date_time ) and missing(stop_date_time_1))) OR
(group = '2' and (stop_date_time_1 > final_stop_date_time))
then keep = 0;
cards;;;
11 12FEB2020:10:59:00 11FEB2020:15:30:00 11FEB2020:16:08:00 2
11 12FEB2020:10:59:00 11FEB2020:16:08:00 12FEB2020:10:59:00 2
11 12FEB2020:10:59:00 12FEB2020:21:40:00 . 2
;;;
run;
reasons why it did not work before:
1. (stop_date_time_2 > final_stop_date_time) stop_date_time_2 does not exist
2. It did not go into else condition as when the date column is missing first if condition is always true
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.