I'm sorry but I couldn't get the data to load correctly but here it is.
id_num | fstdt | snddt | thirddt |
175253072 | . | 10JAN2021:20:40:22.000 | 11JAN2021:21:16:00.000 |
175253072. | . | 12JAN2021:20:17:53.000 | 11JAN2021:21:16:00.000 |
176039146 | . | 20JAN2021:21:51:05.000 | 21JAN2021:17:07:00.000 |
176039146 | . | 20JAN2021:22:18:23.000 | 21JAN2021:17:07:00.000 |
176249875 | . | 01FEB2021:15:05:12.000 | 01FEB2021:13:39:00.000 |
176249875 | 3FEB2021:18:19:51.000 | 03FEB2021:13:39:00.000 | 02FEB2021:18:19:51.000 |
The dates in the second column are all missing except for the last row.
What I'm trying to do is take the lesser of the fstdt or snddt column (call it comparedt) and use it to compare to the thirddt column. If the thirddt column is less than the comparedt then create a new column called validdt and put the comparedt in it. If the thirddt is not less than the comparedt then 'NA' goes in the new validdt column.
Note that there are 2 rows and in the live data there could be more than 2 rows per id_num. In the case of the first id_num the snddt is less than the thirddt so we go to the 2nd row of that id_num and see that the snddt is greater than the thirddt which is really what I want so we put the snddt in the new validdt column.
In the second set of rows for id_num 176039146 the snddt on both rows is less than the thirddt so 'NA' goes in the validdt column.
In the last set of id_num rows both the fstdt and snddt dates are greater than the thirddt column so we want to use one of those for the validdt column. Since the snddt is earlier than the fstdt column, it goes in the validdt column.
I hope that is clear enough. If not please ask questions.
Thanks for any help anyone can provide.
To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.
If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.
The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the main message windows and the result may not run when copied into an editor in SAS and executed.
data have; input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ; format fstdt snddt thirddt datetime22.3; datalines; 175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000 175253072. . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000 176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000 176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000 176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000 176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000 ;
If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;
Now, using something similar to the above code show what the expected results would look like.
To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.
If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.
The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the main message windows and the result may not run when copied into an editor in SAS and executed.
data have; input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ; format fstdt snddt thirddt datetime22.3; datalines; 175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000 175253072. . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000 176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000 176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000 176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000 176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000 ;
If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;
Now, using something similar to the above code show what the expected results would look like.
If I understood the logic you've described correctly then below should work.
data have;
input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;
/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
set have;
by id_num;
format valid_dttm datetime22.3;
retain valid_dttm;
if max(fstdt,snddt)>thirddt then
valid_dttm=max(valid_dttm,fstdt,snddt);
if last.id_num then
do;
output;
call missing(valid_dttm);
end;
run;
data want1;
merge have _valid_dttm;
by id_num;
run;
proc format;
value na_for_missing(default=22)
low-high=[datetime22.3]
other='NA'
;
quit;
proc print data=want1;
format valid_dttm na_for_missing.;
run;
/* SQL Version */
proc sql;
create table want2 as
select
*,
max(
case
when fstdt<=thirddt and snddt<=thirddt then .
else max(fstdt,snddt)
end
)
as valid_dttm format=datetime22.3
from have
group by
id_num
order by
id_num,
snddt
;
quit;
proc print data=want2;
format valid_dttm na_for_missing.;
run;
@Patrick wrote:
If I understood the logic you've described correctly then below should work.
data have; input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ; format fstdt snddt thirddt datetime22.3; datalines; 175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000 175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000 176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000 176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000 176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000 176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000 ; /* data step version */ data _valid_dttm(keep=id_num valid_dttm); set have; by id_num; format valid_dttm datetime22.3; retain valid_dttm; if max(fstdt,snddt)>thirddt then valid_dttm=max(valid_dttm,fstdt,snddt); if last.id_num then do; output; call missing(valid_dttm); end; run; data want1; merge have _valid_dttm; by id_num; run; proc format; value na_for_missing(default=22) low-high=[datetime22.3] other='NA' ; quit; proc print data=want1; format valid_dttm na_for_missing.; run; /* SQL Version */ proc sql; create table want2 as select *, max( case when fstdt<=thirddt and snddt<=thirddt then . else max(fstdt,snddt) end ) as valid_dttm format=datetime22.3 from have group by id_num order by id_num, snddt ; quit; proc print data=want2; format valid_dttm na_for_missing.; run;
This works. I forgot to say that I wanted to end up with only one row per id so I added this after your first data step and I have what I need.
data valid_dttm2; set _valid_dttm; format valid_dttm na_for_missing.; run;
Thanks Patrick.
@Patrick wrote:
If I understood the logic you've described correctly then below should work.
data have; input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ; format fstdt snddt thirddt datetime22.3; datalines; 175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000 175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000 176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000 176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000 176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000 176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000 ; /* data step version */ data _valid_dttm(keep=id_num valid_dttm); set have; by id_num; format valid_dttm datetime22.3; retain valid_dttm; if max(fstdt,snddt)>thirddt then valid_dttm=max(valid_dttm,fstdt,snddt); if last.id_num then do; output; call missing(valid_dttm); end; run;
I don't see that "missing" macro. Where is it?
Thanks
Your need is unclear, however a little tweaking of this code should get you the result you want:
data WANT;
do until(last.ID_NUM);
set HAVE;
by ID_NUM;
MIN=min(MIN, FSTDT, SNDDT);
MAX=max(MAX, FSTDT, SNDDT);
end;
format VALIDDT datetime.;
do until(last.ID_NUM);
set HAVE;
by ID_NUM;
if THIRDDT > MAX then VALIDDT = .N ;
else VALIDDT = MIN;
output;
end;
run;
ID_NUM | FSTDT | SNDDT | THIRDDT | VALIDDT |
---|---|---|---|---|
175253072 | . | 10JAN2021:20:40:22 | 11JAN2021:21:16:00 | N |
175253072. | . | 12JAN2021:20:17:53 | 11JAN2021:21:16:00 | 12JAN21:20:17:53 |
176039146 | . | 20JAN2021:21:51:05 | 21JAN2021:17:07:00 | N |
176039146 | . | 20JAN2021:22:18:23 | 21JAN2021:17:07:00 | N |
176249875 | . | 01FEB2021:15:05:12 | 01FEB2021:13:39:00 | 01FEB21:15:05:12 |
176249875 | 03FEB2021:18:19:51 | 03FEB2021:13:39:00 | 02FEB2021:18:19:51 | 01FEB21:15:05:12 |
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.