BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DanD999
Quartz | Level 8

I'm sorry but I couldn't get the data to load correctly but here it is.                  

 

id_num fstdt  snddt     thirddt    
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072. . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000

 

 

The dates in the second column are all missing except for the last row. 

What I'm trying to do is take the lesser of the fstdt or snddt column (call it comparedt) and use it to compare to the thirddt column. If the thirddt column is less than the comparedt then create a new column called validdt and put the comparedt in it. If the thirddt is not less than the comparedt then 'NA' goes in the new validdt column. 

 

Note that there are 2 rows and in the live data there could be more than 2 rows per id_num. In the case of the first id_num the snddt is less than the thirddt so we go to the 2nd row of that id_num and see that the snddt is greater than the thirddt which is really what I want so we put the snddt in the new validdt column.

 

In the second set of rows for id_num 176039146 the snddt on both rows is less than the thirddt so 'NA' goes in the validdt column.

 

In the last set of id_num rows both the fstdt and snddt dates are greater than the thirddt column so we want to use one of those for the validdt column. Since the snddt is earlier than the fstdt column, it goes in the validdt column.

 

I hope that is clear enough. If not please ask questions.

 

Thanks for any help anyone can provide.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.

 

If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.

 

The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the  main message windows and the result may not run when copied into an editor in SAS and executed.

data have;
   input id_num :$10. (fstdt 	snddt	 thirddt) (:datetime22.) ;
   format fstdt 	snddt	 thirddt datetime22.3;
datalines;
175253072 	. 	10JAN2021:20:40:22.000 	11JAN2021:21:16:00.000
175253072. 	. 	12JAN2021:20:17:53.000 	11JAN2021:21:16:00.000
176039146 	. 	20JAN2021:21:51:05.000 	21JAN2021:17:07:00.000
176039146 	. 	20JAN2021:22:18:23.000 	21JAN2021:17:07:00.000
176249875 	. 	01FEB2021:15:05:12.000 	01FEB2021:13:39:00.000
176249875 	3FEB2021:18:19:51.000 	03FEB2021:13:39:00.000 	02FEB2021:18:19:51.000
;

 

If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;

 

Now, using something similar to the above code show what the expected results would look like.

 

View solution in original post

6 REPLIES 6
Reeza
Super User
Please show your expected output.
ballardw
Super User

To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.

 

If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.

 

The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the  main message windows and the result may not run when copied into an editor in SAS and executed.

data have;
   input id_num :$10. (fstdt 	snddt	 thirddt) (:datetime22.) ;
   format fstdt 	snddt	 thirddt datetime22.3;
datalines;
175253072 	. 	10JAN2021:20:40:22.000 	11JAN2021:21:16:00.000
175253072. 	. 	12JAN2021:20:17:53.000 	11JAN2021:21:16:00.000
176039146 	. 	20JAN2021:21:51:05.000 	21JAN2021:17:07:00.000
176039146 	. 	20JAN2021:22:18:23.000 	21JAN2021:17:07:00.000
176249875 	. 	01FEB2021:15:05:12.000 	01FEB2021:13:39:00.000
176249875 	3FEB2021:18:19:51.000 	03FEB2021:13:39:00.000 	02FEB2021:18:19:51.000
;

 

If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;

 

Now, using something similar to the above code show what the expected results would look like.

 

Patrick
Opal | Level 21

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;

data want1;
  merge have _valid_dttm;
  by id_num;
run;
  
proc format;
  value na_for_missing(default=22)
    low-high=[datetime22.3]
    other='NA'
  ;
quit;

proc print data=want1;
  format valid_dttm na_for_missing.;
run;


/* SQL Version */
proc sql;
  create table want2 as
  select 
    *,
    max(
      case
        when fstdt<=thirddt and snddt<=thirddt then .
        else max(fstdt,snddt)
        end
      )
      as valid_dttm format=datetime22.3
  from have
  group by
    id_num
  order by 
    id_num,
    snddt
  ;
quit;

proc print data=want2;
  format valid_dttm na_for_missing.;
run;
DanD999
Quartz | Level 8

@Patrick wrote:

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;

data want1;
  merge have _valid_dttm;
  by id_num;
run;
  
proc format;
  value na_for_missing(default=22)
    low-high=[datetime22.3]
    other='NA'
  ;
quit;

proc print data=want1;
  format valid_dttm na_for_missing.;
run;


/* SQL Version */
proc sql;
  create table want2 as
  select 
    *,
    max(
      case
        when fstdt<=thirddt and snddt<=thirddt then .
        else max(fstdt,snddt)
        end
      )
      as valid_dttm format=datetime22.3
  from have
  group by
    id_num
  order by 
    id_num,
    snddt
  ;
quit;

proc print data=want2;
  format valid_dttm na_for_missing.;
run;

This works. I forgot to say that I wanted to end up with only one row per id so I added this after your first data step and I have what I need.

data valid_dttm2;
	set _valid_dttm;
	format valid_dttm na_for_missing.;
run;

Thanks Patrick.

DanD999
Quartz | Level 8

@Patrick wrote:

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;




 

I don't see that "missing" macro. Where is it?

 

Thanks

ChrisNZ
Tourmaline | Level 20

Your need is unclear, however a little tweaking of this code should get you the result you want:

data WANT;    
  do until(last.ID_NUM);
    set HAVE;
    by ID_NUM;
    MIN=min(MIN, FSTDT, SNDDT);
    MAX=max(MAX, FSTDT, SNDDT);
  end;
  format VALIDDT datetime.;
  do until(last.ID_NUM);
    set HAVE;
    by ID_NUM;
    if THIRDDT > MAX then VALIDDT = .N ;
    else                  VALIDDT = MIN;
    output;
  end;
run;

ID_NUM FSTDT SNDDT THIRDDT VALIDDT
175253072 . 10JAN2021:20:40:22 11JAN2021:21:16:00 N
175253072. . 12JAN2021:20:17:53 11JAN2021:21:16:00 12JAN21:20:17:53
176039146 . 20JAN2021:21:51:05 21JAN2021:17:07:00 N
176039146 . 20JAN2021:22:18:23 21JAN2021:17:07:00 N
176249875 . 01FEB2021:15:05:12 01FEB2021:13:39:00 01FEB21:15:05:12
176249875 03FEB2021:18:19:51 03FEB2021:13:39:00 02FEB2021:18:19:51 01FEB21:15:05:12

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1773 views
  • 0 likes
  • 5 in conversation