Solved: help with datastep and dates

DanD999 · Posted 06-25-2021 05:17 PM

I'm sorry but I couldn't get the data to load correctly but here it is.

id_num	fstdt	snddt	thirddt
175253072	.	10JAN2021:20:40:22.000	11JAN2021:21:16:00.000
175253072.	.	12JAN2021:20:17:53.000	11JAN2021:21:16:00.000
176039146	.	20JAN2021:21:51:05.000	21JAN2021:17:07:00.000
176039146	.	20JAN2021:22:18:23.000	21JAN2021:17:07:00.000
176249875	.	01FEB2021:15:05:12.000	01FEB2021:13:39:00.000
176249875	3FEB2021:18:19:51.000	03FEB2021:13:39:00.000	02FEB2021:18:19:51.000

The dates in the second column are all missing except for the last row.

What I'm trying to do is take the lesser of the fstdt or snddt column (call it comparedt) and use it to compare to the thirddt column. If the thirddt column is less than the comparedt then create a new column called validdt and put the comparedt in it. If the thirddt is not less than the comparedt then 'NA' goes in the new validdt column.

Note that there are 2 rows and in the live data there could be more than 2 rows per id_num. In the case of the first id_num the snddt is less than the thirddt so we go to the 2nd row of that id_num and see that the snddt is greater than the thirddt which is really what I want so we put the snddt in the new validdt column.

In the second set of rows for id_num 176039146 the snddt on both rows is less than the thirddt so 'NA' goes in the validdt column.

In the last set of id_num rows both the fstdt and snddt dates are greater than the thirddt column so we want to use one of those for the validdt column. Since the snddt is earlier than the fstdt column, it goes in the validdt column.

I hope that is clear enough. If not please ask questions.

Thanks for any help anyone can provide.

ballardw · Posted 06-25-2021 06:19 PM

To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.

If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.

The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the main message windows and the result may not run when copied into an editor in SAS and executed.

data have;
   input id_num :$10. (fstdt 	snddt	 thirddt) (:datetime22.) ;
   format fstdt 	snddt	 thirddt datetime22.3;
datalines;
175253072 	. 	10JAN2021:20:40:22.000 	11JAN2021:21:16:00.000
175253072. 	. 	12JAN2021:20:17:53.000 	11JAN2021:21:16:00.000
176039146 	. 	20JAN2021:21:51:05.000 	21JAN2021:17:07:00.000
176039146 	. 	20JAN2021:22:18:23.000 	21JAN2021:17:07:00.000
176249875 	. 	01FEB2021:15:05:12.000 	01FEB2021:13:39:00.000
176249875 	3FEB2021:18:19:51.000 	03FEB2021:13:39:00.000 	02FEB2021:18:19:51.000
;

If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;

Now, using something similar to the above code show what the expected results would look like.

View solution in original post

Reeza · Posted 06-25-2021 05:47 PM

Please show your expected output.

ballardw · Posted 06-25-2021 06:19 PM

To do proper comparisons your "dates" need to be numeric SAS date (or time or datetime) values. As such if you are placing one of those "date" values into a variable then the variable would be numeric and you cannot assign "NA" to a value in a numeric column as that is character. There are ways to get something like that but it involves formats and special missing values.

If the TIME component you show is important then your values are datetime values and not dates. This is critical to understand because SAS uses count of days from 1 Jan 1960 to measure dates, and datetimes uses seconds from midnight on the same day. This is very important the functions that do things like determine how many months are between two dates (or datetimes) or advance dates/datetimes will return the wrong value if you use the syntax for a "date" with datetimes or vice versa.

The best way to share data is as data step code such as this pasted into a text box (the </> icon) or code box (the little running man icon) to preserve formatting. The forum re-formats pasted text in the main message windows and the result may not run when copied into an editor in SAS and executed.

data have;
   input id_num :$10. (fstdt 	snddt	 thirddt) (:datetime22.) ;
   format fstdt 	snddt	 thirddt datetime22.3;
datalines;
175253072 	. 	10JAN2021:20:40:22.000 	11JAN2021:21:16:00.000
175253072. 	. 	12JAN2021:20:17:53.000 	11JAN2021:21:16:00.000
176039146 	. 	20JAN2021:21:51:05.000 	21JAN2021:17:07:00.000
176039146 	. 	20JAN2021:22:18:23.000 	21JAN2021:17:07:00.000
176249875 	. 	01FEB2021:15:05:12.000 	01FEB2021:13:39:00.000
176249875 	3FEB2021:18:19:51.000 	03FEB2021:13:39:00.000 	02FEB2021:18:19:51.000
;

If the fractions of a second are not actually used anywhere the format to display them could be a DATETIME18. ;

Now, using something similar to the above code show what the expected results would look like.

Patrick · Posted 06-25-2021 10:33 PM

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;

data want1;
  merge have _valid_dttm;
  by id_num;
run;
  
proc format;
  value na_for_missing(default=22)
    low-high=[datetime22.3]
    other='NA'
  ;
quit;

proc print data=want1;
  format valid_dttm na_for_missing.;
run;


/* SQL Version */
proc sql;
  create table want2 as
  select 
    *,
    max(
      case
        when fstdt<=thirddt and snddt<=thirddt then .
        else max(fstdt,snddt)
        end
      )
      as valid_dttm format=datetime22.3
  from have
  group by
    id_num
  order by 
    id_num,
    snddt
  ;
quit;

proc print data=want2;
  format valid_dttm na_for_missing.;
run;

DanD999 · Posted 06-27-2021 04:57 PM

@Patrick wrote:

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;

data want1;
  merge have _valid_dttm;
  by id_num;
run;
  
proc format;
  value na_for_missing(default=22)
    low-high=[datetime22.3]
    other='NA'
  ;
quit;

proc print data=want1;
  format valid_dttm na_for_missing.;
run;


/* SQL Version */
proc sql;
  create table want2 as
  select 
    *,
    max(
      case
        when fstdt<=thirddt and snddt<=thirddt then .
        else max(fstdt,snddt)
        end
      )
      as valid_dttm format=datetime22.3
  from have
  group by
    id_num
  order by 
    id_num,
    snddt
  ;
quit;

proc print data=want2;
  format valid_dttm na_for_missing.;
run;

This works. I forgot to say that I wanted to end up with only one row per id so I added this after your first data step and I have what I need.

data valid_dttm2;
	set _valid_dttm;
	format valid_dttm na_for_missing.;
run;

Thanks Patrick.

DanD999 · Posted 06-28-2021 09:36 AM

@Patrick wrote:

If I understood the logic you've described correctly then below should work.

data have;
 input id_num :$10. (fstdt snddt thirddt) (:datetime22.) ;
 format fstdt snddt thirddt datetime22.3;
datalines;
175253072 . 10JAN2021:20:40:22.000 11JAN2021:21:16:00.000
175253072 . 12JAN2021:20:17:53.000 11JAN2021:21:16:00.000
176039146 . 20JAN2021:21:51:05.000 21JAN2021:17:07:00.000
176039146 . 20JAN2021:22:18:23.000 21JAN2021:17:07:00.000
176249875 . 01FEB2021:15:05:12.000 01FEB2021:13:39:00.000
176249875 3FEB2021:18:19:51.000 03FEB2021:13:39:00.000 02FEB2021:18:19:51.000
;

/* data step version */
data _valid_dttm(keep=id_num valid_dttm);
  set have;
  by id_num;

  format valid_dttm datetime22.3;
  retain valid_dttm;

  if max(fstdt,snddt)>thirddt then
    valid_dttm=max(valid_dttm,fstdt,snddt);

  if last.id_num then 
    do;
      output;
      call missing(valid_dttm);
    end;
run;

I don't see that "missing" macro. Where is it?

Thanks

ChrisNZ · Posted 06-26-2021 01:18 AM

Your need is unclear, however a little tweaking of this code should get you the result you want:

data WANT;    
  do until(last.ID_NUM);
    set HAVE;
    by ID_NUM;
    MIN=min(MIN, FSTDT, SNDDT);
    MAX=max(MAX, FSTDT, SNDDT);
  end;
  format VALIDDT datetime.;
  do until(last.ID_NUM);
    set HAVE;
    by ID_NUM;
    if THIRDDT > MAX then VALIDDT = .N ;
    else                  VALIDDT = MIN;
    output;
  end;
run;

ID_NUM	FSTDT	SNDDT	THIRDDT	VALIDDT
175253072	.	10JAN2021:20:40:22	11JAN2021:21:16:00	N
175253072.	.	12JAN2021:20:17:53	11JAN2021:21:16:00	12JAN21:20:17:53
176039146	.	20JAN2021:21:51:05	21JAN2021:17:07:00	N
176039146	.	20JAN2021:22:18:23	21JAN2021:17:07:00	N
176249875	.	01FEB2021:15:05:12	01FEB2021:13:39:00	01FEB21:15:05:12
176249875	03FEB2021:18:19:51	03FEB2021:13:39:00	02FEB2021:18:19:51	01FEB21:15:05:12

High-Performance SAS Coding - Third Edition

help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Re: help with datastep and dates

Registration is open

Registration is open

SAS Training: Just a Click Away