BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
arii
Obsidian | Level 7

Hi,

 

I would appreciate some help with this:

 

I am trying to create a new dataset (File 2) using data from specific dates in an existing dataset (File 1).

 

I want 'File 2' to contain only data that are 25days or more apart in 'File 1'.

(first observation for 'File 2" = 1990-02-07)

 

File 1:

 

Dateaa
1990-02-0223
1990-02-0745
1990-02-1356
1990-02-287
1990-03-128
1990-03-138
1990-03-169
1990-03-1955
1990-03-2378
1990-04-0390
1990-04-2376
1990-05-016
1990-05-114
1990-05-1533
1990-05-2534
1990-05-3012

 

 

 

File 2:

 

Datebb
1990-02-0745
1990-03-1256
1990-04-2376
1990-05-2534

 

 

Please do advise on this.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Please run Proc Contents on the SAS data set that represents File1. It will have the details that will let us provide specifics on how to manipulate the data. Just because Excel displays a value as a date does not always mean that the SAS variable will because of some vagaries in how many people structure Excel files and the specific method used to bring the data into a SAS data set.

 

Notice that you saying that you selected a specific date to start with as a reference and that you need dates from that reference does now make much more sense.

 

The code below shows how to make a data set containing your "dates" as SAS date values and then select using your rule.

As example code the reference date variable is also in the output so you can compare if that is behaving correctly.

data file1;
   informat date yymmdd10.;
   input date aa;
   format date yymmddD10.;
datalines;
1990-02-02 23 
1990-02-07 45 
1990-02-13 56 
1990-02-28 7 
1990-03-12 8 
1990-03-13 8 
1990-03-16 9 
1990-03-19 55 
1990-03-23 78 
1990-04-03 90 
1990-04-23 76 
1990-05-01 6 
1990-05-11 4 
1990-05-15 33 
1990-05-25 34 
1990-05-30 12 
;
run;

data file2;;
   set file1;
   /* Retain will keep the value of a variable 
      from record to record. Note that to use
      a literal date value the ddMONyy in quotes with 
      D to tell SAS the value should be treated 
      as a date. None of the other date formats
      are acceptable
   */
   retain referencedate '07FEB1990'd; 
   format referencedate yymmddD10.;
   if date = referencedate OR date-Referencedate ge 25 then do;
      output;
      Referencedate = date;
   end;
run;

If your proc contents or examination of the data set's variable properties does not show the date as numeric you will need to use an INPUT statement such as

 

Sasdate = input(date, yymmdd10.); to create a SAS date valued variable and then do all the other calculations using the SASDATE variable.

View solution in original post

8 REPLIES 8
ballardw
Super User

First are your dates actually SAS date values with a date format attached or character values.

 

Second you will have to provide a bit more detail about "25 days apart". Why is 1990-02-07 included and not 1990-02-03? Why 1990-03-12 is included in your output. It is not more than 25 days from 1990-02-28, neither is 1990-04-03 more than 25 days from 1990-03-23.

 

 

arii
Obsidian | Level 7

Thank you for your response, Ballardw.

 

The dates you are referring to are from 'File 1' (the original data set).

I want to create a new dataset ('File 2') from the original dataset. The date conflict you are referring to is not in 'File 2'. (please have a look at 'File 2').

 

 

Also, I am not too sure what you mean by '.. a date format attached', does this affect the intended exercise?

ballardw
Super User

@arii wrote:

Thank you for your response, Ballardw.

 

The dates you are referring to are from 'File 1' (the original data set).

I want to create a new dataset ('File 2') from the original dataset. The date conflict you are referring to is not in 'File 2'. (please have a look at 'File 2').

 

 

Also, I am not too sure what you mean by '.. a date format attached', does this affect the intended exercise?


The reason why your dates in file2 are considered "25 days apart" is not clear in terms of relation to other dates in file1. For instance you say the first value in file2 from file1 is 1990--02-07. WHY? Why not 1990-02-13 which is also 25 days or more apart from  1990-03-12 (27 days). And the question is why specific values appear. There is more that just '25 or more days' involved to get your result because there are multiple values for any given date that are 25 or more days from some other date. To have any interval there must be a comparison between two dates and you do not indicate which two dates are being compared for any selection in the resulting set.

 

The value of the date variable is pretty important. If the value is a SAS date value, which is numeric and usually has a date format applied such as yymmdd10. or similar then there are a number of functions that are available that do calculations such as the number of days between two dates.

arii
Obsidian | Level 7

 

1990-02-07 (February 7th, 1990) is the first observation I chose for 'File 2' (this is the 'reference date').

 

 

Thus, I need the 'second observation'  to be 25 days or more after 1990-02-07. 

 

The 'third observation' would be 25 days or more apart from the 'second observation'.

 

The sequence continues like that.

 

 

 

My data is in an .xlsx file; the format is year-mm-dd.

 

I am relatively new to some of these.

Reeza
Super User

Why aren't the later records included?

 

Is aa the number of days apart or does it need to be calculated?

 

data want;
set have;

where aa > 25 ;
run;

@arii wrote:

Hi,

 

I would appreciate some help with this:

 

I am trying to create a new dataset (File 2) using data from specific dates in an existing dataset (File 1).

 

I want 'File 2' to contain only data that are 25days or more apart in 'File 1'.

(first observation for 'File 2" = 1990-02-07)

 

File 1:

 

Date aa
1990-02-02 23
1990-02-07 45
1990-02-13 56
1990-02-28 7
1990-03-12 8
1990-03-13 8
1990-03-16 9
1990-03-19 55
1990-03-23 78
1990-04-03 90
1990-04-23 76
1990-05-01 6
1990-05-11 4
1990-05-15 33
1990-05-25 34
1990-05-30 12

 

 

 

File 2:

 

Date bb
1990-02-07 45
1990-03-12 56
1990-04-23 76
1990-05-25 34

 

 

Please do advise on this.

 

 


 

arii
Obsidian | Level 7

Please see more details:

 

 

1990-02-07 (February 7th, 1990) is the first observation I chose for 'File 2' (this is the 'reference date').

 

Thus, I need the 'second observation'  to be 25 days or more after 1990-02-07. 

 

The 'third observation' would be 25 days or more apart from the 'second observation'.

 

The sequence continues like that.

 

 

'aa' is not the number of days apart, these are numerical data for the different dates shown.

ballardw
Super User

Please run Proc Contents on the SAS data set that represents File1. It will have the details that will let us provide specifics on how to manipulate the data. Just because Excel displays a value as a date does not always mean that the SAS variable will because of some vagaries in how many people structure Excel files and the specific method used to bring the data into a SAS data set.

 

Notice that you saying that you selected a specific date to start with as a reference and that you need dates from that reference does now make much more sense.

 

The code below shows how to make a data set containing your "dates" as SAS date values and then select using your rule.

As example code the reference date variable is also in the output so you can compare if that is behaving correctly.

data file1;
   informat date yymmdd10.;
   input date aa;
   format date yymmddD10.;
datalines;
1990-02-02 23 
1990-02-07 45 
1990-02-13 56 
1990-02-28 7 
1990-03-12 8 
1990-03-13 8 
1990-03-16 9 
1990-03-19 55 
1990-03-23 78 
1990-04-03 90 
1990-04-23 76 
1990-05-01 6 
1990-05-11 4 
1990-05-15 33 
1990-05-25 34 
1990-05-30 12 
;
run;

data file2;;
   set file1;
   /* Retain will keep the value of a variable 
      from record to record. Note that to use
      a literal date value the ddMONyy in quotes with 
      D to tell SAS the value should be treated 
      as a date. None of the other date formats
      are acceptable
   */
   retain referencedate '07FEB1990'd; 
   format referencedate yymmddD10.;
   if date = referencedate OR date-Referencedate ge 25 then do;
      output;
      Referencedate = date;
   end;
run;

If your proc contents or examination of the data set's variable properties does not show the date as numeric you will need to use an INPUT statement such as

 

Sasdate = input(date, yymmdd10.); to create a SAS date valued variable and then do all the other calculations using the SASDATE variable.

arii
Obsidian | Level 7

Thank you for your help, Ballardw. The codes worked out well.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4114 views
  • 0 likes
  • 3 in conversation