Hi,
I would appreciate some help with this:
I am trying to create a new dataset (File 2) using data from specific dates in an existing dataset (File 1).
I want 'File 2' to contain only data that are 25days or more apart in 'File 1'.
(first observation for 'File 2" = 1990-02-07)
File 1:
Date | aa |
1990-02-02 | 23 |
1990-02-07 | 45 |
1990-02-13 | 56 |
1990-02-28 | 7 |
1990-03-12 | 8 |
1990-03-13 | 8 |
1990-03-16 | 9 |
1990-03-19 | 55 |
1990-03-23 | 78 |
1990-04-03 | 90 |
1990-04-23 | 76 |
1990-05-01 | 6 |
1990-05-11 | 4 |
1990-05-15 | 33 |
1990-05-25 | 34 |
1990-05-30 | 12 |
File 2:
Date | bb |
1990-02-07 | 45 |
1990-03-12 | 56 |
1990-04-23 | 76 |
1990-05-25 | 34 |
Please do advise on this.
Please run Proc Contents on the SAS data set that represents File1. It will have the details that will let us provide specifics on how to manipulate the data. Just because Excel displays a value as a date does not always mean that the SAS variable will because of some vagaries in how many people structure Excel files and the specific method used to bring the data into a SAS data set.
Notice that you saying that you selected a specific date to start with as a reference and that you need dates from that reference does now make much more sense.
The code below shows how to make a data set containing your "dates" as SAS date values and then select using your rule.
As example code the reference date variable is also in the output so you can compare if that is behaving correctly.
data file1; informat date yymmdd10.; input date aa; format date yymmddD10.; datalines; 1990-02-02 23 1990-02-07 45 1990-02-13 56 1990-02-28 7 1990-03-12 8 1990-03-13 8 1990-03-16 9 1990-03-19 55 1990-03-23 78 1990-04-03 90 1990-04-23 76 1990-05-01 6 1990-05-11 4 1990-05-15 33 1990-05-25 34 1990-05-30 12 ; run; data file2;; set file1; /* Retain will keep the value of a variable from record to record. Note that to use a literal date value the ddMONyy in quotes with D to tell SAS the value should be treated as a date. None of the other date formats are acceptable */ retain referencedate '07FEB1990'd; format referencedate yymmddD10.; if date = referencedate OR date-Referencedate ge 25 then do; output; Referencedate = date; end; run;
If your proc contents or examination of the data set's variable properties does not show the date as numeric you will need to use an INPUT statement such as
Sasdate = input(date, yymmdd10.); to create a SAS date valued variable and then do all the other calculations using the SASDATE variable.
First are your dates actually SAS date values with a date format attached or character values.
Second you will have to provide a bit more detail about "25 days apart". Why is 1990-02-07 included and not 1990-02-03? Why 1990-03-12 is included in your output. It is not more than 25 days from 1990-02-28, neither is 1990-04-03 more than 25 days from 1990-03-23.
Thank you for your response, Ballardw.
The dates you are referring to are from 'File 1' (the original data set).
I want to create a new dataset ('File 2') from the original dataset. The date conflict you are referring to is not in 'File 2'. (please have a look at 'File 2').
Also, I am not too sure what you mean by '.. a date format attached', does this affect the intended exercise?
@arii wrote:
Thank you for your response, Ballardw.
The dates you are referring to are from 'File 1' (the original data set).
I want to create a new dataset ('File 2') from the original dataset. The date conflict you are referring to is not in 'File 2'. (please have a look at 'File 2').
Also, I am not too sure what you mean by '.. a date format attached', does this affect the intended exercise?
The reason why your dates in file2 are considered "25 days apart" is not clear in terms of relation to other dates in file1. For instance you say the first value in file2 from file1 is 1990--02-07. WHY? Why not 1990-02-13 which is also 25 days or more apart from 1990-03-12 (27 days). And the question is why specific values appear. There is more that just '25 or more days' involved to get your result because there are multiple values for any given date that are 25 or more days from some other date. To have any interval there must be a comparison between two dates and you do not indicate which two dates are being compared for any selection in the resulting set.
The value of the date variable is pretty important. If the value is a SAS date value, which is numeric and usually has a date format applied such as yymmdd10. or similar then there are a number of functions that are available that do calculations such as the number of days between two dates.
1990-02-07 (February 7th, 1990) is the first observation I chose for 'File 2' (this is the 'reference date').
Thus, I need the 'second observation' to be 25 days or more after 1990-02-07.
The 'third observation' would be 25 days or more apart from the 'second observation'.
The sequence continues like that.
My data is in an .xlsx file; the format is year-mm-dd.
I am relatively new to some of these.
Why aren't the later records included?
Is aa the number of days apart or does it need to be calculated?
data want;
set have;
where aa > 25 ;
run;
@arii wrote:
Hi,
I would appreciate some help with this:
I am trying to create a new dataset (File 2) using data from specific dates in an existing dataset (File 1).
I want 'File 2' to contain only data that are 25days or more apart in 'File 1'.
(first observation for 'File 2" = 1990-02-07)
File 1:
Date aa 1990-02-02 23 1990-02-07 45 1990-02-13 56 1990-02-28 7 1990-03-12 8 1990-03-13 8 1990-03-16 9 1990-03-19 55 1990-03-23 78 1990-04-03 90 1990-04-23 76 1990-05-01 6 1990-05-11 4 1990-05-15 33 1990-05-25 34 1990-05-30 12
File 2:
Date bb 1990-02-07 45 1990-03-12 56 1990-04-23 76 1990-05-25 34
Please do advise on this.
Please see more details:
1990-02-07 (February 7th, 1990) is the first observation I chose for 'File 2' (this is the 'reference date').
Thus, I need the 'second observation' to be 25 days or more after 1990-02-07.
The 'third observation' would be 25 days or more apart from the 'second observation'.
The sequence continues like that.
'aa' is not the number of days apart, these are numerical data for the different dates shown.
Please run Proc Contents on the SAS data set that represents File1. It will have the details that will let us provide specifics on how to manipulate the data. Just because Excel displays a value as a date does not always mean that the SAS variable will because of some vagaries in how many people structure Excel files and the specific method used to bring the data into a SAS data set.
Notice that you saying that you selected a specific date to start with as a reference and that you need dates from that reference does now make much more sense.
The code below shows how to make a data set containing your "dates" as SAS date values and then select using your rule.
As example code the reference date variable is also in the output so you can compare if that is behaving correctly.
data file1; informat date yymmdd10.; input date aa; format date yymmddD10.; datalines; 1990-02-02 23 1990-02-07 45 1990-02-13 56 1990-02-28 7 1990-03-12 8 1990-03-13 8 1990-03-16 9 1990-03-19 55 1990-03-23 78 1990-04-03 90 1990-04-23 76 1990-05-01 6 1990-05-11 4 1990-05-15 33 1990-05-25 34 1990-05-30 12 ; run; data file2;; set file1; /* Retain will keep the value of a variable from record to record. Note that to use a literal date value the ddMONyy in quotes with D to tell SAS the value should be treated as a date. None of the other date formats are acceptable */ retain referencedate '07FEB1990'd; format referencedate yymmddD10.; if date = referencedate OR date-Referencedate ge 25 then do; output; Referencedate = date; end; run;
If your proc contents or examination of the data set's variable properties does not show the date as numeric you will need to use an INPUT statement such as
Sasdate = input(date, yymmdd10.); to create a SAS date valued variable and then do all the other calculations using the SASDATE variable.
Thank you for your help, Ballardw. The codes worked out well.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.