Hello !
My date and time variable arrival is of the type, length, format and informat as below. An example of a data point is 5/22/2020 4:11:00 PM
# | Variable | Type | Len | Format | Informat |
---|
Arrival | Num | 8 | DATETIME. | ANYDTDTM40. |
I used the code below for classification but it didn't work:
data B;
set A;
if arrival in (01/01/2019, 05/31/2019) then ARR=1;
else if arrival in (01/01/2020, 05/31/2020) then ARR=2;
run;
LOG
183 if arrival in (01/01/2019, 05/31/2019) then ARR=1;
-
22
76
184 else if arrival in (01/01/2020, 05/31/2020) then ARR=2;
-
22
76
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant,
a datetime constant, a missing value, iterator, (, ), ',', :.
ERROR 76-322: Syntax error, statement will be ignored.
I want to create a variable call ARR such that, ARR captures the two intervals, from 01/01/2019 to 05/31/2019 and from 01/01/2020 to 05/31/2020. I do not want to just format it but create a new variable.
There are several problems.
First, when you want to use a date literal, such as you attempted with 01/01/2019 you must use something that appears like the SAS DATE. format. Which would be 01JAN2019. Second to tell SAS that you intend to use is as a date you quote the value and follow it with d
Example: '01JAN2019'D
the quotes can be single or double and the D can be upper or lower case. You can use 2-digit years but I strongly suggest against it.
The last problem is you are comparing a DATE with a DATETIME value (the format indicates such). Dates in SAS are measured in days and datetime values in seconds. So you need to make sure you compare a DATE with date value. You extract the date portion of a date value with the DATEPART function.
So for what you show what you should use would be to avoid errors
if datepart(arrival) in ('01JAN2019'd, '31MAY2019'd) then ARR=1; else if datepart (arrival) in ('01JAN2020'd, '31MAY2020'd) then ARR=2;
The question would be did you only want those exact dates or an interval?
Note there many functions to extract values from date, time and datetime values like Year, Month, Day, Hour, Minute, Datepart, Timepart (get the time from a datetime)
The code runs but does not capture the entire data:
data B;
set A;
if datepart(Discharge) in ('01JAN2019'd, '31MAY2019'd) then ARR=1;
else if datepart (Discharge) in ('01JAN2020'd, '31MAY2020'd) then ARR=2;
run;
proc freq data=B;
table ARR;
run;
proc print data=B (obs=10);
var Discharge ARR;
run;
ARR | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
1 | 7 | 53.85 | 7 | 53.85 |
2 | 6 | 46.15 | 13 | 100.00 |
Frequency Missing = 1153 |
if datepart(Discharge) in ('01JAN2019'd, '31MAY2019'd) then ARR=1;
This sets ARR=1 only for these two dates, not for dates in between, and not for dates before or after. That doesn't sound like what you want. Is it?
Try this:
proc format;
value datef '01JAN2019:00:00:00'dt-'31MAY2019:23:59:59'dt = '1'
'01JAN2020:00:00:00'dt-'31MAY2020:23:59:59'dt = '2';
run;
proc freq data=A;
table DISCHARGE;
format discharge datef.;
run;
I want to create a variable call ARR such that, ARR captures the two intervals, from 01/01/2019 to 05/31/2019 and from 01/01/2020 to 05/31/2020. I do not want to just format it but create a new variable.
You can't mark a description of your original question as the correct answer.
Thank you,
data B;
set A;
if '01JAN2019:00:00:00'dt=<discharge<='31MAY2019:23:59:59'dt then ARR=1;
else if '01JAN2020:00:00:00'dt=<discharge<='31MAY2020:23:59:59'dt then ARR=2;
run;
proc freq data=B;
table ARR;
run;
@desireatem wrote:
Thank you,
data B;
set A;
if '01JAN2019:00:00:00'dt=<discharge<='31MAY2019:23:59:59'dt then ARR=1;
else if '01JAN2020:00:00:00'dt=<discharge<='31MAY2020:23:59:59'dt then ARR=2;
run;
proc freq data=B;
table ARR;
run;
I like the method with PROC FORMAT better. You don't have to create a whole new data set to get the answer. You can also modify the format so the output has more meaning than seeing 1 or 2, you can use words or other meaningful descriptors; ARR=1 and ARR=2 has no real meaning to anyone. For example:
proc format;
value datef '01JAN2019:00:00:00'dt-'31MAY2019:23:59:59'dt = 'First Cohort'
'01JAN2020:00:00:00'dt-'31MAY2020:23:59:59'dt = 'Second Cohort';
run;
@desireatem wrote:
The code runs but does not capture the entire data:
Define "entire data".
If you re-read my first answer there is a question about do you want an INTERVAL .
The IN operator only finds EXACT MATCHES for the values provided. If you want an INTERVAL then you use < or > to create ranges of values.
When there are errors in the log, please show us the ENTIRE log for that DATA step or PROC. Do not show us tiny portions of the log, as you have done. Please copy the log as text and paste it into the window that appears when you click on the </> icon.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.