I have data that I am trying to transform in order to create a SAS visual. There are several steps before arriving there. I have exported a few lines of data as an xlsx file and attached here.
A typical line of data is like (I have chosen just the important fields)
Case Date Contact_ID Date1
234243 01MAR2020 51954633 03SEP2020
716432 02MAR2020 53426754 03SEP2020
and so on
The date values are formatted as date9. I want to split then by day, month, year so I tried the following
Data SASCHART.Date_Fix; set SASCHART.Cases_Contacts_Want; Yr_1 = substr(Date,6,4); Mon_1= substr(Date,3,5); Day_1 = substr(Date,1,2); Daymon_1=trim(Day_1||Mon_1); Yr_2 = substr(Date1,6,4); Mon_2= substr(Date1,3,5); Day_2 = substr(Date1,1,2); Daymon_2=trim(Day_2||Mon_2); run;
which didn't produce the result I need which is
Daymon_1 Daymon_2
01MAR 03SEP
and so on. So I looked in some older SAS Communities posts and tried something like this
Data SASCHART.Date_Fix; set SASCHART.Cases_Contacts_Want; new_date_1 = mdy(substr(Date,3,5), substr(Date,1,2), substr(Date,6,4)); format new_date_1 YYMMDD8.; run;
That didn't work either.
I want to count the number of Cases for each daymon_1 and the number of contact_ID by daymon_2. It may be something like cases contacts
date count date count
01MAR 1 01SEP 38
. .
. .
26OCT 834 16OCT 543
Then I will use the dates on an x-axis and the counts on the y-axis to create an overlaid line graph in SAS. The cases started earlier than contact-tracing and are several days ahead of the contact counts, since contacting_tracing is slowing down for several reasons.
First, I need to split the dates into substrings properly.
Your expertise is very much appreciated.
wlierman
Run a PROC CONTENTS on your SAS dataset (not on your Excel output) and see what TYPE of variable you have.
If DATE is a numeric variable that is being displayed using the DATE9. format then it contains the number of days since 1960. Do not use string functions like SUBSTR() on numeric variables. SAS will first have to convert the number of days into a string and it will not look like the strings that the DATE9. format prints. If you want to get a string that just has the first 5 characters when printed using the DATE9. format then use the PUT() function with the DATE format but set the width to only 5 characters.
daymon = put(date,date5.);
And if you want the truncated ddMON strings to display in printouts and reports then don't bother the make any new variables, just use the one you already have with the different display format.
format date date5.;
If DATE is a character variable that has strings that look like they were dates printed using the DATE9. format then just use the substr() function directly on the character variable.
daymon = substr(date,1,5);
Or you can also use a format to have it only print the first 5 characters.
format date $5. ;
If you do have a character variable you can use the INPUT() function to generate a date value from it. Which you can then use with any of the various functions that operate on date values or formats that display date values.
datenum=input(date,date9.);
day_of_month=day(datenum);
month_of_year=month(datenum);
calendar_year=year(datenum);
format datenum yymmdd10.;
format datenum mmddyy10.;
The PUTN() function makes it possible to pass in the format specification as a string so you can take a look at how a number of different formats will display the same value.
779 data _null_; 780 length format display $50 ; 781 date=today(); 782 do format='yymmdd10.','mmddyy10.','ddmmyy10.','date5.','date7.','date9.','date11.','monyy7.' ; 783 display = putn(date,format); 784 put format $12. display ; 785 end; 786 run; yymmdd10. 2021-11-02 mmddyy10. 11/02/2021 ddmmyy10. 02/11/2021 date5. 02NOV date7. 02NOV21 date9. 02NOV2021 date11. 02-NOV-2021 monyy7. NOV2021
In SAS data set SASCHART.Cases_Contacts_Want, are the dates numeric or text? What does PROC CONTENTS say?
If a value is a DATE you can get the day of the month using the DAY function : dayofmonth= day(datevariable).
Same with the Month and Year. The values will be numbers, which means the results will sort properly.
However if you want to Count something in ddMON appearance use Proc Freq and the format DATE5.
Groups created by formats will be honored for most analysis, reporting and graphing procedures.
An example:
data example; input date :Date9.; format date date9.; datalines; 01Jan2021 05Jan2021 19Jan2021 01Feb2021 01Feb2021 01Feb2020 ; proc freq data=example; tables date; format date date5.; run;
Please note that I included a value of the same date in a separate year intentionally. You have to consider year somewhere if your data includes more than one year.
This "does not work" because when date is numeric SAS will convert the number to a character value using the BEST12 format resulting in many leading blanks and the substr function has nothing to work with for most of the bits. Plus the MDY function would be attempting to create a new date value, not a string. If Date started as string that might work if starting at the 3rd position the 5 following characters consisted of spaces and or 2 consecutive digits to indicate month but since you have subst(date,6,4) one suspects what ever you borrowed this from was either different or looking a very odd "date" values as character.
new_date_1 = mdy(substr(Date,3,5), substr(Date,1,2), substr(Date,6,4));
BTW Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Run a PROC CONTENTS on your SAS dataset (not on your Excel output) and see what TYPE of variable you have.
If DATE is a numeric variable that is being displayed using the DATE9. format then it contains the number of days since 1960. Do not use string functions like SUBSTR() on numeric variables. SAS will first have to convert the number of days into a string and it will not look like the strings that the DATE9. format prints. If you want to get a string that just has the first 5 characters when printed using the DATE9. format then use the PUT() function with the DATE format but set the width to only 5 characters.
daymon = put(date,date5.);
And if you want the truncated ddMON strings to display in printouts and reports then don't bother the make any new variables, just use the one you already have with the different display format.
format date date5.;
If DATE is a character variable that has strings that look like they were dates printed using the DATE9. format then just use the substr() function directly on the character variable.
daymon = substr(date,1,5);
Or you can also use a format to have it only print the first 5 characters.
format date $5. ;
If you do have a character variable you can use the INPUT() function to generate a date value from it. Which you can then use with any of the various functions that operate on date values or formats that display date values.
datenum=input(date,date9.);
day_of_month=day(datenum);
month_of_year=month(datenum);
calendar_year=year(datenum);
format datenum yymmdd10.;
format datenum mmddyy10.;
The PUTN() function makes it possible to pass in the format specification as a string so you can take a look at how a number of different formats will display the same value.
779 data _null_; 780 length format display $50 ; 781 date=today(); 782 do format='yymmdd10.','mmddyy10.','ddmmyy10.','date5.','date7.','date9.','date11.','monyy7.' ; 783 display = putn(date,format); 784 put format $12. display ; 785 end; 786 run; yymmdd10. 2021-11-02 mmddyy10. 11/02/2021 ddmmyy10. 02/11/2021 date5. 02NOV date7. 02NOV21 date9. 02NOV2021 date11. 02-NOV-2021 monyy7. NOV2021
You could do it with user define format,
When you input date data are text strings:
data have2;
input Case Date : $ 9. Contact_ID Date1 : $ 9.;
cards;
234243 01MAR2020 51954633 03SEP2020
716432 02MAR2020 53426754 03SEP2020
;
run;
proc print;
run;
do it like this:
proc format;
picture mydate (default=5)
other='%d%b' (datatype=date LANGUAGE=English);
;
run;
[edit:] or use suggested date5. format
and then
data want2;
set have2;
length Daymon_1 Daymon_2 $ 5;
Daymon_1 = put(input(Date,date9.), mydate. /* date5. */);
Daymon_2 = put(input(Date1,date9.),mydate. /* date5. */);
run;
proc print;
run;
Bart
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.