I have two datasets for covid cases and covid contacts
Both datasets have date fields with values like the following
11SEP20:09:19:55
11SEP20:08:48:00 and so on through
26OCT21:00:00:00
What I would like to do is extract the datepart and then apply a format that could be used as a BY variable in a sort of each data set and then a merge.
I set up two sql queries
Cases ***;
Proc sql;
Create Table SASCHART.Epi_Cases_Export AS
SELECT CaseID,
CaseStatus,
County,
TrueCaseDate AS Date label="Date"
FROM SASCHART.EpiExport_Cases_All_Fields
Where '01MAR20'd <= datepart(Date) <= '05NOV21'd
Order by Date;
quit;
Contacts ***;
Proc sql;
Create Table SASCHART.Arias_Contacts_Trace AS
SELECT Contact_Person_ID,
CaseStatus,
County_1 AS County label = "County",
Created_On AS Date label="Date"
FROM SASCHART.Arias_EpiCase_AriasContacts
Where '01MAR20'd <= datepart(Date) <= '05NOV21'd
And CaseStatus ne " "
Order by Date;
quit;
But the merge
Data SASCHART.EPI_Cases_ARIAS_Contacts_Merge;
Merge SASCHART.Epi_Cases_Export (in=EPI)
SASCHART.Arias_Contacts_Trace(in=ARIAS);
by Date;
If EPI = ARIAS;
run;
did not produce any match.
Data SASCHART.EPI_Cases_ARIAS_Contacts_Merge;
5868 Merge SASCHART.Epi_Cases_Export (in=EPI)
5869 SASCHART.Arias_Contacts_Trace(in=ARIAS);
5870 by Date;
5871 If EPI = ARIAS;
5872 run;
WARNING: Multiple lengths were specified for the variable County by input data set(s). This can cause
truncation of data.
NOTE: There were 362559 observations read from the data set SASCHART.EPI_CASES_EXPORT.
NOTE: There were 167054 observations read from the data set SASCHART.ARIAS_CONTACTS_TRACE.
NOTE: The data set SASCHART.EPI_CASES_ARIAS_CONTACTS_MERGE has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.22 seconds
cpu time 0.23 seconds
The log notes multiple lengths for county which I will fix.
Since I was merging by Date I checked what that the date part looks like
Data SASCHART.Arias_Contacts_Trace;
Set SASCHART.Arias_Contacts_Trace;
Date1 = datepart(Date);
run;
The output looks like
Date Date1
11SEP20:09:19:55 22169 (this is a representative example)
I would like date to show as mmmyy. So a format like mmmyy. would work?
A consistent format would likely permit the merge to execute?
Thank you for your help in diagnosing these issues.
wklierman
If you want to reference a variable you are creating in the SELECT statement then use the CALCULATED keyword. PS always use 4 digits for years.
where '01MAR2020'd <= calculated Date <= '05NOV2021'd
Or use the original variable and datetime literals instead of date literals.
where '01MAR2020:00:00'dt <= Created_On < '06NOV2021:00:00'dt
Formats don't matter when you're merging by dates.
However, the datepart in the filter condition but doesn't change the underlying variable, it remains a datetime variable.
Modify the SQL so that the datetimes are converted to dates.
Cases ***; Proc sql; Create Table SASCHART.Epi_Cases_Export AS SELECT CaseID, CaseStatus, County, datepart(TrueCaseDate) AS Date label="Date" format=date9. FROM SASCHART.EpiExport_Cases_All_Fields Where '01MAR20'd <= datepart(Date) <= '05NOV21'd Order by Date; quit; Contacts ***; Proc sql; Create Table SASCHART.Arias_Contacts_Trace AS SELECT Contact_Person_ID, CaseStatus, County_1 AS County label = "County", datepart(Created_On) as Date label="Date" format=date9. FROM SASCHART.Arias_EpiCase_AriasContacts Where '01MAR20'd <= datepart(Date) <= '05NOV21'd /*only filters the date, no actual date conversion on your data happens*/ And CaseStatus ne " " Order by Date; quit;
Then the rest of your code should work as expected.
@wlierman wrote:
I have two datasets for covid cases and covid contacts
Both datasets have date fields with values like the following
11SEP20:09:19:55
11SEP20:08:48:00 and so on through
26OCT21:00:00:00
What I would like to do is extract the datepart and then apply a format that could be used as a BY variable in a sort of each data set and then a merge.
I set up two sql queries
Cases ***; Proc sql; Create Table SASCHART.Epi_Cases_Export AS SELECT CaseID, CaseStatus, County, TrueCaseDate AS Date label="Date" FROM SASCHART.EpiExport_Cases_All_Fields Where '01MAR20'd <= datepart(Date) <= '05NOV21'd Order by Date; quit; Contacts ***; Proc sql; Create Table SASCHART.Arias_Contacts_Trace AS SELECT Contact_Person_ID, CaseStatus, County_1 AS County label = "County", Created_On AS Date label="Date" FROM SASCHART.Arias_EpiCase_AriasContacts Where '01MAR20'd <= datepart(Date) <= '05NOV21'd And CaseStatus ne " " Order by Date; quit;
But the merge
Data SASCHART.EPI_Cases_ARIAS_Contacts_Merge; Merge SASCHART.Epi_Cases_Export (in=EPI) SASCHART.Arias_Contacts_Trace(in=ARIAS); by Date; If EPI = ARIAS; run;
did not produce any match.
Data SASCHART.EPI_Cases_ARIAS_Contacts_Merge; 5868 Merge SASCHART.Epi_Cases_Export (in=EPI) 5869 SASCHART.Arias_Contacts_Trace(in=ARIAS); 5870 by Date; 5871 If EPI = ARIAS; 5872 run; WARNING: Multiple lengths were specified for the variable County by input data set(s). This can cause truncation of data. NOTE: There were 362559 observations read from the data set SASCHART.EPI_CASES_EXPORT. NOTE: There were 167054 observations read from the data set SASCHART.ARIAS_CONTACTS_TRACE. NOTE: The data set SASCHART.EPI_CASES_ARIAS_CONTACTS_MERGE has 0 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.22 seconds cpu time 0.23 seconds
The log notes multiple lengths for county which I will fix.
Since I was merging by Date I checked what that the date part looks like
Data SASCHART.Arias_Contacts_Trace; Set SASCHART.Arias_Contacts_Trace; Date1 = datepart(Date); run;
The output looks like
Date Date1
11SEP20:09:19:55 22169 (this is a representative example)
I would like date to show as mmmyy. So a format like mmmyy. would work?
A consistent format would likely permit the merge to execute?
Thank you for your help in diagnosing these issues.
wklierman
Create Table SASCHART.Epi_Cases_Export AS
SELECT t1.CaseID,
CaseStatus,
County,
datepart(TrueCaseDate) AS Date label="Date" format=date9.
FROM SASCHART.EpiExport_Cases_All_Fields
Where '01MAR20'd <= Date <= '05NOV21'd
Order by Date;
I missed that you were using the new variable in the filter, so since it's already been converted you need to remove the date parts. I'll assume you can fix the rest of your code with this pointer.
Here's a great, but longer and in depth, reference for dates and times in SAS
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/...
If you want to reference a variable you are creating in the SELECT statement then use the CALCULATED keyword. PS always use 4 digits for years.
where '01MAR2020'd <= calculated Date <= '05NOV2021'd
Or use the original variable and datetime literals instead of date literals.
where '01MAR2020:00:00'dt <= Created_On < '06NOV2021:00:00'dt
Thank you for the direction using Calculated date in the where statement.
It cleared up the problem.
Thank you
wlierman
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.