DATA work.import;
SET work.import;
IF (eventdate >= "03012020") and (eventdate <= "03312020") then month = 3;
IF (eventdate >= "04012020") AND (eventdate <= "04302020") then month = 4;
IF (eventdate >= "05012020") and (eventdate <= "05312020") then month = 5;
if (eventdate >= "06012020") and (eventdate <= "06302020") then month=6;
if (eventdate >= "07012020") and (eventdate <= "07312020") then month=7;
if (eventdate => "08012020") and (eventdate <= "08312020") then month=8;
if (eventdate => "09012020") and (eventdate <= "09302020") then month=9;
if (eventdate => "10012020") and (eventdate <= "10312020") then month=10;
if (eventdate => "11012020") and (eventdate <= "11302020") then month=11;
You would rarely need to create your own dummy variables in SAS. There are much more efficient ways to do this; SAS has done the hard work so you don't have to.
Before I can provide code that helps, please answer some questions.
What are typical values of the variable eventdate?
Is eventdate numeric or character, according to PROC CONTENTS?
What analysis are you going to do with these dummy variables once you have them?
Hi,
To answer your questions: I ran PROC CONTENTS, and the event dates are numeric and in this format: mmddyy. Once I have the data, I will be running a chi-square test and a logistic regression analysis. My professor is asking us to create "dummy" variables for the months March through November.
For a chi-squared test and logistic regression, you do not have to create your own dummy variables. If your professor is telling you that you have to, then your professor is wrong. Dummy variables are not needed for this problem, and there are much more efficient ways of doing this.
If the variable EVENTDATE is numeric and formatted as mmddyy (I'm still a little skeptical; I asked you to show me a typical value of EVENTDATE, so please do), then this should work:
data want;
set have;
month_variable=month(eventdate);
run;
proc logistic data=want;
class month_variable;
model y = month_variable;
run;
No user-created dummy variables needed. SAS creates the dummy variables behind the scenes, so you don't have to. That's what the CLASS statement does.
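The chi-square part of the question can be handled the same way. This is only a sketch: it assumes the data set WANT from the step above already contains MONTH_VARIABLE, and Y is a placeholder for whatever categorical response you are testing.

```sas
/* Assumes WANT contains MONTH_VARIABLE (created with the MONTH function) */
/* and a categorical response named Y; both names are placeholders.       */
proc freq data=want;
    tables month_variable*y / chisq;  /* CHISQ requests chi-square tests of association */
run;
```

Again, no dummy variables are involved; PROC FREQ treats each distinct month value as its own category.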
Eventdate is given in this format: 08/27/2020
Then the code I gave should work.
Thank you so much...I really appreciate it!
@Lilo wrote:
Hi,
To answer your questions: I ran PROC CONTENTS, and the event dates are numeric and in this format: mmddyy. Once I have the data, I will be running a chi-square test and a logistic regression analysis. My professor is asking us to create "dummy" variables for the months March through November.
So if your data is numeric, why did you write a bunch of comparisons against character values, such as
IF (eventdate >= "03012020")
The information you show indicates the values are being treated as dates, which is correct.
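If you ever do need to compare a numeric date variable against a fixed date, the usual approach is a date literal rather than a quoted string. A minimal sketch, assuming EVENTDATE in WORK.IMPORT is a true SAS date; the output name MARCH_ONLY is made up for illustration:

```sas
/* Sketch: compare a numeric SAS date with date literals ('ddMONyyyy'd), */
/* not with character strings such as "03012020".                        */
/* MARCH_ONLY is a hypothetical output data set name.                    */
data march_only;
    set work.import;
    /* Subsetting IF: keep only observations dated in March 2020 */
    if '01MAR2020'd <= eventdate <= '31MAR2020'd;
run;
```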
You can create groups simply by applying the desired date format to your variable. Reporting, analysis and most graphing procedures honor the formatted values as groups.
Changing the format to YYMMN. will group the data by year and month.
Try this code and see:
proc freq data=work.import;
tables eventdate;
format eventdate yymmn.;
run;
Seldom a good idea to separate year from month unless you really know what you are doing with the dates. If you do want that, you could use the MONTH format to display just the number of the month, or the MONNAME format to display the name of the month. The concept of a format is very important in SAS, as it allows you to change displayed values without changing the underlying value.
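For example, the same PROC FREQ step with the MONNAME format, shown here only as a sketch against the WORK.IMPORT data set from the question:

```sas
/* Sketch: group by month name only. Note that this pools the same month */
/* from different years, which is why YYMMN. is usually the safer choice. */
proc freq data=work.import;
    tables eventdate;
    format eventdate monname.;  /* the stored date values are unchanged */
run;
```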
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.
For beginning SAS users it is a poor idea to use code like:
data somedatasetname; set somedatasetname; <other code>
Unless the step hits a serious error, it will completely replace the existing Somedatasetname data set. That means you may accidentally overwrite your starting values and have to go back to your original data.
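A safer pattern is to write the result to a different data set. In this sketch the output name IMPORT_MONTHS is made up; any name other than the input will do:

```sas
/* Sketch: read WORK.IMPORT but write to a new data set, so the original */
/* stays intact if the logic turns out to be wrong.                      */
/* IMPORT_MONTHS is a hypothetical name chosen for illustration.         */
data work.import_months;
    set work.import;
    month = month(eventdate);  /* MONTH() returns 1-12 from a SAS date value */
run;
```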
@ballardw wrote:
Seldom a good idea to separate year from month unless you really know what you are doing with the dates.
I agree. I get the impression this idea of using months March through November, without a year, comes from the professor, which may be fine for this specific problem, but it's poor logic in general.
But, shockingly, the professor wants dummy variables to be used here, which is completely unnecessary, and it seems as if the professor is leading the students down a sub-optimal path.