Hi. I have a SAS dataset (A) with dates formatted into mmddyy10. format. I have another data set (B) originating from a CSV with a 4 digit numeric variable confirmed to represent YYMM (i.e., 20YYMM). If merging these two to eventually do an analysis by year and month, how would you recommend formatting the data set B numeric variable?
data A;
input event date :yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01
2 2018-07-06
3 2015-02-15
;
RUN;
data B;
input var yearmonth;
datalines;
1 1801
2 1911
3 2202
;
RUN;
Thanks for all the help with this. I ultimately used code from @Stu_SAS for dataset a:
data a2;
set a;
month = intnx('month', date, 0, 'B');
format month monyy.;
run;
For dataset b I used a combination of code from @Tom and @Stu_SAS as below:
/*within my data step while infiling the csv;
additional input variables excluded from code below for simplicity*/
input yearmonth :yymmn.;
format yearmonth yymmdd10.;
month=intnx('month',yearmonth,0,'B');
format month monyy.;
This ultimately allowed me to merge by month (and other variables) as I was hoping. Thank you!
When reading the CSV file use the read the YEARMONTH variable as a DATE by using the YYMMN informat.
(Note only use PROC IMPORT to read a CSV when exploring unknown files. When you know what is in the file just write a data step to read the file instead of "importing" it.)
data A;
input event date :yymmdd.;
format date yymmdd10.;
datalines;
1 2020-01-01
2 2018-07-06
3 2015-02-15
;
data B;
input var yearmonth :yymmn.;
format yearmonth yymmdd10.;
datalines;
1 1801
2 1911
3 2202
;
You might need to worry about the YEARCUTOFF setting when using only 2 digits for the year. In that case it might help to read the YYMM as a STRING and then use INPUT() to convert to a date.
data B;
input var yymm :$4.;
yearmonth=input(cats('20',yymm,'01',yymmdd8.);
format yearmonth yymmdd10.;
datalines;
1 1801
2 1911
3 2202
;
When comparing the dates convert the full date to the start of the month using the INTNX() function.
proc sql;
create table want as select *
from a,b
where intnx('month',a.date,0) = b.yearmonth
;
quit;
Hey @sasgorilla! If you need to analyze it by year and month, you'll want to align the dates to the first of the month and then merge them. To keep it easy to understand from a visual perspective, you can use the MONYY format to display them. For simplicity, let's assume there is only one day per month
data a2;
set a;
month = intnx('month', date, 0, 'B');
format yearmonth monyy.;
run;
data b2;
set b;
month = input(put(yearmonth, $4.), yymmn4.);
format yearmonth2 monyy.;
run;
This gets you two datasets whose dates are aligned to the first of the month but formatted to display as a year/month:
A2
event date month
1 2020-01-01 JAN20
2 2018-07-06 JUL18
3 2015-02-15 FEB15
B2
var yearmonth month
1 1801 JAN18
2 1911 NOV19
3 2202 FEB22
If there are multiple values per month then you will need to aggregate them so that there is one value per month using SQL, PROC TIMESERIES, etc.
Stu, thanks for the reply. I errantly sent a response which I deleted.
Okay, so when I ran your code on data set a:
data datetest;
set a;
month=intnx('month',date,0,'B');
format month monyy.;
run;
I end up getting what looks like a reasonable distribution, however the month display is in format not easily interpretable. See below:
How can I change the display of month to be like you have in your month column (e.g., JAN20, etc.)?
@sasgorilla wrote:
Stu, thanks for the reply. I errantly sent a response which I deleted.
Okay, so when I ran your code on data set a:data datetest; set a; month=intnx('month',date,0,'B'); format month monyy.; run;
I end up getting what looks like a reasonable distribution, however the month display is in format not easily interpretable. See below:
How can I change the display of month to be like you have in your month column (e.g., JAN20, etc.)?
That is what the FORMAT statement your posted code should have fixed. That that step run properly? Did you get any errors or warnings of notes about converting character to numbers or the reverse?
Make sure you are running the PROC FREQ against the dataset that have MONTH formatted properly. Or add the FORMAT statement to the PROC FREQ step to make sure it uses the format you want.
You are right. I figured out the problem was that in the code Stu provided he had a different varible name in the format statement (yearmonth) which was uninitialized in dataset a. When I changed that variable to "month" then the correct result was produced.
You could try GROUPFORMAT option of BY statement ,.
data A;
input event date :yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01
2 2018-07-06
3 2015-02-15
;
RUN;
data B;
input var yearmonth;
date=mdy(mod(yearmonth,100),1,int(yearmonth/100));
datalines;
1 1801
2 1911
3 2202
;
RUN;
proc sort data=A;by date;run;
proc sort data=B;by date;run;
data want;
merge A B ;
by date groupformat;
format date yymmd7.;
run;
Or you can produce the YEAR and MONTH variables separatedly ,and use them in BY statement.
proc sort data=A;by year month;run;
proc sort data=B;by year month;run;
data want;
merge A B ;
by year month;
run;
Thanks for all the help with this. I ultimately used code from @Stu_SAS for dataset a:
data a2;
set a;
month = intnx('month', date, 0, 'B');
format month monyy.;
run;
For dataset b I used a combination of code from @Tom and @Stu_SAS as below:
/*within my data step while infiling the csv;
additional input variables excluded from code below for simplicity*/
input yearmonth :yymmn.;
format yearmonth yymmdd10.;
month=intnx('month',yearmonth,0,'B');
format month monyy.;
This ultimately allowed me to merge by month (and other variables) as I was hoping. Thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.