BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Obsidian | Level 7

Hi. I have a SAS dataset (A) with dates formatted into mmddyy10. format. I have another data set (B) originating from a CSV with a 4 digit numeric variable confirmed to represent YYMM (i.e., 20YYMM).  If merging these two to eventually do an analysis by year and month, how would you recommend formatting the data set B numeric variable?

 

 

data A;
input event date :yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01 
2 2018-07-06 
3 2015-02-15 
;
RUN;

data B;
input var yearmonth;
datalines;
1 1801 
2 1911 
3 2202 
;
RUN;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
sasgorilla
Obsidian | Level 7

Thanks for all the help with this. I ultimately used code from @Stu_SAS  for dataset a:

data a2;
    set a;
    month = intnx('month', date, 0, 'B');
    format month monyy.;
run;

For dataset b I used a combination of code from @Tom and @Stu_SAS  as below: 

/*within my data step while infiling the csv; 
additional input variables excluded from code below for simplicity*/
  input yearmonth :yymmn.;
  format yearmonth yymmdd10.;
  month=intnx('month',yearmonth,0,'B');
  format month monyy.;

  

 

This ultimately allowed me to merge by month (and other variables) as I was hoping. Thank you!

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

When reading the CSV file use the read the YEARMONTH variable as a DATE by using the YYMMN informat.

(Note only use PROC IMPORT to read a CSV when exploring unknown files.  When you know what is in the file just write a data step to read the file instead of "importing" it.)

data A;
  input event date :yymmdd.;
  format date yymmdd10.;
datalines;
1 2020-01-01 
2 2018-07-06 
3 2015-02-15 
;

data B;
  input var yearmonth :yymmn.;
  format yearmonth yymmdd10.;
datalines;
1 1801 
2 1911 
3 2202 
;

You might need to worry about the YEARCUTOFF setting when using only 2 digits for the year.  In that case it might help to read the YYMM as a STRING and then use INPUT() to convert to a date.

data B;
  input var yymm :$4.;
  yearmonth=input(cats('20',yymm,'01',yymmdd8.);
  format yearmonth yymmdd10.;
datalines;
1 1801 
2 1911 
3 2202 
;

When comparing the dates convert the full date to the start of the month using the INTNX() function.

proc sql;
  create table want as select *
  from a,b
  where intnx('month',a.date,0) = b.yearmonth
  ;
quit;
Stu_SAS
SAS Employee

Hey @sasgorilla! If you need to analyze it by year and month, you'll want to align the dates to the first of the month and then merge them. To keep it easy to understand from a visual perspective, you can use the MONYY format to display them. For simplicity, let's assume there is only one day per month

 

data a2;
    set a;

    month = intnx('month', date, 0, 'B');

    format yearmonth monyy.;
run;

data b2;
    set b;
    
    month = input(put(yearmonth, $4.), yymmn4.);
   
    format yearmonth2 monyy.;
run;

 

This gets you two datasets whose dates are aligned to the first of the month but formatted to display as a year/month:

A2
event	date	    month
1	    2020-01-01	JAN20
2	    2018-07-06	JUL18
3	    2015-02-15	FEB15

B2
var	yearmonth	month
1	1801	    JAN18
2	1911	    NOV19
3	2202	    FEB22

If there are multiple values per month then you will need to aggregate them so that there is one value per month using SQL, PROC TIMESERIES, etc.

sasgorilla
Obsidian | Level 7

Stu, thanks for the reply. I errantly sent a response which I deleted. 

Okay, so when I ran your code on data set a: 

data datetest;
	set a;
	month=intnx('month',date,0,'B');
	format month monyy.;
run;

I end up getting what looks like a reasonable distribution, however the month display is in format not easily interpretable. See below: 

sasgorilla_0-1732732674386.png

 

How can I change the display of month to be like you have in your month column (e.g., JAN20, etc.)? 

Tom
Super User Tom
Super User

@sasgorilla wrote:

Stu, thanks for the reply. I errantly sent a response which I deleted. 

Okay, so when I ran your code on data set a: 

data datetest;
	set a;
	month=intnx('month',date,0,'B');
	format month monyy.;
run;

I end up getting what looks like a reasonable distribution, however the month display is in format not easily interpretable. See below: 

sasgorilla_0-1732732674386.png

 

How can I change the display of month to be like you have in your month column (e.g., JAN20, etc.)? 


That is what the FORMAT statement your posted code should have fixed.  That that step run properly?  Did you get any errors or warnings of notes about converting character to numbers or the reverse?

 

Make sure you are running the PROC FREQ against the dataset that have MONTH formatted properly.  Or add the FORMAT statement to the PROC FREQ step to make sure it uses the format you want.

sasgorilla
Obsidian | Level 7

You are right. I figured out the problem was that in the code Stu provided he had a different varible name in the format statement (yearmonth) which was uninitialized in dataset a. When I changed that variable to "month" then the correct result was produced. 

Ksharp
Super User

You could try GROUPFORMAT option of BY statement ,.

data A;
input event date :yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01 
2 2018-07-06 
3 2015-02-15 
;
RUN;

data B;
input var yearmonth;
date=mdy(mod(yearmonth,100),1,int(yearmonth/100));
datalines;
1 1801 
2 1911 
3 2202 
;
RUN;
proc sort data=A;by date;run;
proc sort data=B;by date;run;
data want;
 merge A B ;
 by date groupformat;
 format date yymmd7.;
run;

 

Or you can produce the YEAR and MONTH variables separatedly ,and use them in BY statement.


proc sort data=A;by year month;run;
proc sort data=B;by year month;run;
data want;
 merge A B ;
 by year month;
run;
sasgorilla
Obsidian | Level 7

Thanks for all the help with this. I ultimately used code from @Stu_SAS  for dataset a:

data a2;
    set a;
    month = intnx('month', date, 0, 'B');
    format month monyy.;
run;

For dataset b I used a combination of code from @Tom and @Stu_SAS  as below: 

/*within my data step while infiling the csv; 
additional input variables excluded from code below for simplicity*/
  input yearmonth :yymmn.;
  format yearmonth yymmdd10.;
  month=intnx('month',yearmonth,0,'B');
  format month monyy.;

  

 

This ultimately allowed me to merge by month (and other variables) as I was hoping. Thank you!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1067 views
  • 4 likes
  • 4 in conversation