BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

I'm trying to extract the year and the month from dates like " 20213". The year is "2021" and the month is "3" (March). I'm using the following code that unfortunately does not work for dates like 202212. 

(year: 2022 and month 12). 

 

data DB;
   set mydb;
   var1= substr(Var,1,5);
   Var2= substr(Var,6,2); 
 run;

With the code above year is 20221 and month 2.

Var is a character Var of length 6 and contains dates like  20236,  20224, 200511, 202212, ...

 

Can anyone help me please? 

 

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

The problem with SUBSTR is that Year and Month should be NUMERIC, but SUBSTR makes them character. In fact, best practice is that Year and Month ought to be combined into a valid SAS date value, instead of in separate variables. So modifying @antonbcristina code,

 

data db;
	set mydb;
	year=input(substr(var,1,4),4.);
	month=input(substr(var,5),2.);
    yearmonth=mdy(month,1,year);
    format yearmonth yymmn6.;
run;

 

You can perform numerical (like subtraction, or count the number of months since starting point) or logical operations on these values if they are numeric, and they will sort properly as well if they are numeric. If you leave them as character strings, none of that is possible, and they may not sort properly.

 

Lastly, I don't know who is creating months that can be five character digits such as 20236, but this is a very poor practice that should be avoided. Use numeric 202306 if you must, or better still is to create valid SAS date values, such as yearmonth in the code above.

 

--
Paige Miller

View solution in original post

4 REPLIES 4
antonbcristina
SAS Super FREQ

Hi @NewUsrStat, try using the SUBSTR function without a length provided to read until the end of the string. Here's the test dataset I used and adjusted code:

 

data mydb;
	input var $6.;
datalines;
20213
202212
;
run;

data db;
	set mydb;
	year=substr(var,1,4);
	month=substr(var,5);
run;

Notice that to read in the year, you'll want to start at position 1 and read for a length of 4 characters. For the month, you'll start reading at position 5 and without the last argument provided (length), continue reading until the end of the string. This should capture both single and double digit month numbers. 

PaigeMiller
Diamond | Level 26

The problem with SUBSTR is that Year and Month should be NUMERIC, but SUBSTR makes them character. In fact, best practice is that Year and Month ought to be combined into a valid SAS date value, instead of in separate variables. So modifying @antonbcristina code,

 

data db;
	set mydb;
	year=input(substr(var,1,4),4.);
	month=input(substr(var,5),2.);
    yearmonth=mdy(month,1,year);
    format yearmonth yymmn6.;
run;

 

You can perform numerical (like subtraction, or count the number of months since starting point) or logical operations on these values if they are numeric, and they will sort properly as well if they are numeric. If you leave them as character strings, none of that is possible, and they may not sort properly.

 

Lastly, I don't know who is creating months that can be five character digits such as 20236, but this is a very poor practice that should be avoided. Use numeric 202306 if you must, or better still is to create valid SAS date values, such as yearmonth in the code above.

 

--
Paige Miller
Tom
Super User Tom
Super User

If the year part is only 4 digits why did you split the string after the 5th character?

Do some of your values with only one digit months have a leading space?

Also are you sure your existing variable is character?  If it is numeric then SAS will use the BEST12. format when converting it to a string to pass to the SUBSTR() function. Which means they will have either 6 or 7 leading spaces.

data DB;
   set mydb;
   var1= substr(left(Var),1,4);
   Var2= substr(left(Var),5,2); 
 run;
ballardw
Super User

First a comment: character values are not "dates", they are character values.

Second comment: whoever doesn't provide months in two digits is just asking for problems down the line has you have discovered.

 

Strong Suggestion: Make an actual SAS date value from those strings. Once you have an actual date there are functions to extract information like Year and Month plus the values can actually be used in things like interval or duration calculations. Plus changes in assigned format will handle a lot of reporting issues such as quarterly or annual reporting.

A single DATE variable will sort properly for reports or graphs where two variables are often cumbersome.

SAS provides many formats and Proc Format will allow you create your own that are much easier than trying to parse raw values.

data example;
   input var $;
   date= input(var,yymmn6.);
   format date date9.;
   Year = Year(date);
   Month= month(date);
datalines;
20236
20224 
200511 
202212
;

Note that the INFORMAT YYMMN6 is smart enough to parse out the 4 digit year with the remainder treated as a month. It does impute the day of the month to be 1.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1067 views
  • 0 likes
  • 5 in conversation