Hi,
I have a dataset with an ID variable, and a concatenation of the ID variable, a date variable, and a numeric variable. What I want is to take the date value out of the concatenated variable and format it as a date.
ID concatenated_var
12345 12345184231
123456 12345618444123
What I tried was this:
data want;
set have;
date_var=input(substr(ID, 5, 5), 5.);
run;
and then another step that formatted date_var as a date variable (for some reason when I tried to do that all in one step like this: date_var=input(substr(ID, 5, 5), mmddyy10.), every value of date_var was missing). It worked, but the problem is that I'm now finding that the ID variable has either 5 or 6 digits. The numeric variable at the end of the concatenated variable also has different lengths.
My question is whether there's a way to extract the date part of the concatenated variable (possibly using the ID variable since I do have that one; I could probably bring the numeric one in but don't currently have it in my dataset).
Any help is much appreciated.
The problem is that the ID is not always the same length--it is sometimes 5 and sometimes 6 digits.
The date part is 18423, the 1 is from the numeric variable which was also concatenated with the ID var and date var.
For one thing, you don't want to substring ID, you want to substring concatenated_var.
Secondly, you can use the length of ID to determine where to start in concatenated_var (position 6 or 7), e.g.
date_var=input(substr(concatenated_var,length(ID)+1,5),5.);
Then, assuming date_var is constant length (5) you can get your numeric var in a similar fashion.
You can use tranwrd() to remove the id from the concatenated string, then format accordingly:
data have;
input ID $6. concatenated_var $15.;
cards;
12345 1234512032015
123456 12345612032015
;
data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;
@Steelers_In_DC wrote:
You can use tranwrd() to remove the id from the concatenated string, then format accordingly:
data have;
input ID $6. concatenated_var $15.;
cards;
12345 1234512032015
123456 12345612032015
;
data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;
Two concerns regarding your approach,
1. OP's "concatenated_var" has some other elements besides ID and date.
2. What if id and date take the same value or their combination generate the same value? id=12345 concatenated_var=1234512345 // id=12312 concatenated_var=1231231221?
I did miss the trailing numbers so a substr() would be necessary:
data want;
format date mmddyy10.;
set have;
test = put(input(substr(strip(tranwrd(strip(concatenated_var),strip(id),trimn(''))),1,5),8.),mmddyy10.);
date = input(test,mmddyy10.);
drop test;
run;
If there are more dates or other numbers placed throughout the string I agree this would not be a good solution.
@Steelers_In_DC: I'm sorry, but I think you've overlooked three things:
Understanding that all you need is to get the 5 digits after ID, @JoshB's classic solution will get what you need. You need also to check the data, making sure all of the dates are 5 digits (eg. 09999 for 18May1987). By the same token, using PRX functions we can also have:
data have; input (id date_var) (:$20.); cards; 12345 12345184231 123456 12345618444123 ; data want; set have; format dt date9.; dt=input(prxchange(cats('s/(',id,')(\d{5})(.*)/$2/'),-1,date_var),5.); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.