BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have a dataset with an ID variable, and a concatenation of the ID variable, a date variable, and a numeric variable. What I want is to take the date value out of the concatenated variable and format it as a date. 

 

ID          concatenated_var

12345         12345184231

123456       12345618444123

 

What I tried was this:

data want;

set have;

date_var=input(substr(ID, 5, 5), 5.);

run;

 

and then another step that formatted date_var as a date variable (for some reason when I tried to do that all in one step like this: date_var=input(substr(ID, 5, 5), mmddyy10.), every value of date_var was missing). It worked, but the problem is that I'm now finding that the ID variable has either 5 or 6 digits. The numeric variable at the end of the concatenated variable also has different lengths.

 

My question is whether there's a way to extract the date part of the concatenated variable (possibly using the ID variable since I do have that one; I could probably bring the numeric one in but don't currently have it in my dataset).

 

Any help is much appreciated. 

8 REPLIES 8
Reeza
Super User
Well, if you have the length of the ID you know how long to make your substring. What is that date supposed to represent though, mmdddyy - 18 42 31?



Walternate
Obsidian | Level 7

The problem is that the ID is not always the same length--it is sometimes 5 and sometimes 6 digits. 

 

The date part is 18423, the 1 is from the numeric variable which was also concatenated with the ID var and date var. 

JoshB
Quartz | Level 8

 

For one thing, you don't want to substring ID, you want to substring concatenated_var. 

Secondly, you can use the length of ID to determine where to start in concatenated_var (position 6 or 7), e.g.

 

date_var=input(substr(concatenated_var,length(ID)+1,5),5.);

Then, assuming date_var is constant length (5) you can get your numeric var in a similar fashion. 

Steelers_In_DC
Barite | Level 11

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;

Haikuo
Onyx | Level 15

@Steelers_In_DC wrote:

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;


Two concerns regarding your approach,

1. OP's "concatenated_var" has some other elements besides ID and date.

2. What if id and date take the same value or their combination generate the same value? id=12345 concatenated_var=1234512345 // id=12312 concatenated_var=1231231221?

 

 

Steelers_In_DC
Barite | Level 11

I did miss the trailing numbers so a substr() would be necessary:

 

data want;
format date mmddyy10.;
set have;
test = put(input(substr(strip(tranwrd(strip(concatenated_var),strip(id),trimn(''))),1,5),8.),mmddyy10.);
date = input(test,mmddyy10.);
drop test;
run;

 

If there are more dates or other numbers placed throughout the string I agree this would not be a good solution.

 

FreelanceReinh
Jade | Level 19

@Steelers_In_DC: I'm sorry, but I think you've overlooked three things:

  1. There is an additional "numeric" value at the end of @Walternate's concatenated_var.
  2. The date values embedded in @Walternate's concatenated_var appear to be SAS date values, but at least not MMDDYY values (although, I must say, I have never come across such data in the past 18 years).
  3. Most importantly, I would not take the risk of deleting or replacing substrings in a concatenated string, if there was uncertainty as to where the pattern match would occur. In the worst case, it could even occur twice. Example: ID='84231', CONCATENATED_VAR='84231184231' (including the same "date" value 18423 as in the original sample data). TRANWRD would turn this into a single '1'.

 

Haikuo
Onyx | Level 15

Understanding that all you need is to get the 5 digits after ID, @JoshB's classic solution will get what you need.  You need also to check the data, making sure all of the dates are 5 digits (eg. 09999 for 18May1987). By the same token, using PRX functions we can also have:

 

 

data have;
	input (id date_var) (:$20.);
	cards;
12345         12345184231
123456       12345618444123
;

data want;
	set have;
	format dt date9.;
	dt=input(prxchange(cats('s/(',id,')(\d{5})(.*)/$2/'),-1,date_var),5.);
run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1144 views
  • 1 like
  • 6 in conversation