DATA Step, Macro, Functions and more

Separating a concatenated variable

Reply
Frequent Contributor
Posts: 138

Separating a concatenated variable

Hi,

 

I have a dataset with an ID variable, and a concatenation of the ID variable, a date variable, and a numeric variable. What I want is to take the date value out of the concatenated variable and format it as a date. 

 

ID          concatenated_var

12345         12345184231

123456       12345618444123

 

What I tried was this:

data want;

set have;

date_var=input(substr(ID, 5, 5), 5.);

run;

 

and then another step that formatted date_var as a date variable (for some reason when I tried to do that all in one step like this: date_var=input(substr(ID, 5, 5), mmddyy10.), every value of date_var was missing). It worked, but the problem is that I'm now finding that the ID variable has either 5 or 6 digits. The numeric variable at the end of the concatenated variable also has different lengths.

 

My question is whether there's a way to extract the date part of the concatenated variable (possibly using the ID variable since I do have that one; I could probably bring the numeric one in but don't currently have it in my dataset).

 

Any help is much appreciated. 

Super User
Posts: 17,837

Re: Separating a concatenated variable

Well, if you have the length of the ID you know how long to make your substring. What is that date supposed to represent though, mmdddyy - 18 42 31?



Frequent Contributor
Posts: 138

Re: Separating a concatenated variable

The problem is that the ID is not always the same length--it is sometimes 5 and sometimes 6 digits. 

 

The date part is 18423, the 1 is from the numeric variable which was also concatenated with the ID var and date var. 

Contributor
Posts: 55

Re: Separating a concatenated variable

[ Edited ]

 

For one thing, you don't want to substring ID, you want to substring concatenated_var. 

Secondly, you can use the length of ID to determine where to start in concatenated_var (position 6 or 7), e.g.

 

date_var=input(substr(concatenated_var,length(ID)+1,5),5.);

Then, assuming date_var is constant length (5) you can get your numeric var in a similar fashion. 

Valued Guide
Posts: 858

Re: Separating a concatenated variable

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;

Respected Advisor
Posts: 3,124

Re: Separating a concatenated variable


Steelers_In_DC wrote:

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;


Two concerns regarding your approach,

1. OP's "concatenated_var" has some other elements besides ID and date.

2. What if id and date take the same value or their combination generate the same value? id=12345 concatenated_var=1234512345 // id=12312 concatenated_var=1231231221?

 

 

Valued Guide
Posts: 858

Re: Separating a concatenated variable

I did miss the trailing numbers so a substr() would be necessary:

 

data want;
format date mmddyy10.;
set have;
test = put(input(substr(strip(tranwrd(strip(concatenated_var),strip(id),trimn(''))),1,5),8.),mmddyy10.);
date = input(test,mmddyy10.);
drop test;
run;

 

If there are more dates or other numbers placed throughout the string I agree this would not be a good solution.

 

Trusted Advisor
Posts: 1,115

Re: Separating a concatenated variable

@Steelers_In_DC: I'm sorry, but I think you've overlooked three things:

  1. There is an additional "numeric" value at the end of @Walternate's concatenated_var.
  2. The date values embedded in @Walternate's concatenated_var appear to be SAS date values, but at least not MMDDYY values (although, I must say, I have never come across such data in the past 18 years).
  3. Most importantly, I would not take the risk of deleting or replacing substrings in a concatenated string, if there was uncertainty as to where the pattern match would occur. In the worst case, it could even occur twice. Example: ID='84231', CONCATENATED_VAR='84231184231' (including the same "date" value 18423 as in the original sample data). TRANWRD would turn this into a single '1'.

 

Respected Advisor
Posts: 3,124

Re: Separating a concatenated variable

Understanding that all you need is to get the 5 digits after ID, @JoshB's classic solution will get what you need.  You need also to check the data, making sure all of the dates are 5 digits (eg. 09999 for 18May1987). By the same token, using PRX functions we can also have:

 

 

data have;
	input (id date_var) (:$20.);
	cards;
12345         12345184231
123456       12345618444123
;

data want;
	set have;
	format dt date9.;
	dt=input(prxchange(cats('s/(',id,')(\d{5})(.*)/$2/'),-1,date_var),5.);
run;

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 332 views
  • 1 like
  • 6 in conversation