DATA Step, Macro, Functions and more

Separating a concatenated variable

Reply
Frequent Contributor
Posts: 138

Separating a concatenated variable

Hi,

 

I have a dataset with an ID variable, and a concatenation of the ID variable, a date variable, and a numeric variable. What I want is to take the date value out of the concatenated variable and format it as a date. 

 

ID          concatenated_var

12345         12345184231

123456       12345618444123

 

What I tried was this:

data want;

set have;

date_var=input(substr(ID, 5, 5), 5.);

run;

 

and then another step that formatted date_var as a date variable (for some reason when I tried to do that all in one step like this: date_var=input(substr(ID, 5, 5), mmddyy10.), every value of date_var was missing). It worked, but the problem is that I'm now finding that the ID variable has either 5 or 6 digits. The numeric variable at the end of the concatenated variable also has different lengths.

 

My question is whether there's a way to extract the date part of the concatenated variable (possibly using the ID variable since I do have that one; I could probably bring the numeric one in but don't currently have it in my dataset).

 

Any help is much appreciated. 

Super User
Posts: 19,768

Re: Separating a concatenated variable

Posted in reply to Walternate
Well, if you have the length of the ID you know how long to make your substring. What is that date supposed to represent though, mmdddyy - 18 42 31?



Frequent Contributor
Posts: 138

Re: Separating a concatenated variable

The problem is that the ID is not always the same length--it is sometimes 5 and sometimes 6 digits. 

 

The date part is 18423, the 1 is from the numeric variable which was also concatenated with the ID var and date var. 

Contributor
Posts: 56

Re: Separating a concatenated variable

[ Edited ]
Posted in reply to Walternate

 

For one thing, you don't want to substring ID, you want to substring concatenated_var. 

Secondly, you can use the length of ID to determine where to start in concatenated_var (position 6 or 7), e.g.

 

date_var=input(substr(concatenated_var,length(ID)+1,5),5.);

Then, assuming date_var is constant length (5) you can get your numeric var in a similar fashion. 

Valued Guide
Posts: 860

Re: Separating a concatenated variable

Posted in reply to Walternate

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;

Respected Advisor
Posts: 3,156

Re: Separating a concatenated variable

Posted in reply to Steelers_In_DC

Steelers_In_DC wrote:

You can use tranwrd() to remove the id from the concatenated string, then format accordingly:

 

data have;
input ID $6. concatenated_var $15.;
cards;
12345  1234512032015
123456 12345612032015
;

data want;
set have;
date = input(tranwrd(strip(concatenated_var),strip(id),trimn('')),mmddyy10.);
format date mmddyy10.;
run;


Two concerns regarding your approach,

1. OP's "concatenated_var" has some other elements besides ID and date.

2. What if id and date take the same value or their combination generate the same value? id=12345 concatenated_var=1234512345 // id=12312 concatenated_var=1231231221?

 

 

Valued Guide
Posts: 860

Re: Separating a concatenated variable

I did miss the trailing numbers so a substr() would be necessary:

 

data want;
format date mmddyy10.;
set have;
test = put(input(substr(strip(tranwrd(strip(concatenated_var),strip(id),trimn(''))),1,5),8.),mmddyy10.);
date = input(test,mmddyy10.);
drop test;
run;

 

If there are more dates or other numbers placed throughout the string I agree this would not be a good solution.

 

Trusted Advisor
Posts: 1,117

Re: Separating a concatenated variable

Posted in reply to Steelers_In_DC

@Steelers_In_DC: I'm sorry, but I think you've overlooked three things:

  1. There is an additional "numeric" value at the end of @Walternate's concatenated_var.
  2. The date values embedded in @Walternate's concatenated_var appear to be SAS date values, but at least not MMDDYY values (although, I must say, I have never come across such data in the past 18 years).
  3. Most importantly, I would not take the risk of deleting or replacing substrings in a concatenated string, if there was uncertainty as to where the pattern match would occur. In the worst case, it could even occur twice. Example: ID='84231', CONCATENATED_VAR='84231184231' (including the same "date" value 18423 as in the original sample data). TRANWRD would turn this into a single '1'.

 

Respected Advisor
Posts: 3,156

Re: Separating a concatenated variable

Posted in reply to Walternate

Understanding that all you need is to get the 5 digits after ID, @JoshB's classic solution will get what you need.  You need also to check the data, making sure all of the dates are 5 digits (eg. 09999 for 18May1987). By the same token, using PRX functions we can also have:

 

 

data have;
	input (id date_var) (:$20.);
	cards;
12345         12345184231
123456       12345618444123
;

data want;
	set have;
	format dt date9.;
	dt=input(prxchange(cats('s/(',id,')(\d{5})(.*)/$2/'),-1,date_var),5.);
run;

 

 

Ask a Question
Discussion stats
  • 8 replies
  • 337 views
  • 1 like
  • 6 in conversation