BookmarkSubscribeRSS Feed
xyxu
Quartz | Level 8

 

I have a dataset with two variables. The first variable is a unique ID for each observation, and the second is a character variable. I am trying to use function SCAN to decompose each value of the second variable into multiple separate observations. Below is an example of the original value:

 

[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;

 

I want to obtain all characters between each delimiter

[

and delimiter

 

 

;

as a piece of information, and save it into an observation with the same ID. Thus, for this case, I want to convert the original observation into 6 separate observations. Different IDs might have different numbers of pieces of information, and I am using function

COUNTW(variable2, '[')

to generate this number. 

 

I read through the guide for function SCAN, but now my problem is that, how can I parse information between two specific and different delimiters?

 

6 REPLIES 6
rudfaden
Lapis Lazuli | Level 10

You might want to try using regex.

 

data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;"; output;
variable="[2016-10-01--] Laurent van Tuyckom;"; output;
variable="[2014-04-04--2016-10-01] Ignace de Coene;"; output;
variable="[2014-04-04--2016-01-01] Alexander Roose;"; output;
variable="[2016-10-01--] Alexander Roose;"; output;
variable="[2014-04-04--2015-03-01] Koen Bosquet;"; output;
run;


data want;
set have;
	newString=prxchange('s/\[(.*);/$1/', -1, variable);	
run;
 
xyxu
Quartz | Level 8
The six pieces of information were stored in as one value of the second variable in original data. If they were already separate as in your dataset "have", I am done.
rudfaden
Lapis Lazuli | Level 10

Ahh. Okay. I misunderstood your problem.

 

Try this: 

data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
run;

data want;
set have;
num=1;
start=1;
	do while (1);
		num=findc(variable,';',start); 
		if num = 0 then leave;
		variable2=substr(variable,start,num-start+1); output;
		start=num+1;
	end;
run;
 
xyxu
Quartz | Level 8

Thanks for your help. I am trying to use your code to process the dataset, which includes observations with missing values for the variable. Here is a sample and my modified version of your code:

 

data have;
length ID $3 variable $500;
ID = "001"; variable = "[2011-12-12--2014-04-04] David Tien;"; output;
ID = "002"; variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;"; output;
ID = "003"; variable = ""; output;
run;

data want;
set have;
	num_person = countw(variable, "[");
	if num_mgr = 0 then do;
		varible2 = ""; output;
	end;

	if num_person > 0 then do;
		num_find = 1;
		start = 1;
		i = 1;
		do while (i <= num_person);
			num_find =findc(variable, ';', start); 
			variable2=substr(variable, start, num_find - start + 1); output;
			start=num_find + 1;
			i+1;
		end;
	end;
run;

It seems to be doing what I need, but the log shows some error:

NOTE: Invalid third argument to function SUBSTR at line 1200 column 23.

ID=003 variable=  num_mgr=1 varible2=  num_find=0 start=1 i=2 variable2=  _ERROR_=1 _N_=3

Did I make any mistake?

rudfaden
Lapis Lazuli | Level 10

The problem is that Variable is empty, but you are telling Substr to extract on character, so it throws an error. You should include a check for Variable being empty.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well you can use perl regex, or datastep:

data want;
  length str s new_wrd $2000;
  str="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
  s=str;
  do while(lengthn(s));
    new_wrd=substr(s,1,findc(s,";"));
    s=substr(s,findc(s,";")+1);
    output;
  end;
run;

If you wanted you could replace the output with a further parse of the individual components.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3171 views
  • 1 like
  • 3 in conversation