BookmarkSubscribeRSS Feed
xyxu
Quartz | Level 8

 

I have a dataset with two variables. The first variable is a unique ID for each observation, and the second is a character variable. I am trying to use function SCAN to decompose each value of the second variable into multiple separate observations. Below is an example of the original value:

 

[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;

 

I want to obtain all characters between each delimiter

[

and delimiter

 

 

;

as a piece of information, and save it into an observation with the same ID. Thus, for this case, I want to convert the original observation into 6 separate observations. Different IDs might have different numbers of pieces of information, and I am using function

COUNTW(variable2, '[')

to generate this number. 

 

I read through the guide for function SCAN, but now my problem is that, how can I parse information between two specific and different delimiters?

 

6 REPLIES 6
rudfaden
Lapis Lazuli | Level 10

You might want to try using regex.

 

data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;"; output;
variable="[2016-10-01--] Laurent van Tuyckom;"; output;
variable="[2014-04-04--2016-10-01] Ignace de Coene;"; output;
variable="[2014-04-04--2016-01-01] Alexander Roose;"; output;
variable="[2016-10-01--] Alexander Roose;"; output;
variable="[2014-04-04--2015-03-01] Koen Bosquet;"; output;
run;


data want;
set have;
	newString=prxchange('s/\[(.*);/$1/', -1, variable);	
run;
 
xyxu
Quartz | Level 8
The six pieces of information were stored in as one value of the second variable in original data. If they were already separate as in your dataset "have", I am done.
rudfaden
Lapis Lazuli | Level 10

Ahh. Okay. I misunderstood your problem.

 

Try this: 

data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
run;

data want;
set have;
num=1;
start=1;
	do while (1);
		num=findc(variable,';',start); 
		if num = 0 then leave;
		variable2=substr(variable,start,num-start+1); output;
		start=num+1;
	end;
run;
 
xyxu
Quartz | Level 8

Thanks for your help. I am trying to use your code to process the dataset, which includes observations with missing values for the variable. Here is a sample and my modified version of your code:

 

data have;
length ID $3 variable $500;
ID = "001"; variable = "[2011-12-12--2014-04-04] David Tien;"; output;
ID = "002"; variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;"; output;
ID = "003"; variable = ""; output;
run;

data want;
set have;
	num_person = countw(variable, "[");
	if num_mgr = 0 then do;
		varible2 = ""; output;
	end;

	if num_person > 0 then do;
		num_find = 1;
		start = 1;
		i = 1;
		do while (i <= num_person);
			num_find =findc(variable, ';', start); 
			variable2=substr(variable, start, num_find - start + 1); output;
			start=num_find + 1;
			i+1;
		end;
	end;
run;

It seems to be doing what I need, but the log shows some error:

NOTE: Invalid third argument to function SUBSTR at line 1200 column 23.

ID=003 variable=  num_mgr=1 varible2=  num_find=0 start=1 i=2 variable2=  _ERROR_=1 _N_=3

Did I make any mistake?

rudfaden
Lapis Lazuli | Level 10

The problem is that Variable is empty, but you are telling Substr to extract on character, so it throws an error. You should include a check for Variable being empty.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well you can use perl regex, or datastep:

data want;
  length str s new_wrd $2000;
  str="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
  s=str;
  do while(lengthn(s));
    new_wrd=substr(s,1,findc(s,";"));
    s=substr(s,findc(s,";")+1);
    output;
  end;
run;

If you wanted you could replace the output with a further parse of the individual components.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3246 views
  • 1 like
  • 3 in conversation