I have a dataset with two variables. The first variable is a unique ID for each observation, and the second is a character variable. I am trying to use function SCAN to decompose each value of the second variable into multiple separate observations. Below is an example of the original value:
[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;
I want to obtain all characters between each delimiter
[
and delimiter
;as a piece of information, and save it into an observation with the same ID. Thus, for this case, I want to convert the original observation into 6 separate observations. Different IDs might have different numbers of pieces of information, and I am using function
COUNTW(variable2, '[')to generate this number.
I read through the guide for function SCAN, but now my problem is that, how can I parse information between two specific and different delimiters?
You might want to try using regex.
data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;"; output;
variable="[2016-10-01--] Laurent van Tuyckom;"; output;
variable="[2014-04-04--2016-10-01] Ignace de Coene;"; output;
variable="[2014-04-04--2016-01-01] Alexander Roose;"; output;
variable="[2016-10-01--] Alexander Roose;"; output;
variable="[2014-04-04--2015-03-01] Koen Bosquet;"; output;
run;
data want;
set have;
	newString=prxchange('s/\[(.*);/$1/', -1, variable);	
run;
 Ahh. Okay. I misunderstood your problem.
Try this:
data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
run;
data want;
set have;
num=1;
start=1;
	do while (1);
		num=findc(variable,';',start); 
		if num = 0 then leave;
		variable2=substr(variable,start,num-start+1); output;
		start=num+1;
	end;
run;
 Thanks for your help. I am trying to use your code to process the dataset, which includes observations with missing values for the variable. Here is a sample and my modified version of your code:
data have;
length ID $3 variable $500;
ID = "001"; variable = "[2011-12-12--2014-04-04] David Tien;"; output;
ID = "002"; variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;"; output;
ID = "003"; variable = ""; output;
run;
data want;
set have;
	num_person = countw(variable, "[");
	if num_mgr = 0 then do;
		varible2 = ""; output;
	end;
	if num_person > 0 then do;
		num_find = 1;
		start = 1;
		i = 1;
		do while (i <= num_person);
			num_find =findc(variable, ';', start); 
			variable2=substr(variable, start, num_find - start + 1); output;
			start=num_find + 1;
			i+1;
		end;
	end;
run;It seems to be doing what I need, but the log shows some error:
NOTE: Invalid third argument to function SUBSTR at line 1200 column 23.
ID=003 variable= num_mgr=1 varible2= num_find=0 start=1 i=2 variable2= _ERROR_=1 _N_=3
Did I make any mistake?
The problem is that Variable is empty, but you are telling Substr to extract on character, so it throws an error. You should include a check for Variable being empty.
Well you can use perl regex, or datastep:
data want;
  length str s new_wrd $2000;
  str="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
  s=str;
  do while(lengthn(s));
    new_wrd=substr(s,1,findc(s,";"));
    s=substr(s,findc(s,";")+1);
    output;
  end;
run;
If you wanted you could replace the output with a further parse of the individual components.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
