I have a dataset with two variables. The first variable is a unique ID for each observation, and the second is a character variable. I am trying to use function SCAN to decompose each value of the second variable into multiple separate observations. Below is an example of the original value:
[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;
I want to obtain all characters between each delimiter
[
and delimiter
;
as a piece of information, and save it into an observation with the same ID. Thus, for this case, I want to convert the original observation into 6 separate observations. Different IDs might have different numbers of pieces of information, and I am using function
COUNTW(variable2, '[')
to generate this number.
I read through the guide for function SCAN, but now my problem is that, how can I parse information between two specific and different delimiters?
You might want to try using regex.
data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;"; output;
variable="[2016-10-01--] Laurent van Tuyckom;"; output;
variable="[2014-04-04--2016-10-01] Ignace de Coene;"; output;
variable="[2014-04-04--2016-01-01] Alexander Roose;"; output;
variable="[2016-10-01--] Alexander Roose;"; output;
variable="[2014-04-04--2015-03-01] Koen Bosquet;"; output;
run;
data want;
set have;
newString=prxchange('s/\[(.*);/$1/', -1, variable);
run;
Ahh. Okay. I misunderstood your problem.
Try this:
data have;
variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;";
run;
data want;
set have;
num=1;
start=1;
do while (1);
num=findc(variable,';',start);
if num = 0 then leave;
variable2=substr(variable,start,num-start+1); output;
start=num+1;
end;
run;
Thanks for your help. I am trying to use your code to process the dataset, which includes observations with missing values for the variable. Here is a sample and my modified version of your code:
data have;
length ID $3 variable $500;
ID = "001"; variable = "[2011-12-12--2014-04-04] David Tien;"; output;
ID = "002"; variable="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;"; output;
ID = "003"; variable = ""; output;
run;
data want;
set have;
num_person = countw(variable, "[");
if num_mgr = 0 then do;
varible2 = ""; output;
end;
if num_person > 0 then do;
num_find = 1;
start = 1;
i = 1;
do while (i <= num_person);
num_find =findc(variable, ';', start);
variable2=substr(variable, start, num_find - start + 1); output;
start=num_find + 1;
i+1;
end;
end;
run;
It seems to be doing what I need, but the log shows some error:
NOTE: Invalid third argument to function SUBSTR at line 1200 column 23.
ID=003 variable= num_mgr=1 varible2= num_find=0 start=1 i=2 variable2= _ERROR_=1 _N_=3
Did I make any mistake?
The problem is that Variable is empty, but you are telling Substr to extract on character, so it throws an error. You should include a check for Variable being empty.
Well you can use perl regex, or datastep:
data want; length str s new_wrd $2000; str="[2011-12-12--2014-04-04] Moudy El Khodr;[2016-10-01--] Laurent van Tuyckom;[2014-04-04--2016-10-01] Ignace de Coene;[2014-04-04--2016-01-01] Alexander Roose;[2016-10-01--] Alexander Roose;[2014-04-04--2015-03-01] Koen Bosquet;"; s=str; do while(lengthn(s)); new_wrd=substr(s,1,findc(s,";")); s=substr(s,findc(s,";")+1); output; end; run;
If you wanted you could replace the output with a further parse of the individual components.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.