Hi there,
When I was processing dataset this morning, I faced a problem.
Let's take a look at my data first:
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
...
Basic it's in the pattern of "8-digit number" + "Chinese name of a person", and it is stored in the text file.
My problem is: I don't know how to divide this seemingly random variable into the way I want?
As I said, I want to divide each row into "ID1" + "Name1" + "ID2" + "Name2" + "ID3" + "Name3" + "ID4" + "Name4"
As a whole, I wish the data looked like
ID1 Name1 ID2 Name2 ID3 Name3 ID4 Name4
30210027 甲三郎 30210397 乙先天冰 30210069 丙小姐 30210493 丁阿旺
30210044 物阿梅 30210021 己阿有 30240052 更有言 30210346 心一?
...
I've tried three ways
#1 Import
It leads to a result that I wouldn't conclude it as a failure, but it's definitely not a success:
#2 Infile
Some variables are wrong because of the random length of the Chinese name, plus there's no delimiter between each variable.
#3 Adding delimiter, such as ",", by my hand.
For this method, I added a comma between each variable.
It works actually. However, considering the total of around 10000 variables awaits me to process, It may not work that well as I thought...
Therefore, I'm here to ask for your kindly help.
Any suggestions would be helpful.
Thanks in advance!
data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
set have;
id1=scan(x,1,,'kd');
name1=scan(x,1,,'d');
id2=scan(x,2,,'kd');
name2=scan(x,2,,'d');
id3=scan(x,3,,'kd');
name3=scan(x,3,,'d');
id4=scan(x,4,,'kd');
name4=scan(x,4,,'d');
run;
I would go for the first one "import" which puts it all in one variable - var.
Then i would walk through the dataset and for each line do a loop (do i = 1 to length(var)) and check each position in var, if its a numeric one then add it to a id variable and if it is a character put it as a name.
Usually I always ask for delimiter in the files sent to system, to prevent this kind of logic 🙂
//fredrik
Hello,
You can use a regex to add commas in your file each time a digit is followed by a non-digit
Edit : and vice-versa
Example with sed :
> cat > test << EOF
> 012321akjghjkghj8576543jhgjhgjhg657654654kugjyfj
> 354354354jytjhfhgxchs9687657654jhgjhgjhgj65746576354yrzgredh
> EOF
> sed "s/\([0-9]\)\([^0-9]\)/$1,$2/g; s/\([^0-9]\)\([0-9]\)/$1,$2/g" test
01232,kjghjkgh,57654,hgjhgjh,5765465,ugjyfj
35435435,ytjhfhgxch,68765765,hgjhgjhg,574657635,rzgredh
data have;
input long_text $60.;
datalines;
30210027aaaaaaaa30210397bbbb30210069cccc30210493dddd
run;
data want;
set have;
name_list=translate(long_text,' ','0123456789');
array name {6} $10;
do I=1 to countw(name_list);
name{I}=scan(name_list,I,' ');
long_text=transtrn(long_text,trim(name{I}),' ');
end;
array id{6};
do I=1 to countw(long_text);
id{I}=input(scan(long_text,I,' '),10.);
end;
drop name_text long_text;
run;
data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
set have;
id1=scan(x,1,,'kd');
name1=scan(x,1,,'d');
id2=scan(x,2,,'kd');
name2=scan(x,2,,'d');
id3=scan(x,3,,'kd');
name3=scan(x,3,,'d');
id4=scan(x,4,,'kd');
name4=scan(x,4,,'d');
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.