Solved: Re: Divide a variable into several variables

Chung-Li · Posted 09-26-2017 02:54 AM

Hi there,

When I was processing dataset this morning, I faced a problem.

Let's take a look at my data first:

30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺

30210044物阿梅30210021己阿有30240052更有言30210346心一?

...

Basic it's in the pattern of "8-digit number" + "Chinese name of a person", and it is stored in the text file.

My problem is: I don't know how to divide this seemingly random variable into the way I want?

As I said, I want to divide each row into "ID1" + "Name1" + "ID2" + "Name2" + "ID3" + "Name3" + "ID4" + "Name4"

As a whole, I wish the data looked like

ID1 Name1 ID2 Name2 ID3 Name3 ID4 Name4

30210027 甲三郎 30210397 乙先天冰 30210069 丙小姐 30210493 丁阿旺

30210044 物阿梅 30210021 己阿有 30240052 更有言 30210346 心一?

...

I've tried three ways

#1 Import

It leads to a result that I wouldn't conclude it as a failure, but it's definitely not a success:

#2 Infile

Some variables are wrong because of the random length of the Chinese name, plus there's no delimiter between each variable.

#3 Adding delimiter, such as ",", by my hand.

For this method, I added a comma between each variable.

It works actually. However, considering the total of around 10000 variables awaits me to process, It may not work that well as I thought...

Therefore, I'm here to ask for your kindly help.

Any suggestions would be helpful.

Thanks in advance!

Ksharp · Posted 09-26-2017 08:51 AM

data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
 set have;
 id1=scan(x,1,,'kd');
 name1=scan(x,1,,'d');
  id2=scan(x,2,,'kd');
 name2=scan(x,2,,'d');
  id3=scan(x,3,,'kd');
 name3=scan(x,3,,'d');
  id4=scan(x,4,,'kd');
 name4=scan(x,4,,'d');
run;

View solution in original post

FredrikE · Posted 09-26-2017 04:13 AM

I would go for the first one "import" which puts it all in one variable - var.

Then i would walk through the dataset and for each line do a loop (do i = 1 to length(var)) and check each position in var, if its a numeric one then add it to a id variable and if it is a character put it as a name.

Usually I always ask for delimiter in the files sent to system, to prevent this kind of logic 🙂

//fredrik

Chung-Li · Posted 09-27-2017 01:46 AM

Fredrik,

Thank you for your help!
Sure, having delimiter can definitely make things easier, but sometimes things just don't go that well.

gamotte · Posted 09-26-2017 04:25 AM

Hello,

You can use a regex to add commas in your file each time a digit is followed by a non-digit

Edit : and vice-versa

Example with sed :

> cat > test << EOF
> 012321akjghjkghj8576543jhgjhgjhg657654654kugjyfj
> 354354354jytjhfhgxchs9687657654jhgjhgjhgj65746576354yrzgredh
> EOF

> sed "s/$[0-9]$$[^0-9]$/$1,$2/g; s/$[^0-9]$$[0-9]$/$1,$2/g" test
01232,kjghjkgh,57654,hgjhgjh,5765465,ugjyfj
35435435,ytjhfhgxch,68765765,hgjhgjhg,574657635,rzgredh

Chung-Li · Posted 09-27-2017 01:53 AM

Thanks Gamotte!
I'll check "regex".

mkeintz · Posted 09-26-2017 08:49 AM

Take the LONG_TEXT variable and change all numbers to blanks, producing a space-separated list of names in NAME_LIST
As you iteratively extract each name, blank it out in LONG_TEXT, leaving a space-separated list of ID's:

data have;
  input long_text $60.;
datalines;
30210027aaaaaaaa30210397bbbb30210069cccc30210493dddd
run;

data want;
  set have;
  name_list=translate(long_text,' ','0123456789');
  array name {6} $10;
  do I=1 to countw(name_list);
    name{I}=scan(name_list,I,' ');
    long_text=transtrn(long_text,trim(name{I}),' ');
  end;
  array id{6};
  do I=1 to countw(long_text);
    id{I}=input(scan(long_text,I,' '),10.);
  end;
  drop name_text long_text;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Chung-Li · Posted 09-27-2017 02:04 AM

Mkeintz,
What a masterpiece this is!
Also, it works really well when I apply it to my data!
I guess now I need to spend some time to study this code.

Thanks!

Ksharp · Posted 09-26-2017 08:51 AM

data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
 set have;
 id1=scan(x,1,,'kd');
 name1=scan(x,1,,'d');
  id2=scan(x,2,,'kd');
 name2=scan(x,2,,'d');
  id3=scan(x,3,,'kd');
 name3=scan(x,3,,'d');
  id4=scan(x,4,,'kd');
 name4=scan(x,4,,'d');
run;

Chung-Li · Posted 09-27-2017 02:06 AM

Ksharp,

This is so understandable for me, and it works!
Thank you so much!

Registration is open

SAS Training: Just a Click Away