DATA Step, Macro, Functions and more

Divide a variable into several variables

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Divide a variable into several variables

Hi there, 

 

When I was processing dataset this morning, I faced a problem.

Let's take a look at my data first:

 

30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺

30210044物阿梅30210021己阿有30240052更有言30210346心一?

...

 

Basic it's in the pattern of "8-digit number" + "Chinese name of a person", and it is stored in the text file.

My problem is: I don't know how to divide this seemingly random variable into the way I want?

 

As I said, I want to divide each row into "ID1" + "Name1" + "ID2" + "Name2" + "ID3" + "Name3" + "ID4" + "Name4"

As a whole, I wish the data looked like

ID1               Name1   ID2               Name2      ID3              Name3    ID4              Name4

 

30210027    甲三郎    30210397    乙先天冰    30210069    丙小姐    30210493    丁阿旺

30210044    物阿梅    30210021    己阿有        30240052   更有言     30210346    心一?

...

 

I've tried three ways

#1 Import

It leads to a result that I wouldn't conclude it as a failure, but it's definitely not a success:

未命名.png

 

 

#2 Infile

Some variables are wrong because of the random length of the Chinese name, plus there's no delimiter between each variable.

 

#3 Adding delimiter, such as ",", by my hand.

For this method, I added a comma between each variable.

It works actually. However, considering the total of around 10000 variables awaits me to process, It may not work that well as I thought...

 

Therefore, I'm here to ask for your kindly help.

Any suggestions would be helpful.

Thanks in advance! 

 


Accepted Solutions
Solution
‎09-27-2017 01:41 AM
Super User
Posts: 10,618

Re: Divide a variable into several variables

data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
 set have;
 id1=scan(x,1,,'kd');
 name1=scan(x,1,,'d');
  id2=scan(x,2,,'kd');
 name2=scan(x,2,,'d');
  id3=scan(x,3,,'kd');
 name3=scan(x,3,,'d');
  id4=scan(x,4,,'kd');
 name4=scan(x,4,,'d');
run;

View solution in original post


All Replies
Super Contributor
Posts: 342

Re: Divide a variable into several variables

I would go for the first one "import" which puts it all in one variable - var.

Then i would walk through the dataset and for each line do a loop (do i = 1 to length(var)) and check each position in var, if its a numeric one then add it to a id variable and if it is a character put it as a name.

 

Usually I always ask for delimiter in the files sent to system, to prevent this kind of logic Smiley Happy

 

//fredrik

Contributor
Posts: 45

Re: Divide a variable into several variables

Fredrik,

Thank you for your help!
Sure, having delimiter can definitely make things easier, but sometimes things just don't go that well.
Super Contributor
Posts: 320

Re: Divide a variable into several variables

[ Edited ]

Hello,

 

You can use a regex to add commas in your file each time a digit is followed by a non-digit

 Edit : and vice-versa

 

Example with sed :

> cat > test << EOF
> 012321akjghjkghj8576543jhgjhgjhg657654654kugjyfj
> 354354354jytjhfhgxchs9687657654jhgjhgjhgj65746576354yrzgredh
> EOF

 

> sed "s/\([0-9]\)\([^0-9]\)/$1,$2/g; s/\([^0-9]\)\([0-9]\)/$1,$2/g" test
01232,kjghjkgh,57654,hgjhgjh,5765465,ugjyfj
35435435,ytjhfhgxch,68765765,hgjhgjhg,574657635,rzgredh

Contributor
Posts: 45

Re: Divide a variable into several variables

Thanks Gamotte!
I'll check "regex".
Trusted Advisor
Posts: 1,288

Re: Divide a variable into several variables

 

  1. Take the LONG_TEXT variable and change all numbers to blanks, producing  a space-separated list of names in NAME_LIST
  2. As you iteratively extract each name, blank it out in LONG_TEXT, leaving a space-separated list of ID's:

 

data have;
  input long_text $60.;
datalines;
30210027aaaaaaaa30210397bbbb30210069cccc30210493dddd
run;

data want;
  set have;
  name_list=translate(long_text,' ','0123456789');
  array name {6} $10;
  do I=1 to countw(name_list);
    name{I}=scan(name_list,I,' ');
    long_text=transtrn(long_text,trim(name{I}),' ');
  end;
  array id{6};
  do I=1 to countw(long_text);
    id{I}=input(scan(long_text,I,' '),10.);
  end;
drop name_text long_text; run;
Contributor
Posts: 45

Re: Divide a variable into several variables

Mkeintz,
What a masterpiece this is!
Also, it works really well when I apply it to my data!
I guess now I need to spend some time to study this code.

Thanks!
Solution
‎09-27-2017 01:41 AM
Super User
Posts: 10,618

Re: Divide a variable into several variables

data have;
infile cards truncover;
input x $80.;
cards;
30210027甲三郎30210397乙先天冰30210069丙小姐30210493丁阿旺
30210044物阿梅30210021己阿有30240052更有言30210346心一?
;
run;
data want;
 set have;
 id1=scan(x,1,,'kd');
 name1=scan(x,1,,'d');
  id2=scan(x,2,,'kd');
 name2=scan(x,2,,'d');
  id3=scan(x,3,,'kd');
 name3=scan(x,3,,'d');
  id4=scan(x,4,,'kd');
 name4=scan(x,4,,'d');
run;
Contributor
Posts: 45

Re: Divide a variable into several variables

Ksharp,

This is so understandable for me, and it works!
Thank you so much!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 246 views
  • 6 likes
  • 5 in conversation