BookmarkSubscribeRSS Feed
esw1
Calcite | Level 5

Hi,

 

I'm trying to parse some text in a particular way. I have a bunch of data where each rows has something like:

000006|load|||||| / 000194|on|794|417|Notes|Region2| / 000244|off|768|383|Notes|Region2| / 000244|on|768|383|Notes|Region3| / 000395|off|700|358|Notes|Region3| /

Sometimes this goes on for longer.

 

Is it possible to split this into a) rows in which I have each pair of on and off in one row with all of the text between | among columns (so it's easy to compute differences between some of the numeric variables for each on/off), and b) different rows after each instance of on and off (so after, for example, Region2| / appears twice)?

4 REPLIES 4
Tom
Super User Tom
Super User

Are you reading from a text file or is this data already in a SAS dataset?  If the later is it in one variable?

Is there a pattern?  Looks like you have groups of 6 or 7 values separated by pipes and the group are separated by slashes.  So something like this:

000006|load|||||| / 
000194|on|794|417|Notes|Region2| / 
000244|off|768|383|Notes|Region2| / 
000244|on|768|383|Notes|Region3| / 
000395|off|700|358|Notes|Region3| /
esw1
Calcite | Level 5

It's already in a SAS dataset loaded from an Excel file. 

The two columns that exist are a unique identifier and a column with all of these entries (can call the variable to_split), with 200 rows (where the text length can be slightly different for each row).

 

But yes, the pattern is that the first row will be the load, and then every set of two rows is an on then an off. 

Tom
Super User Tom
Super User

It is still really hard to tell what you actually have. Please post a data step that creates a few example rows of your starting data. It would also help if you posted a data step that creates the results you want from that example.

 

It sounds like you have two variables and you want to convert it into 7 variables instead. SCAN() is probably what you want to use.  You didn't say what your variables are called or what you want the variables to be called.  So let's assume they are named ID and STRING.

data want;
  set have;
  length var1 $8 var2 $8 var3 var4 8 var5 $100 var6 $20 ;
   var1=scan(string,1,'|','mq');
   var2=scan(string,2,'|','mq');
   var3=input(scan(string,3,'|','mq'),32.);
   var4=input(scan(string,4,'|','mq'),32.);
   var5=scan(string,5,'|','mq');
   var6=scan(string,6,'|','mq');
run;
esw1
Calcite | Level 5

My apologies for the confusion--and thank you for your patience. I imported the data from Excel, but to give an idea of what it would look like for the first rows (noting the datalines are messed up by the string):

 

data help;

infile datalines dlm=',';

input ID string;

datalines;

1234,000004|load|||||| / 001135|on|974|544|Notes|Region2| / 001159|off|950|516|Notes|Region2| / 001160|on|950|516|Notes|Region4| / 001177|off|927|489|Notes|Region4| / 001178|on|927|489|Notes|Region1| / 001183|off|915|475|Notes|Region1|

2345,000006|load|||||| / 002003|on|897|562|Notes|Region3| / 002019|off|886|546|Notes|Region3| / 002021|on|886|546|Notes|Region2| / 002042|off|873|526|Notes|Region2| / 002044|on|873|526|Notes|Region4| / 002130|off|858|501|Notes|Region4| /

3456,000007|load|||||| / 000490|on|784|575|Notes|Region3| / 000524|off|762|537|Notes|Region3| / 000525|on|762|537|Notes|Region2| / 000890|off|734|538|Notes|Region2| /

4567,000007|load|||||| / 000864|on|1014|552|Notes|Region2| / 001047|off|1016|541|Notes|Region2| /

;

run;

 

 

Yes--and thank you, that works for splitting each section (e.g., 001135|on|974|544|Notes|Region2|) into separate variables. I had also split up the preceding text string into separate rows at the / using:

data want;set have ;

do i=1 by 1 while(scan(oldstring,i,'/')^=' ');

string=scan(oldstring,i,'/');

output;

end;

run;

 

My apologies for any 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 622 views
  • 0 likes
  • 2 in conversation