DATA Step, Macro, Functions and more

Separating Values in a Variable into Different Variables then Concatenating

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 144
Accepted Solution

Separating Values in a Variable into Different Variables then Concatenating

Hi All,

I'm trying to delimit a var with different counts of values separated by ','.

I start with a table that looks like this:

CustomerID

Event_List

123

10,12,100,110,121

456

100,117,121,123,124,000

786

181,  444, 656

The next step I need to do is this delimit the values into different vars.  I can use the SCAN function but there are different number counts in each var so I can specify specific positions. 

CustomerID

Event_List

Var1

Var2

Var3

Var4

Var5

Var6

123

10,12,100,110,121

10

12

100

110

121

456

100,117,121,123,124,000

100

117

121

123

124

000

786

181,  444, 656

181

444

656

Then I need to transpose them and apply a format (using just ID 456):

CustomerID

Event_List

456

100

456

117

456

121

456

123

456

124

456

0

CustomerID

Event_List

456

Click

456

Click2

456

Buy

456

Order

456

Click Out

456

Exit

Then I need to Concatenate them back together with the format applied.

CustomerID

Event_Path

456

Click, Click2, Buy , Order,  Click Out , Exit

Any help will be greatly appreciated! 


Accepted Solutions
Solution
‎07-06-2015 12:37 PM
Super User
Posts: 19,772

Re: Separating Values in a Variable into Different Variables then Concatenating

Actually it doesn't matter. If you need all the intermediate data sets you can separate this data step, but this does it in one shot. If you comment out the OUTPUT statement in the do loop you can get just the last line/data set required.

data have;

infile cards dlm=',' dsd truncover;

informat customerID $3. event_list $50.;

input customerID $ event_list $;

cards;

123, "10, 12, 100, 110, 121"

456, "100, 117, 121, 123, 124, 000"

786, "181, 444, 656"

;

run;

proc format;

value $ event_fmt

'10' = 'Click1'

'12' = 'Click2'

'110' = 'Redirect'

'100' = 'Rocky'

'117' = 'Mountain'

'121' = 'Ice Cream'

'123' = "Is"

'124' = "The"

'000' = "Best"

'181' = "I'm"

'444' = 'incredibly'

'656' = 'tired'

;

run;

data want;

set have;

length sentence $256.;

n=countc(event_list, ",")+1;

do i=1 to n;

word=scan(event_list, i);

sentence = catx(', ', sentence, put(word, $event_fmt.));

*output;

end;

drop event_list;

run;

proc print data=want;

run;

View solution in original post


All Replies
Super User
Posts: 19,772

Re: Separating Values in a Variable into Different Variables then Concatenating

Do you know the maximum amount of words/events ahead of time?

I don't think you need all the steps either..

Frequent Contributor
Posts: 144

Re: Separating Values in a Variable into Different Variables then Concatenating

Hi Reeza,

Unfortunately, I don't.

Solution
‎07-06-2015 12:37 PM
Super User
Posts: 19,772

Re: Separating Values in a Variable into Different Variables then Concatenating

Actually it doesn't matter. If you need all the intermediate data sets you can separate this data step, but this does it in one shot. If you comment out the OUTPUT statement in the do loop you can get just the last line/data set required.

data have;

infile cards dlm=',' dsd truncover;

informat customerID $3. event_list $50.;

input customerID $ event_list $;

cards;

123, "10, 12, 100, 110, 121"

456, "100, 117, 121, 123, 124, 000"

786, "181, 444, 656"

;

run;

proc format;

value $ event_fmt

'10' = 'Click1'

'12' = 'Click2'

'110' = 'Redirect'

'100' = 'Rocky'

'117' = 'Mountain'

'121' = 'Ice Cream'

'123' = "Is"

'124' = "The"

'000' = "Best"

'181' = "I'm"

'444' = 'incredibly'

'656' = 'tired'

;

run;

data want;

set have;

length sentence $256.;

n=countc(event_list, ",")+1;

do i=1 to n;

word=scan(event_list, i);

sentence = catx(', ', sentence, put(word, $event_fmt.));

*output;

end;

drop event_list;

run;

proc print data=want;

run;

Frequent Contributor
Posts: 130

Re: Separating Values in a Variable into Different Variables then Concatenating

Here's another way, it involves more steps than Reeza as it runs the transpose and then does by processing at the end to get what you need (just remove the Where= in the proc transpose to get all CustomerID instead of just 456).  Hope this helps!

data start;

length CustomerID $10 Event_List $50;

input CustomerID $ Event_List $;

datalines;

123 10,12,100,110,121

456 100,117,121,123,124,000

786 181,444,656

;

run;

data merge_data;

length Event_list $10 Event_Char $25;

input Event_List $ Event_Char $;

datalines;

100 Click

117 Click2

121 Buy

123 Order

124 Click Out

000 Exit

;

run;

data count;

set start;

count=countc(Event_List,',')+1;

run;

proc sql noprint;

select max(count) into: max

from count;

quit;

%macro split;

data separate;

set start;

%do i=1 %to &max;

var&i=scan(Event_List,&i,","); %end;

run;

proc transpose data=separate

out=transposed (drop=_name_) prefix=Event_List;

where CustomerID='456';

var %do t=1 %to &max; var&t %end;;

by CustomerID;

run;

%mend;

%split

data num_separate;

set transposed;

Event_List=input(Event_List1,6.);

run;

proc sql;

create table separate_merge as

select A.CustomerID, B.Event_Char as Event_List

from num_separate as A inner join merge_data as B

on (A.Event_List1=B.Event_List);

quit;

data final (drop=Event_List);

retain CustomerID Event_Path;

set separate_merge;

format Event_Path $50.;

by CustomerID;

Event_Path=catx(", ",Event_Path,Event_List);

If Last.CustomerID then output;

run;

PROC Star
Posts: 1,167

Re: Separating Values in a Variable into Different Variables then Concatenating

Would this work?

data want;
   if _n_ = 1 then
      do;
         retain prx1-prx6;
         prx1 = 's/100/Click/';
         prx1 = 's/100/Click/';
         prx2 = 's/117/Click2/';
         prx3 = 's/121/Buy/';
         prx4 = 's/123/Order/';
         prx5 = 's/124/Click Out/';
         prx6 = 's/000/Exit /';
      end;

   set have;
   length Event_Path $ 1024;
   Event_Path = Event_List;
   Event_Path = prxchange(prx1, -1, Event_Path);
   Event_Path = prxchange(prx2, -1, Event_Path);
   Event_Path = prxchange(prx3, -1, Event_Path);
   Event_Path = prxchange(prx4, -1, Event_Path);
   Event_Path = prxchange(prx5, -1, Event_Path);
   Event_Path = prxchange(prx6, -1, Event_Path);
   drop prx1-prx6;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 252 views
  • 1 like
  • 4 in conversation