Hi All,
I'm trying to delimit a var with different counts of values separated by ','.
I start with a table that looks like this:
CustomerID | Event_List |
123 | 10,12,100,110,121 |
456 | 100,117,121,123,124,000 |
786 | 181, 444, 656 |
The next step I need to do is this delimit the values into different vars. I can use the SCAN function but there are different number counts in each var so I can specify specific positions.
CustomerID | Event_List | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
123 | 10,12,100,110,121 | 10 | 12 | 100 | 110 | 121 | |
456 | 100,117,121,123,124,000 | 100 | 117 | 121 | 123 | 124 | 000 |
786 | 181, 444, 656 | 181 | 444 | 656 |
Then I need to transpose them and apply a format (using just ID 456):
CustomerID | Event_List |
456 | 100 |
456 | 117 |
456 | 121 |
456 | 123 |
456 | 124 |
456 | 0 |
CustomerID | Event_List |
456 | Click |
456 | Click2 |
456 | Buy |
456 | Order |
456 | Click Out |
456 | Exit |
Then I need to Concatenate them back together with the format applied.
CustomerID | Event_Path |
456 | Click, Click2, Buy , Order, Click Out , Exit |
Any help will be greatly appreciated!
Actually it doesn't matter. If you need all the intermediate data sets you can separate this data step, but this does it in one shot. If you comment out the OUTPUT statement in the do loop you can get just the last line/data set required.
data have;
infile cards dlm=',' dsd truncover;
informat customerID $3. event_list $50.;
input customerID $ event_list $;
cards;
123, "10, 12, 100, 110, 121"
456, "100, 117, 121, 123, 124, 000"
786, "181, 444, 656"
;
run;
proc format;
value $ event_fmt
'10' = 'Click1'
'12' = 'Click2'
'110' = 'Redirect'
'100' = 'Rocky'
'117' = 'Mountain'
'121' = 'Ice Cream'
'123' = "Is"
'124' = "The"
'000' = "Best"
'181' = "I'm"
'444' = 'incredibly'
'656' = 'tired'
;
run;
data want;
set have;
length sentence $256.;
n=countc(event_list, ",")+1;
do i=1 to n;
word=scan(event_list, i);
sentence = catx(', ', sentence, put(word, $event_fmt.));
*output;
end;
drop event_list;
run;
proc print data=want;
run;
Do you know the maximum amount of words/events ahead of time?
I don't think you need all the steps either..
Hi Reeza,
Unfortunately, I don't.
Actually it doesn't matter. If you need all the intermediate data sets you can separate this data step, but this does it in one shot. If you comment out the OUTPUT statement in the do loop you can get just the last line/data set required.
data have;
infile cards dlm=',' dsd truncover;
informat customerID $3. event_list $50.;
input customerID $ event_list $;
cards;
123, "10, 12, 100, 110, 121"
456, "100, 117, 121, 123, 124, 000"
786, "181, 444, 656"
;
run;
proc format;
value $ event_fmt
'10' = 'Click1'
'12' = 'Click2'
'110' = 'Redirect'
'100' = 'Rocky'
'117' = 'Mountain'
'121' = 'Ice Cream'
'123' = "Is"
'124' = "The"
'000' = "Best"
'181' = "I'm"
'444' = 'incredibly'
'656' = 'tired'
;
run;
data want;
set have;
length sentence $256.;
n=countc(event_list, ",")+1;
do i=1 to n;
word=scan(event_list, i);
sentence = catx(', ', sentence, put(word, $event_fmt.));
*output;
end;
drop event_list;
run;
proc print data=want;
run;
Here's another way, it involves more steps than Reeza as it runs the transpose and then does by processing at the end to get what you need (just remove the Where= in the proc transpose to get all CustomerID instead of just 456). Hope this helps!
data start;
length CustomerID $10 Event_List $50;
input CustomerID $ Event_List $;
datalines;
123 10,12,100,110,121
456 100,117,121,123,124,000
786 181,444,656
;
run;
data merge_data;
length Event_list $10 Event_Char $25;
input Event_List $ Event_Char $;
datalines;
100 Click
117 Click2
121 Buy
123 Order
124 Click Out
000 Exit
;
run;
data count;
set start;
count=countc(Event_List,',')+1;
run;
proc sql noprint;
select max(count) into: max
from count;
quit;
%macro split;
data separate;
set start;
%do i=1 %to &max;
var&i=scan(Event_List,&i,","); %end;
run;
proc transpose data=separate
out=transposed (drop=_name_) prefix=Event_List;
where CustomerID='456';
var %do t=1 %to &max; var&t %end;;
by CustomerID;
run;
%mend;
%split
data num_separate;
set transposed;
Event_List=input(Event_List1,6.);
run;
proc sql;
create table separate_merge as
select A.CustomerID, B.Event_Char as Event_List
from num_separate as A inner join merge_data as B
on (A.Event_List1=B.Event_List);
quit;
data final (drop=Event_List);
retain CustomerID Event_Path;
set separate_merge;
format Event_Path $50.;
by CustomerID;
Event_Path=catx(", ",Event_Path,Event_List);
If Last.CustomerID then output;
run;
Would this work?
data want;
if _n_ = 1 then
do;
retain prx1-prx6;
prx1 = 's/100/Click/';
prx1 = 's/100/Click/';
prx2 = 's/117/Click2/';
prx3 = 's/121/Buy/';
prx4 = 's/123/Order/';
prx5 = 's/124/Click Out/';
prx6 = 's/000/Exit /';
end;
set have;
length Event_Path $ 1024;
Event_Path = Event_List;
Event_Path = prxchange(prx1, -1, Event_Path);
Event_Path = prxchange(prx2, -1, Event_Path);
Event_Path = prxchange(prx3, -1, Event_Path);
Event_Path = prxchange(prx4, -1, Event_Path);
Event_Path = prxchange(prx5, -1, Event_Path);
Event_Path = prxchange(prx6, -1, Event_Path);
drop prx1-prx6;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.