Dear all,
I have a dataset that looks like this:
OrderID JourneyPosition Channel
1234 1 D
1234 2 T
1235 1 B
I have sucessfully concatenated the channels into a list item, ordered by OrderID using
data abc.concat;
length Channels $ 200 /* need to be the max */;
set abc.cj2;
by OrderID;
retain Channels;
if first.OrderID then Channels = ChannelShort;
else
Channels=trim(Channels) || ' ' || trim(ChannelShort);
if last.OrderID then output;
run;
How can I achieve the same result as a matrix?
It should look like
OrderID C1 C2
1234 D T
1235 B
Thank you very much!
It looks like the program editor corrupted my previous program. Sorry about that.
I assume that you have or can create a variable that enumerates the observations within each level of the OrderID variable. Then you want to do something like this example. I've indicated where I think your variable names would go:
.
/* Create some sorted data similar to OP's */
proc sort data=sashelp.class out=class;
by sex;
run;
/* add variable that enumerates obs within each group */
data class;
set class;
by sex;
if first.sex then Position = 0;
Position + 1;
run;
/* begin SAS/IML solution */
proc iml;
use class;
read all var {sex} into C; /* =OrderID */
read all var {name} into x; /* =ChannelShort? */
read all var {position}; /* =JourneyPosition? */
close;
/* unique-loc technique:
http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */
u = unique(C);
s = j(ncol(u), 1+max(position), blankstr(32));
do i = 1 to ncol(u);
idx = loc(C=u[i]);
s[i, 1] = u[i];
s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
end;
print s;
1. Your matrix doesn't reflect the DATA step code. In the DATA You are concatenating strings together, so the matrix would have two columns, not three. The columns would be OrderID and Channel, where Channel is the concatenation of all values of the ShortChannel variable.
2. If the data are in unsorted order, I'd use the UNIQUE-LOC technique to loop over the unique values of the OrderID variable. Use the ROWCAT function to concatentate the values into a single string.
3. If the data are sorted, it would be more efficient to use the UNIQUEBY subroutine.
Here's an example of using the UNIQUE-LOC trick (item 2) to concatenate the NAME variable for each value of the SEX variable in the Sashelp.Class data.
proc iml;
use sashelp.class;
read all var {sex
Sorry for being unclear. I want a matrix that contains n columns, n being the max count of elements in ChannelShort.
I want a new row for each unique OrderID.
And you are right, the data step does something different.
It looks like the program editor corrupted my previous program. Sorry about that.
I assume that you have or can create a variable that enumerates the observations within each level of the OrderID variable. Then you want to do something like this example. I've indicated where I think your variable names would go:
.
/* Create some sorted data similar to OP's */
proc sort data=sashelp.class out=class;
by sex;
run;
/* add variable that enumerates obs within each group */
data class;
set class;
by sex;
if first.sex then Position = 0;
Position + 1;
run;
/* begin SAS/IML solution */
proc iml;
use class;
read all var {sex} into C; /* =OrderID */
read all var {name} into x; /* =ChannelShort? */
read all var {position}; /* =JourneyPosition? */
close;
/* unique-loc technique:
http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */
u = unique(C);
s = j(ncol(u), 1+max(position), blankstr(32));
do i = 1 to ncol(u);
idx = loc(C=u[i]);
s[i, 1] = u[i];
s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
end;
print s;
That looks perfect!
Though, I am not sure why I am getting this error:
NOTE: Closing WORK.CLASS 62 63 /* unique-loc technique: 64 http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */ 65 u = unique(C); 66 s = j(ncol(u), 1+max(Position), blankstr(32)); ERROR: (execution) Matrix has not been set to a value.
Printing C and x looks like we're getting there.
Please post your SAS version and your complete log beginning with the "proc iml" statement
You can obtain the SAS version in the SAS log by submitting:
%put &sysvlong;
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 55 56 libname ABC '/folders/myfolders/ABC/'; NOTE: Libref ABC was successfully assigned as follows: Engine: V9 Physical Name: /folders/myfolders/ABC 57 /* add variable that enumerates obs within each group */ 58 data class; 59 set ABC.cj2; 60 by OrderID; 61 if first.OrderID then Position = 0; 62 Position + 1; 63 drop JourneyPosition; 64 run; NOTE: There were 192469 observations read from the data set ABC.CJ2. NOTE: The data set WORK.CLASS has 192469 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.02 seconds 65 66 /* begin SAS/IML solution */ 67 proc iml; NOTE: IML Ready 68 use class; 69 read all var {OrderID} into C; 69 ! /* =OrderID */ 70 read all var {ChannelShort} into x; 70 ! /* =ChannelShort? */ 71 close; NOTE: Closing WORK.CLASS 72 73 /* unique-loc technique: 74 http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */ 75 s = j(ncol(u), 1+max(Position), blankstr(32)); ERROR: (execution) Matrix has not been set to a value. operation : MAX at line 75 column 21 operands : Position Position 0 row 0 col (numeric) statement : ASSIGN at line 75 column 1 76 do i = 1 to ncol(u); 77 idx = loc(C=u[i]); 78 s[i, 1] = u[i]; 79 s[i, 2:ncol(idx)+1] = rowvec(x[idx]); 80 end; 81 82 83 print s; ERROR: Matrix s has not been set to a value. statement : PRINT at line 83 column 1 84 85 %put &sysvlong; 9.04.01M3P062415 86 87 88 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 100
You are not reading in the Position variable in the Use/Read/Close statement.
Oh I'm sorry. I tried removing lines of code to find the error.
This is the error I got in the first place:
1 65 66 /* begin SAS/IML solution */ 67 proc iml; NOTE: IML Ready 68 use class; 69 read all var {OrderID} into C; 69 ! /* =OrderID */ 70 read all var {ChannelShort} into x; 70 ! /* =ChannelShort? */ 71 read all var {position}; 72 close; NOTE: Closing WORK.CLASS 73 74 /* unique-loc technique: 75 http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */ 76 s = j(ncol(u), 1+max(position), blankstr(32)); NOTE: Module BLANKSTR loaded from the storage SASHELP.IMLMLIB. ERROR: (execution) Invalid operand to operation. operation : J at line 76 column 6 operands : _TEM1001, _TEM1003, _TEM1004 _TEM1001 1 row 1 col (numeric) 0 _TEM1003 1 row 1 col (numeric) 767 _TEM1004 1 row 1 col (character, size 32) statement : START at offset 1 column 1 77 do i = 1 to ncol(u); 78 idx = loc(C=u[i]); 79 s[i, 1] = u[i]; 80 s[i, 2:ncol(idx)+1] = rowvec(x[idx]); 81 end; 82 83 84 print s; ERROR: Matrix s has not been set to a value. statement : PRINT at line 84 column 1 85 86 %put &sysvlong; 9.04.01M3P062415 87 88 89 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 101
Please think about what you are doing and make an effort to check your program against the one that I gave you. You deleted the line
u = unique(C);
I'm sure that you can solve your problem from here. Good luck.
Sorry about that. Should get some sleep.
Thank you so much! You are my hero! I spent 2 days searching for solutions to my problem, but it seems like I chose the wrong keywords. Should have found your posts earlier.
Thanks again!
You can use START-END INDEX skill I proposed to handle such group operation problem, and could use it to replace the UNIIQYE-LOC skill ,which is a little more complicated and not friendly .
data have;
input OrderID JourneyPosition Channel $;
cards;
1234 1 D
1234 2 T
1235 1 B
;
run;
proc iml;
use have nobs nobs;
read all var{OrderID Channel};
close;
staet_end=t(loc(OrderID^=t({.}||remove(OrderID,nobs))))||
t(loc(OrderID^=t(remove(OrderID,1)||{.}))) ;
id=OrderID[staet_end[,1]];
n=nrow(staet_end);
x=j(n,max((staet_end[,2]-staet_end[,1]))+1,blankstr(nleng(Channel)));
do i=1 to n;
x[i,1:staet_end[i,2]-staet_end[i,1]+1]=t(Channel[staet_end[i,1]:staet_end[i,2]]);
end;
print id x ;
quit;
KSharp's indexing trick is similar to the UNIQUEBY technique that I linked to earlier. If the data are already sorted, you can skip the UNIQUEBY call and go directly to the processing. See also the CUSUM-LAG trick, which gets the indices of each BY group for sorted data.
Rick,
Yeah. But UNIQUEBY() need firstly sort data and can get the first value index for a group after data is ordered. But if data is not ordered? like:
data x;
set x;
by id NOTSORTED ;
Thank you both so much!
I implemented Ksharp's solution, as I have the impression it runs faster.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.