BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BalassaSamuelso
Obsidian | Level 7

Dear all,

 

I have a dataset that looks like this:

 

OrderID    JourneyPosition     Channel

1234         1                             D

1234         2                             T

1235         1                             B

 

I have sucessfully concatenated the channels into a list item, ordered by OrderID using

data abc.concat;
length Channels $ 200 /* need to be the max */;
set abc.cj2;
by OrderID;

retain Channels;

if first.OrderID then Channels = ChannelShort;
else
Channels=trim(Channels) || ' ' || trim(ChannelShort);


if last.OrderID then output;
run;

How can I achieve the same result as a matrix?

It should look like

 

OrderID   C1    C2

1234        D     T

1235        B

 

Thank you very much!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

It looks like the program editor corrupted my previous program. Sorry about that.

 

I assume that you have or can create a variable that enumerates the observations within each level of the OrderID variable.  Then you want to do something like this example. I've indicated where I think your variable names would go:

.

/* Create some sorted data similar to OP's */
proc sort data=sashelp.class out=class;
by sex;
run;
/* add variable that enumerates obs within each group */
data class;
set class;
by sex;
if first.sex then Position = 0;
Position + 1;
run;

/* begin SAS/IML solution */
proc iml;
use class; 
   read all var {sex} into C;    /* =OrderID */
   read all var {name} into x;   /* =ChannelShort? */ 
   read all var {position};      /* =JourneyPosition? */
close;

/* unique-loc technique: 
   http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */ 
u = unique(C);  
s = j(ncol(u), 1+max(position), blankstr(32)); 
do i = 1 to ncol(u);   
   idx = loc(C=u[i]);  
   s[i, 1] = u[i];
   s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
end;

print s;

View solution in original post

14 REPLIES 14
Rick_SAS
SAS Super FREQ

1. Your matrix doesn't reflect the DATA step code. In the DATA You are concatenating strings together, so the matrix would have two columns, not three. The columns would be OrderID and Channel, where Channel is the concatenation of all values of the ShortChannel variable.

2. If the data are in unsorted order, I'd use the UNIQUE-LOC technique to loop over the unique values of the OrderID variable. Use the ROWCAT function to concatentate the values into a single string.

3. If the data  are sorted, it would be more efficient to use the UNIQUEBY subroutine.

 

Here's an example of using the UNIQUE-LOC trick (item 2) to concatenate the NAME variable for each value of the SEX variable in the Sashelp.Class data.

 


proc iml;
use sashelp.class;
read all var {sex

 

BalassaSamuelso
Obsidian | Level 7

Sorry for being unclear. I want a matrix that contains n columns, n being the max count of elements in ChannelShort. 

I want a new row for each unique OrderID.

 

And you are right, the data step does something different. 

Rick_SAS
SAS Super FREQ

It looks like the program editor corrupted my previous program. Sorry about that.

 

I assume that you have or can create a variable that enumerates the observations within each level of the OrderID variable.  Then you want to do something like this example. I've indicated where I think your variable names would go:

.

/* Create some sorted data similar to OP's */
proc sort data=sashelp.class out=class;
by sex;
run;
/* add variable that enumerates obs within each group */
data class;
set class;
by sex;
if first.sex then Position = 0;
Position + 1;
run;

/* begin SAS/IML solution */
proc iml;
use class; 
   read all var {sex} into C;    /* =OrderID */
   read all var {name} into x;   /* =ChannelShort? */ 
   read all var {position};      /* =JourneyPosition? */
close;

/* unique-loc technique: 
   http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */ 
u = unique(C);  
s = j(ncol(u), 1+max(position), blankstr(32)); 
do i = 1 to ncol(u);   
   idx = loc(C=u[i]);  
   s[i, 1] = u[i];
   s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
end;

print s;
BalassaSamuelso
Obsidian | Level 7

That looks perfect! 

 

Though, I am not sure why I am getting this error:

 

 

 NOTE: Closing WORK.CLASS
 62         
 63         /* unique-loc technique:
 64            http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */
 65         u = unique(C);
 66         s = j(ncol(u), 1+max(Position), blankstr(32));
 ERROR: (execution) Matrix has not been set to a value.

 

Printing C and x looks like we're getting there.

 

Rick_SAS
SAS Super FREQ

Please post your SAS version and your complete log beginning with the "proc iml" statement

You can obtain the SAS version in the SAS log by submitting:


%put &sysvlong;

BalassaSamuelso
Obsidian | Level 7
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 55         
 56         libname ABC '/folders/myfolders/ABC/';
 NOTE: Libref ABC was successfully assigned as follows: 
       Engine:        V9 
       Physical Name: /folders/myfolders/ABC
 57         /* add variable that enumerates obs within each group */
 58         data class;
 59         set ABC.cj2;
 60         by OrderID;
 61         if first.OrderID then Position = 0;
 62         Position + 1;
 63         drop JourneyPosition;
 64         run;
 
 NOTE: There were 192469 observations read from the data set ABC.CJ2.
 NOTE: The data set WORK.CLASS has 192469 observations and 3 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.03 seconds
       cpu time            0.02 seconds
       
 
 65         
 66         /* begin SAS/IML solution */
 67         proc iml;
 NOTE: IML Ready
 68         use class;
 69            read all var {OrderID} into C;
 69       !                                      /* =OrderID */
 70            read all var {ChannelShort} into x;
 70       !                                          /* =ChannelShort? */
 71         close;
 NOTE: Closing WORK.CLASS
 72         
 73         /* unique-loc technique:
 74            http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */
 75         s = j(ncol(u), 1+max(Position), blankstr(32));
 ERROR: (execution) Matrix has not been set to a value.
 
  operation : MAX at line 75 column 21
  operands  : Position
 
 Position      0 row       0 col     (numeric)
 
 
  statement : ASSIGN at line 75 column 1
 76         do i = 1 to ncol(u);
 77            idx = loc(C=u[i]);
 78            s[i, 1] = u[i];
 79            s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
 80         end;
 81         
 82         
 83         print s;
 ERROR: Matrix s has not been set to a value.
 
  statement : PRINT at line 83 column 1
 84         
 85         %put &sysvlong;
 9.04.01M3P062415
 86         
 87         
 88         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 100   
Rick_SAS
SAS Super FREQ

You are not reading in the Position variable in the Use/Read/Close statement.

BalassaSamuelso
Obsidian | Level 7

Oh I'm sorry. I tried removing lines of code to find the error. 

 

This is the error I got in the first place:

 

 1 
 65         
 66         /* begin SAS/IML solution */
 67         proc iml;
 NOTE: IML Ready
 68         use class;
 69            read all var {OrderID} into C;
 69       !                                      /* =OrderID */
 70            read all var {ChannelShort} into x;
 70       !                                          /* =ChannelShort? */
 71            read all var {position};
 72         close;
 NOTE: Closing WORK.CLASS
 73         
 74         /* unique-loc technique:
 75            http://blogs.sas.com/content/iml/2011/11/01/the-unique-loc-trick-a-real-treat.html */
 76         s = j(ncol(u), 1+max(position), blankstr(32));
 NOTE: Module BLANKSTR loaded from the storage SASHELP.IMLMLIB.
 ERROR: (execution) Invalid operand to operation.
 
  operation : J at line 76 column 6
  operands  : _TEM1001, _TEM1003, _TEM1004
 
 _TEM1001      1 row       1 col     (numeric)
 
          0
 
 _TEM1003      1 row       1 col     (numeric)
 
        767
 
 _TEM1004      1 row       1 col     (character, size 32)
 
                                  
 
  statement : START at offset   1 column   1
 77         do i = 1 to ncol(u);
 78            idx = loc(C=u[i]);
 79            s[i, 1] = u[i];
 80            s[i, 2:ncol(idx)+1] = rowvec(x[idx]);
 81         end;
 82         
 83         
 84         print s;
 ERROR: Matrix s has not been set to a value.
 
  statement : PRINT at line 84 column 1
 85         
 86         %put &sysvlong;
 9.04.01M3P062415
 87         
 88         
 89         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 101 
Rick_SAS
SAS Super FREQ

Please think about what you are doing and make an effort to check your program against the one that I gave you. You deleted the line

u = unique(C);

 

I'm sure that you can solve your problem from here. Good luck.

BalassaSamuelso
Obsidian | Level 7

Sorry about that. Should get some sleep. 

 

Thank you so much! You are my hero! I spent 2 days searching for solutions to my problem, but it seems like I chose the wrong keywords. Should have found your posts earlier.

 

Thanks again!

Ksharp
Super User

You can use START-END INDEX skill I proposed to handle such group operation problem, and could use it to replace the UNIIQYE-LOC skill ,which is a little more complicated and not friendly  .

 

data have;
input OrderID    JourneyPosition     Channel $;
cards;
1234         1                             D
1234         2                             T
1235         1                             B
;
run;
proc iml;
use have nobs nobs;
read all var{OrderID  Channel};
close;

staet_end=t(loc(OrderID^=t({.}||remove(OrderID,nobs))))||
          t(loc(OrderID^=t(remove(OrderID,1)||{.}))) ;
id=OrderID[staet_end[,1]];
n=nrow(staet_end);
x=j(n,max((staet_end[,2]-staet_end[,1]))+1,blankstr(nleng(Channel)));
do i=1 to n;
 x[i,1:staet_end[i,2]-staet_end[i,1]+1]=t(Channel[staet_end[i,1]:staet_end[i,2]]);
end;

print id x ;
quit;
Rick_SAS
SAS Super FREQ

KSharp's indexing trick is similar to the UNIQUEBY technique that I linked to earlier. If the data are already sorted, you can skip the UNIQUEBY call and go directly to the processing. See also the CUSUM-LAG trick, which gets the indices of each BY group for sorted data. 

Ksharp
Super User

Rick,

Yeah. But UNIQUEBY() need firstly sort data and can get the first value index for a group  after data is ordered. But if data is not ordered? like:

data x;

 set x;

 by id NOTSORTED ;

BalassaSamuelso
Obsidian | Level 7

Thank you both so much! 

I implemented Ksharp's solution, as I have the impression it runs faster.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 14 replies
  • 10692 views
  • 2 likes
  • 3 in conversation