Hi all,
I would like to do the same thing (see below) in PROC IML. I created two variables and output observations based on values of "b".
data test;
input a b;
cards;
1 0
2 1
3 2
4 0
5 1
6 2
7 0
;
run;
data test1 test2 test3;
set test;
if b=0 then output test1;
else if b=1 then output test2;
else output test3;
run;
With only two groups, I tried the syntax below and it worked.
proc iml;
a={1, 2, 3, 4, 5, 6, 7}; *a vector with random numbers;
b={0, 1, 0, 0, 1, 0, 1}; *a vector indicating the group membership;
c=t(remove(a,loc(b))); *elements from a when b=0;
d=t(remove(a,loc(element(a,c)))); *elements from a when b=1;
quit;
BUT, I have difficulty if there are three groups such as:
a={1, 2, 3, 4, 5, 6, 7};
b={0, 1, 2, 0, 1, 2, 1};
I can create SAS data set to save those values and then to output to three SAS data sets. But I would like to learn how to achieve this in PROC IML.
Thanks to all!
Two approaches:
1) Use the UNIQUE-LOC technique (Search for "unique-loc" at blogs.sas.com/content/iml)
2) Use the UNIQUEBY approach
Assuming that you are using SAS/IML 12.1 or better, you can construct the data set names inside the DO loop.
For example, here is the UNIQUE-LOC approach:
proc iml;
a={1, 2, 3, 4, 5, 6, 7};
b={0, 1, 2, 0, 1, 2, 1};
u = unique(b);
do i = 1 to ncol(u);
v = a[loc(b=u[i])]; /* elements of a in the i_th category */
/* if you want it written to a data set, do the following */
dsname = "test" + strip(char(i));
create (dsname) var {v}; append; close (dsname);
end;
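The UNIQUEBY approach (option 2 above) could look roughly like the following sketch. It uses the same a and b vectors; the names m, first, and idx are made up for illustration. UNIQUEBY expects the rows to be sorted by the grouping column, so the data are sorted first.

```sas
proc iml;
a = {1, 2, 3, 4, 5, 6, 7};
b = {0, 1, 2, 0, 1, 2, 1};
m = b || a;                          /* column 1 = group, column 2 = value */
call sort(m, 1);                     /* UNIQUEBY needs rows sorted by group */
first = uniqueby(m, 1, 1:nrow(m));   /* row where each group starts */
first = first // (nrow(m) + 1);      /* sentinel marking the end of the last group */
do i = 1 to nrow(first) - 1;
   idx = first[i] : (first[i+1] - 1);   /* rows belonging to the i_th group */
   v = m[idx, 2];                       /* values of a for that group */
   print i v;
end;
quit;
```

Compared with UNIQUE-LOC, this sorts once and then slices contiguous rows, which may be preferable when there are many groups.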
Thanks for your quick reply, Rick.
I tried the UNIQUE-LOC approach. This approach does not create three vectors or data sets. In my understanding, v is a temporary vector overwritten every time in the do loop. I want three separate vectors like v1, v2, and v3. I remember I also tried UNIQUEBY last time but I failed. I will try again to see what I will get.
I also tried the WHERE clause before. I failed because the length of the categorical variable is not fixed.
Your post says "I would like to do the same thing (see below) in PROC IML." That is what I showed. The code I provided creates three data sets (TEST1, TEST2, and TEST3) that contain the information you asked for.
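If separate in-memory vectors (v1, v2, v3) are wanted rather than data sets, one possibility is to assign to a matrix whose name is built at run time with CALL VALSET. This is only a sketch under the assumption that VALSET is available in your SAS/IML release; the name prefix "v" is an arbitrary choice.

```sas
proc iml;
a = {1, 2, 3, 4, 5, 6, 7};
b = {0, 1, 2, 0, 1, 2, 1};
u = unique(b);                          /* sorted categories: 0, 1, 2 */
do i = 1 to ncol(u);
   name = "v" + strip(char(i));         /* "v1", "v2", "v3" */
   call valset(name, a[loc(b=u[i])]);   /* assign to the matrix named by NAME */
end;
print v1, v2, v3;                       /* v1 = {1,4}, v2 = {2,5,7}, v3 = {3,6} */
quit;
```

When the number of groups is small and known, writing v1 = a[loc(b=0)]; and so on by hand is simpler and avoids run-time names altogether.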
I might have done something wrong, then.
I copied and ran your code directly, but got error messages.
186  create (dsname) var {v};
            -
            22
            76
ERROR 22-322: Expecting a name.
ERROR 76-322: Syntax error, statement will be ignored.
So I deleted the parentheses. Then I got
NOTE: The data set WORK.DSNAME has 12 observations and 1 variables.
NOTE: The data set WORK.DSNAME has 12 observations and 1 variables.
NOTE: The data set WORK.DSNAME has 76 observations and 1 variables.
So only one data set was created, although it should be "test1", "test2", and "test3".
Apparently you are not running SAS/IML 12.1 or better. What appears in the SAS Log when you submit the following statement?
%put &SYSVLONG;
244 %put &SYSVLONG;
9.03.01M1P110211
You have SAS 9.3, which is about 3 years old. (That would have been good information to provide in your question.) The CREATE and CLOSE statements in your version of SAS do not support data set names that are defined at run time, which is why you are getting an error.
With software this old, I would either upgrade or use Base SAS tools to do the splitting. Do an internet search for "sas split dataset by variable" and you'll find dozens of techniques and macros, such as this page. It's not that hard, but if you get stuck the fine folks at the Base SAS and macro support community can help you over the hurdles.
Thanks, Rick. I will try it on other computers that have SAS 9.4 installed. The UNIQUE function works for me so far.
You can also use the WHERE clause to read only certain observations into SAS/IML within a loop. For example, this blog post shows an example where the READ statement is
read all var {a} where (b=i); /* read i_th category */
The downside of this WHERE clause approach is that you have to know the categories in advance and you end up passing through the data multiple times. If the data fit in memory, I'd stick with UNIQUE-LOC. BTW, several useful variations of the UNIQUE-LOC technique are described on pp 60-72 of Statistical Programming with SAS/IML Software.
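For completeness, a WHERE-clause loop might be sketched as follows. It assumes the WORK.TEST data set from the original question and that the categories are the known values 0, 1, and 2; the target matrix name v is arbitrary.

```sas
proc iml;
use test;                                /* WORK.TEST from the original question */
do i = 0 to 2;                           /* categories must be known in advance */
   read all var {a} where(b=i) into v;   /* one pass through the data per category */
   print i, v;
end;
close test;
quit;
```

Each iteration rereads the data set, which is the multiple-pass cost mentioned above.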