Preparations to reproduce my situation
proc format;
value _grp
1 = "Group 1"
2 = "Group 2"
3 = "Group 3";
value _yesno
1 = "Yes"
0 = "No";
run;
/* Simulate an input dataset */
data source (keep = id group_var var:);
call streaminit(123);
attrib id label = "ID"
group_var format = _grp. label = "Categorical group variable";
format var1 - var9 _yesno.;
label id = "ID"
group_var = "Categorical group variable";
format ;
do i = 1 to 1000;
id = i;
/* Simulate data for categorical data. */
group_var = rand("integer", 1, 3); /* Create values "1", "2" or "3". */
array yes_no var1-var9;
do over yes_no;
yes_no = rand("integer", 1, 2) - 1; /* Create values "0" and "1". */
end;
output;
end;
run;
This gives a dataset like this:
Based on the input I create a variable endpoint in dataset have with this code:
data have;
set source (in = a keep = id group_var where = (group_var = 1))
source (in = b keep = id group_var where = (group_var = 2))
source (in = c keep = id group_var where = (group_var = 3))
source (in = d keep = id var1 where = (var1 = 1))
source (in = e keep = id var2 where = (var2 = 1))
source (in = f keep = id var3 where = (var3 = 1))
source (in = g keep = id var4 where = (var4 = 1))
source (in = h keep = id var5 where = (var5 = 1))
source (in = i keep = id var6 where = (var6 = 1))
source (in = j keep = id var7 where = (var7 = 1))
source (in = k keep = id var8 where = (var8 = 1))
source (in = l keep = id var9 where = (var9 = 1));
by id;
if a then endpoint = 1;
if b then endpoint = 2;
if c then endpoint = 3;
if d then endpoint = 4;
if e then endpoint = 5;
if f then endpoint = 6;
if g then endpoint = 7;
if h then endpoint = 8;
if i then endpoint = 9;
if j then endpoint = 10;
if k then endpoint = 11;
if l then endpoint = 12;
run;
The resulting dataset looks like this:
Problem to solve
There has to be a simpler way to code the dataset have.
I want to replace the steps
if <dataset-id> then endpoint = <number-of-dataset>;
A bonus would be to simplify the set statements themselves, but this is not critical to regard this issue as solved.
Solution formats
I am looking for solutions avoiding SQL. I love to hear about it (for general solution), but want to have a dataset solution (hash or DoW-Loop is also acceptable).
What I tried so far
I wanted to translate the character values a - l to numeric values 1 - 12 using the byte() function, so I can query the variables a - l created by 12 in= options of the set statements:
data want_fail_1;
set source (in = a keep = id group_var where = (group_var = 1))
source (in = b keep = id group_var where = (group_var = 2))
source (in = c keep = id group_var where = (group_var = 3))
source (in = d keep = id var1 where = (var1 = 1))
source (in = e keep = id var2 where = (var2 = 1))
source (in = f keep = id var3 where = (var3 = 1))
source (in = g keep = id var4 where = (var4 = 1))
source (in = h keep = id var5 where = (var5 = 1))
source (in = i keep = id var6 where = (var6 = 1))
source (in = j keep = id var7 where = (var7 = 1))
source (in = k keep = id var8 where = (var8 = 1))
source (in = l keep = id var9 where = (var9 = 1));
by id;
length a_to_l $12;
a_to_l = collate(rank("a"));
put a_to_l=;
do z = 1 to vlength(a_to_l);
if vvaluex(byte(96 + z)) then endpoint = z;
*if vname(input(char(a_to_l, z), $1.) then endpoint = z;
end;
stop;
run;
So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?
Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.
data have2;
set source(obs=10);
by id;
array v var1-var9;
if first.id then do;
endpoint=group_var;
output;
end;
do i=1 to dim(v);
if v(i)=1 then do;
endpoint=i+3;
output;
end;
end;
drop i;
run;
We can't work with data in a screen capture. We can only work with data provided as working SAS data step code, which you can create yourself or follow these instructions.
UPDATE: Please ignore this message, I thought the screen capture was the data, but it is not.
@PaigeMiller I think the input data is generated in the initial data step so you could copy/paste...?
So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?
Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.
data have2;
set source(obs=10);
by id;
array v var1-var9;
if first.id then do;
endpoint=group_var;
output;
end;
do i=1 to dim(v);
if v(i)=1 then do;
endpoint=i+3;
output;
end;
end;
drop i;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.