- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Preparations to reproduce my situation
proc format;
value _grp
1 = "Group 1"
2 = "Group 2"
3 = "Group 3";
value _yesno
1 = "Yes"
0 = "No";
run;
/* Simulate an input dataset */
data source (keep = id group_var var:);
call streaminit(123);
attrib id label = "ID"
group_var format = _grp. label = "Categorical group variable";
format var1 - var9 _yesno.;
label id = "ID"
group_var = "Categorical group variable";
format ;
do i = 1 to 1000;
id = i;
/* Simulate data for categorical data. */
group_var = rand("integer", 1, 3); /* Create values "1", "2" or "3". */
array yes_no var1-var9;
do over yes_no;
yes_no = rand("integer", 1, 2) - 1; /* Create values "0" and "1". */
end;
output;
end;
run;
This gives a dataset like this:
Based on the input I create a variable endpoint in dataset have with this code:
data have;
set source (in = a keep = id group_var where = (group_var = 1))
source (in = b keep = id group_var where = (group_var = 2))
source (in = c keep = id group_var where = (group_var = 3))
source (in = d keep = id var1 where = (var1 = 1))
source (in = e keep = id var2 where = (var2 = 1))
source (in = f keep = id var3 where = (var3 = 1))
source (in = g keep = id var4 where = (var4 = 1))
source (in = h keep = id var5 where = (var5 = 1))
source (in = i keep = id var6 where = (var6 = 1))
source (in = j keep = id var7 where = (var7 = 1))
source (in = k keep = id var8 where = (var8 = 1))
source (in = l keep = id var9 where = (var9 = 1));
by id;
if a then endpoint = 1;
if b then endpoint = 2;
if c then endpoint = 3;
if d then endpoint = 4;
if e then endpoint = 5;
if f then endpoint = 6;
if g then endpoint = 7;
if h then endpoint = 8;
if i then endpoint = 9;
if j then endpoint = 10;
if k then endpoint = 11;
if l then endpoint = 12;
run;
The resulting dataset looks like this:
Problem to solve
There has to be a simpler way to code the dataset have.
I want to replace the steps
if <dataset-id> then endpoint = <number-of-dataset>;
A bonus would be to simplify the set statements themselves, but this is not critical to regard this issue as solved.
Solution formats
I am looking for solutions avoiding SQL. I love to hear about it (for general solution), but want to have a dataset solution (hash or DoW-Loop is also acceptable).
What I tried so far
I wanted to translate the character values a - l to numeric values 1 - 12 using the byte() function, so I can query the variables a - l created by 12 in= options of the set statements:
data want_fail_1;
set source (in = a keep = id group_var where = (group_var = 1))
source (in = b keep = id group_var where = (group_var = 2))
source (in = c keep = id group_var where = (group_var = 3))
source (in = d keep = id var1 where = (var1 = 1))
source (in = e keep = id var2 where = (var2 = 1))
source (in = f keep = id var3 where = (var3 = 1))
source (in = g keep = id var4 where = (var4 = 1))
source (in = h keep = id var5 where = (var5 = 1))
source (in = i keep = id var6 where = (var6 = 1))
source (in = j keep = id var7 where = (var7 = 1))
source (in = k keep = id var8 where = (var8 = 1))
source (in = l keep = id var9 where = (var9 = 1));
by id;
length a_to_l $12;
a_to_l = collate(rank("a"));
put a_to_l=;
do z = 1 to vlength(a_to_l);
if vvaluex(byte(96 + z)) then endpoint = z;
*if vname(input(char(a_to_l, z), $1.) then endpoint = z;
end;
stop;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?
Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.
data have2;
set source(obs=10);
by id;
array v var1-var9;
if first.id then do;
endpoint=group_var;
output;
end;
do i=1 to dim(v);
if v(i)=1 then do;
endpoint=i+3;
output;
end;
end;
drop i;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
We can't work with data in a screen capture. We can only work with data provided as working SAS data step code, which you can create yourself or follow these instructions.
UPDATE: Please ignore this message, I thought the screen capture was the data, but it is not.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller I think the input data is generated in the initial data step so you could copy/paste...?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So is the goal of this programming to produce the endpoint values, or do you really need var1-var9 with values 1 or missing? For a given ID, such as ID=2, do you really need all 5 rows of output? Do you really need all of the var1-var9 if you have the endpoint values?
Seems to me that an array ought to work. This gets most of what you want. You can change the 0s to missing if necessary.
data have2;
set source(obs=10);
by id;
array v var1-var9;
if first.id then do;
endpoint=group_var;
output;
end;
do i=1 to dim(v);
if v(i)=1 then do;
endpoint=i+3;
output;
end;
end;
drop i;
run;
Paige Miller