- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I want to use case-when logic to condtionally create a new variable - in the example below, I want to assign all observations with value below 10 to the first group an all other to the second group. Can I do it without if-then/else block? I've found the examples of using select statement inside the data step but it does not work as expected - only the first record is assingned properly. What am I doing wrong?
data test;
input var;
datalines;
1
2
3
30
40
50
;
run;
data test;
set test;
select (var);
when (var < 10) target_group = 1;
otherwise target_group = 2;
end;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The page https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p09213s9jc2t99n1vx0omk2rh9ps.htm#p... indicates the correct use.
You should use the code:
select; when (var < 10) target_group = 1; otherwise target_group = 2; end;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The page https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p09213s9jc2t99n1vx0omk2rh9ps.htm#p... indicates the correct use.
You should use the code:
select; when (var < 10) target_group = 1; otherwise target_group = 2; end;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have any idea why the intial code produced this strange result (only first observation in the first group)? I could understand if it produced an error because of incorrecet syntax. But it did something strange instead.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See this article about the SELECT-WHEN logic in the DATA step.
The SELECT-WHEN statement supports two syntaxes. The first is when you are matching a discrete set of values that you know about in advance. For example:
data test2;
set test;
condition = (var < 10);
select (condition);
when (1) target_group = 1;
otherwise target_group = 2;
end;
run;
In the other syntax, you do not specify the name of the variable on the SELECT statement, and you put the logic in the WHEN statement, like this:
data test3;
set test;
select;
when (var < 10) target_group = 1;
otherwise target_group = 2;
end;
run;
Although you can use the SELECT-WHEN statement to create a new variable, you can also use a SAS FORMAT to create the values that you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many thanks for this!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @chris2377,
Such a simple conditional assignment can indeed be written without any additional statements:
data want;
set test;
target_group=2-(var < 10);
run;
The Boolean expression var < 10 is evaluated to 1 if the inequality is true (which includes the case that var contains a missing value!), else to 0. So, two minus the result is exactly the desired value for target_group. This also explains the result of your SELECT statement: the "Boolean" 1 (for TRUE) happened to match your var value 1.
Also note that I used a different output dataset name (WANT) to avoid overwriting the input dataset in case of incorrect logic.