My dataset looks something like this:
Patient | new_group | code | second_code | third_code |
1 | 0 | 96.6 | . | . |
2 | 0 | 96.6 | 96.6 | . |
3 | 0 | 96.6 | 96.6 | 96.6 |
4 | 0 | 96.6 | 78.7 | . |
5 | 0 | 96.6 | 96.6 | 78.7 |
6 | 1 | 78.7 | . | . |
7 | 1 | 78.7 | 78.7 | 56.5 |
8 | 1 | 56.5 | . | . |
9 | 2 | 96.6 | . | . |
10 | 2 | 56.5 | . | . |
I want to create a new dummy group called "policy" which contains only patients from group 0 or 1 and does not have the code 96.6 in any of the columns, for example, I want to omit patients 1-3 because they only contain the code 96.6. However, if a patient has the code 96.6 AND another code like 78.7, 56.5, or any other code that is not 96.6, I want to include it in my "policy" group. This is the code I wrote:
data Final2;
set Final1;
if new_group=0 and code ne "96.6" and second_code ne "." and third_code ne "." then policy=0;
if new_group=0 and code ne "96.6" and second_code ne "96.6" and third_code ne "." then policy=0;
if new_group=0 and code ne "96.6" and second_code ne "96.6" and third_code ne "96.6" then policy=0;
if new_group=1 and code ne "96.6" and second_code ne "." and third_code ne "." then policy=1;
if new_group=1 and code ne "96.6" and second_code ne "96.6" and third_code ne "." then policy=1;
if new_group=1 and code ne "96.6" and second_code ne "96.6" and third_code ne "96.6" then policy=1;
run;
Is this an appropriate code in order to create groups for the dummy variable policy 0 or 1?
data want;
set have;
scr = sum(code, second_code, third_code);
if mod(scr,96.6) then policy = new_group;
run;
I want the patients with new_group=1 to be in policy=1, patients in new_group=0 to be in policy=0 and for both groups, patients who have only 96.6 for columns code, second_code, and third_code cannot be included hence why I would want patients 1,2, and 3 to be omitted as well as patient 9 and 10 when creating my new "policy"group. I want patient 4 and 5 to be in policy=0 and patient 6,7,8 in policy=1. This 96.6 is an actual string/value, not sum. Thus, I am not sure why I might need to use the sum SAS code. Would you mind clarifying?
Expected output should look like this
Patient | Group | Code | Second_code | Third_code | Policy |
1 | 0 | 96.6 | . | . | . |
2 | 0 | 96.6 | 96.6 | . | . |
3 | 0 | 96.6 | 96.6 | 96.6 | . |
4 | 0 | 96.6 | 78.7 | . | 0 |
5 | 0 | 96.6 | 96.6 | 78.7 | 0 |
6 | 1 | 78.7 | . | . | 1 |
7 | 1 | 78.7 | 78.7 | 56.5 | 1 |
8 | 1 | 56.5 | . | . | 1 |
9 | 2 | 96.6 | . | . | . |
10 | 2 | 56.5 | . | . | . |
Thank you beforehand,
A
Please post what you have in usable form: a data step with datalines-statement.
If the code-variables are numeric, don't use quotes!
First thing is post the data in a usable format.
data have ;
input Patient Group Code $ Second_code $ Third_code $ Policy ;
cards;
1 0 96.6 . . .
2 0 96.6 96.6 . .
3 0 96.6 96.6 96.6 .
4 0 96.6 78.7 . 0
5 0 96.6 96.6 78.7 0
6 1 78.7 . . 1
7 1 78.7 78.7 56.5 1
8 1 56.5 . . 1
9 2 96.6 . . .
10 2 56.5 . . .
;
If you want a rule about a group of variables then probably want to use and ARRAY. In this case set the flag variable false and then loop over the array looking for any condition that sets it true. Then use the flag variable in your conditional logic.
data want ;
set have;
array codes code second_code third_code ;
anyx=0;
do i=1 to dim(codes) while(not anyx);
if codes(i) not in ('96.6' ' ') then anyx=1;
end;
if group in (0 1) and anyx then new_policy=group;
drop anyx i ;
run;
Second_ Third_ new_ Obs Patient Group Code code code Policy policy 1 1 0 96.6 . . 2 2 0 96.6 96.6 . . 3 3 0 96.6 96.6 96.6 . . 4 4 0 96.6 78.7 0 0 5 5 0 96.6 96.6 78.7 0 0 6 6 1 78.7 1 1 7 7 1 78.7 78.7 56.5 1 1 8 8 1 56.5 1 1 9 9 2 96.6 . . 10 10 2 56.5 . .
Why doesn't the last row get POLICY set to 2?
how do we handle new_group 2 in the example above?
what is the else condition? or better know as the out of scope records. Do we delete them?
the code below catches all records based on your request besides number 10. Which might be the else records.
data have;
infile datalines dsd;* missover ;
input Patient :$ new_group :$ code second_code third_code @@;
datalines;
1
0
96.6
.
.
2
0
96.6
96.6
.
3
0
96.6
96.6
96.6
4
0
96.6
78.7
.
5
0
96.6
96.6
78.7
6
1
78.7
.
.
7
1
78.7
78.7
56.5
8
1
56.5
.
.
9
2
96.6
.
.
10
2
56.5
.
.
;
data want;
set have;
scr = sum(code, second_code, third_code);
if mod(scr,96.6) then policy = new_group;
run;
Obs | Patient | new_group | code | second_code | third_code | scr | policy |
---|---|---|---|---|---|---|---|
1 | 1 | 0 | 96.6 | . | . | 96.6 | . |
2 | 2 | 0 | 96.6 | 96.6 | . | 193.2 | . |
3 | 3 | 0 | 96.6 | 96.6 | 96.6 | 289.8 | . |
4 | 4 | 0 | 96.6 | 78.7 | . | 175.3 | 0 |
5 | 5 | 0 | 96.6 | 96.6 | 78.7 | 271.9 | 0 |
6 | 6 | 1 | 78.7 | . | . | 78.7 | 1 |
7 | 7 | 1 | 78.7 | 78.7 | 56.5 | 213.9 | 1 |
8 | 8 | 1 | 56.5 | . | . | 56.5 | 1 |
9 | 9 | 2 | 96.6 | . | . | 96.6 | . |
10 | 10 | 2 | 56.5 | . | . | 56.5 | 2 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.