BookmarkSubscribeRSS Feed
ara1
Calcite | Level 5

My dataset looks something like this:

Patient

new_group

code

second_code

third_code

1

0

96.6

.

.

2

0

96.6

96.6

.

3

0

96.6

96.6

96.6

4

0

96.6

78.7

.

5

0

96.6

96.6

78.7

6

1

78.7

.

.

7

1

78.7

78.7

56.5

8

1

56.5

.

.

9

2

96.6

.

.

10

2

56.5

.

.

 

I want to create a new dummy group called "policy" which contains only patients from group 0 or 1 and does not have the code 96.6 in any of the columns, for example, I want to omit patients 1-3 because they only contain the code 96.6. However, if a patient has the code 96.6 AND another code like 78.7, 56.5, or any other code that is not 96.6, I want to include it in my "policy" group. This is the code I wrote:

 

data Final2;
set Final1;
if new_group=0 and code ne "96.6" and second_code ne "." and third_code ne "." then policy=0;
if new_group=0 and code ne "96.6" and second_code ne "96.6" and third_code ne "." then policy=0;

if new_group=0 and code ne "96.6" and second_code ne "96.6" and third_code ne "96.6" then policy=0;

if new_group=1 and code ne "96.6" and second_code ne "." and third_code ne "." then policy=1;
if new_group=1 and code ne "96.6" and second_code ne "96.6" and third_code ne "." then policy=1;

if new_group=1 and code ne "96.6" and second_code ne "96.6" and third_code ne "96.6" then policy=1;

run;

 

Is this an appropriate code in order to create groups for the dummy variable policy 0 or 1?

 

 

6 REPLIES 6
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

 data want;
 	set have;
 	scr = sum(code, second_code, third_code);

   if mod(scr,96.6) then policy = new_group; run;

 

ara1
Calcite | Level 5

I want the patients with new_group=1 to be in policy=1, patients in new_group=0 to be in policy=0 and for both groups, patients who have only 96.6 for columns code, second_code, and third_code cannot be included hence why I would want patients 1,2, and 3 to be omitted as well as patient 9 and 10 when creating my new "policy"group. I want patient 4 and 5 to be in policy=0 and patient 6,7,8 in policy=1. This 96.6 is an actual string/value, not sum. Thus, I am not sure why I might need to use the sum SAS code. Would you mind clarifying?

ara1
Calcite | Level 5

Expected output should look like this

 

Patient

Group

Code

Second_code

Third_code

Policy

1

0

96.6

.

.

.

2

0

96.6

96.6

.

.

3

0

96.6

96.6

96.6

.

4

0

96.6

78.7

.

0

5

0

96.6

96.6

78.7

0

6

1

78.7

.

.

1

7

1

78.7

78.7

56.5

1

8

1

56.5

.

.

1

9

2

96.6

.

.

.

10

2

56.5

.

.

.

 

Thank you beforehand,

A

andreas_lds
Jade | Level 19

Please post what you have in usable form: a data step with datalines-statement.

 

If the code-variables are numeric, don't use quotes!

Tom
Super User Tom
Super User

First thing is post the data in a usable format.

data have ;
 input Patient Group Code $ Second_code $ Third_code $ Policy ;
cards;
1 0 96.6 . . .
2 0 96.6 96.6 . .
3 0 96.6 96.6 96.6 .
4 0 96.6 78.7 . 0
5 0 96.6 96.6 78.7 0
6 1 78.7 . . 1
7 1 78.7 78.7 56.5 1
8 1 56.5 . . 1
9 2 96.6 . . .
10 2 56.5 . . .
;

If you want a rule about a group of variables then probably want to use and ARRAY.  In this case set the flag variable false and then loop over the array looking for any condition that sets it true.  Then use the flag variable in your conditional logic.

data want ;
  set have;
  array codes code second_code third_code ;
  anyx=0;
  do i=1 to dim(codes) while(not anyx);
    if codes(i) not in ('96.6' ' ') then anyx=1;
  end;
  if group in (0 1) and anyx then new_policy=group;
  drop anyx i ;
run;
                                   Second_    Third_               new_
Obs    Patient    Group    Code     code       code     Policy    policy

  1        1        0      96.6                            .         .
  2        2        0      96.6     96.6                   .         .
  3        3        0      96.6     96.6       96.6        .         .
  4        4        0      96.6     78.7                   0         0
  5        5        0      96.6     96.6       78.7        0         0
  6        6        1      78.7                            1         1
  7        7        1      78.7     78.7       56.5        1         1
  8        8        1      56.5                            1         1
  9        9        2      96.6                            .         .
 10       10        2      56.5                            .         .

Why doesn't the last row get POLICY set to 2?

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

how do we handle new_group 2 in the example above?

what is the else condition?  or better know as the out of scope records.  Do we delete them?

the code below catches all records based on your request besides number 10.  Which might be the else records.

data have;

infile datalines dsd;* missover ;
input Patient :$ new_group :$ code second_code third_code @@;
datalines;
1
0
96.6
.
.
2
0
96.6
96.6
.
3
0
96.6
96.6
96.6
4
0
96.6
78.7
.
5
0
96.6
96.6
78.7
6
1
78.7
.
.
7
1
78.7
78.7
56.5
8
1
56.5
.
.
9
2
96.6
.
.
10
2
56.5
.
.

; data want; set have; scr = sum(code, second_code, third_code);
   if mod(scr,96.6) then policy = new_group; run;

 

Obs Patient new_group code second_code third_code scr policy
1 1 0 96.6 . . 96.6 .
2 2 0 96.6 96.6 . 193.2 .
3 3 0 96.6 96.6 96.6 289.8 .
4 4 0 96.6 78.7 . 175.3 0
5 5 0 96.6 96.6 78.7 271.9 0
6 6 1 78.7 . . 78.7 1
7 7 1 78.7 78.7 56.5 213.9 1
8 8 1 56.5 . . 56.5 1
9 9 2 96.6 . . 96.6 .
10 10 2 56.5 . . 56.5 2

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1836 views
  • 0 likes
  • 4 in conversation