BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pinkyc
Calcite | Level 5

s I think this is as much a logic question as SAS, but suppose input data looks like this:

Pat_ID     Pat_stat     Charge_code     Discharge

AA          10               11                         9/22/2012

BB          30               11                         5/11/2012

CC          10               55                         12/13/2012

DD           20               11                         6/4/2012

EE          30               55                         12/3/2012

The observations where pat_stat = 30, where charge_code <> 11 and Discharge after 12/1/2012 are to be selected.  However, an observation could have one, two or all three of those criteria (Pat_ID EE).  What is the best way to find the exact records that have these criteria but also determine if they overlap?  Such as, datasets that include ones with just pat_stat = 30, charge_code <> 11, or discharge > 12/1/2012 but also can pinpoint which ones have two of those criteria or three. For example, if I specify if charge_code <> 11, EE is included but it also has pat_stat = 30.  But I want to select it such that I can tell EE also has pat_stat = 30 and it has multiple criteria.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Code to read the data, create the flag and then print the results:

data have;
informat discharge mmddyy10.;
input Pat_ID  $   Pat_stat     Charge_code     Discharge ;
flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );
format discharge mmddyy10.;
datalines;
AA          10               11                         9/22/2012
BB          30               11                         5/11/2012
CC          10               55                         12/13/2012
DD          20               11                         6/4/2012
EE          30               55                         12/3/2012
;
run;

proc print data=have;
var Pat_ID  Pat_stat     Charge_code     Discharge   flag;
run;

Results: (minus the column headers cut by the forum)

1AA101109/22/2012000
2BB301105/11/2012100
3CC105512/13/2012011
4DD201106/04/2012000
5EE305512/03/2012111

So the flag on the first row: Pat_stat is not 30 so the first position of the flag is 0 for false

charge_code IS 11 (not <>) so is also 0 and the date is not greate than 12/01/2012 so 0.

In the second row the Pat_stat is 30 so the first position 1 for true and the remainder are 0 as both are false

In the third row charge_code <> 11 so the 2nd position is 1 and the date is larger so it is true.

This type of flag requires that you constantly be aware of the meaning for each position.

An additional exercise for the interested reader is to build a custom format for each value of flag and then use that format for display to describe what is going one.

An advantage AND weakness to this approach: You can have a lot of comparisons represented, 2^n where n is the number of comparisons. So you can build something hard to interpret quickly. Also if you don't specify a length for the flag variable you'll get a 200 character default.

A third approach would be three separate flags coded 0/1.Proc summary with the flags as class variables could get you a count of the combinations of 1,2 or 3 taken at a time.

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, not following your logic, could you provide an example required output?

pinkyc
Calcite | Level 5

Ballard got the gist of what I was aiming for, though I was hoping there'd be an easier way around the logic but I suppose there isn't a shortcut.  But to give an example:

Dataset pat_stat = 30 ONLY

Pat_ID

BB

EE

Dataset charge_code <> 11 ONLY

Pat_ID

CC

EE

Dataset Pat_stat = 30 AND charge_code <> 11

Pat_ID

EE

Basically, EE would appear in each three of these specifications, but it's redundant in the pat_stat = 30 only and charge_code <> 11 only outputs.  Since there are quite a few observations, I suppose I'll just have to do successive and/or statements to narrow it down.

ballardw
Super User

As I mentioned, the approach could depend on what you are doing next. How are you going to use the inclusion/exclusion?

Another approach is to create a text flag variable that would contain a series of 0 and 1 to summarize the conditions met.

Something like:

flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );

if all are true then Flag looks like 111, none 000, only the first true 100. You could then run frequencies on Flag for distributions and counts, use the Flag to select subsets such as:

where flag in ('010','011') .

pinkyc
Calcite | Level 5

Yes, thank you I was just clarifying my question to answer RW9.

For my purposes, nothing further with data analysis but creating a report where I can quantify how many have 1, 2 or all 3 criteria.  Right now I can only say x have 1, y have 2 and z have 3 without being able to actually give detail on how many have 1 or 2 or 3.

The flag approach is just what I was looking for, I will try it out thanks!

pinkyc
Calcite | Level 5

I was wondering if you could give an example on how to actually set/call create the flag variable for those conditions?  I did use the snippet you gave but for some reason I don't think I'm using it correctly.

ballardw
Super User

Code to read the data, create the flag and then print the results:

data have;
informat discharge mmddyy10.;
input Pat_ID  $   Pat_stat     Charge_code     Discharge ;
flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );
format discharge mmddyy10.;
datalines;
AA          10               11                         9/22/2012
BB          30               11                         5/11/2012
CC          10               55                         12/13/2012
DD          20               11                         6/4/2012
EE          30               55                         12/3/2012
;
run;

proc print data=have;
var Pat_ID  Pat_stat     Charge_code     Discharge   flag;
run;

Results: (minus the column headers cut by the forum)

1AA101109/22/2012000
2BB301105/11/2012100
3CC105512/13/2012011
4DD201106/04/2012000
5EE305512/03/2012111

So the flag on the first row: Pat_stat is not 30 so the first position of the flag is 0 for false

charge_code IS 11 (not <>) so is also 0 and the date is not greate than 12/01/2012 so 0.

In the second row the Pat_stat is 30 so the first position 1 for true and the remainder are 0 as both are false

In the third row charge_code <> 11 so the 2nd position is 1 and the date is larger so it is true.

This type of flag requires that you constantly be aware of the meaning for each position.

An additional exercise for the interested reader is to build a custom format for each value of flag and then use that format for display to describe what is going one.

An advantage AND weakness to this approach: You can have a lot of comparisons represented, 2^n where n is the number of comparisons. So you can build something hard to interpret quickly. Also if you don't specify a length for the flag variable you'll get a 200 character default.

A third approach would be three separate flags coded 0/1.Proc summary with the flags as class variables could get you a count of the combinations of 1,2 or 3 taken at a time.

ballardw
Super User

Sometimes this could depend on what you are doing next. The basic bit will be understanding the logical OR and AND constructions.

When working with dates though it really helps if your dates are actually SAS date values and not some text value. Otherwise comparisons get real ugly.

If you want all of a condition to be true use AND. You can select using WHERE statements in most procedures or a WHERE dataset option.

For example to print selected records:

proc print data=have;

     where pat_stat=30 AND charge_code NE 11 and discharge > '01DEC2012'd; /*NOTE the SAS construct for a date literal. Also if the charge_code is a character variable then in needs quotes NE '11'. Comparisons with text values other than = also have some tricks as '3' is likely to be treated as greater than '11' but '100' wouldn't.*/

     /* this would print just the records where all 3 conditions are true I use NE for "not equal" as there a multiple character options but this is easier for me to type*/

proc print data=have;

     where pat_stat=30 OR charge_code NE 11 OR discharge > '01DEC2012'd; /* would print the records where any of the conditions is true*/

You can group using () to control order of comparison

proc print data=have;

     where (pat_stat=30 OR charge_code NE 11) AND discharge > '01DEC2012'd; /* would print the records where either of the first two conditions is true and the 3rd must be true.

Also with SAS there is a comparison that lets you select a specific list for comparison(values not variables)

where pat_stat in (30, 10)

The logic for negating is a bit trickier and a good logic reference is in order.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3635 views
  • 3 likes
  • 3 in conversation