s I think this is as much a logic question as SAS, but suppose input data looks like this:
Pat_ID Pat_stat Charge_code Discharge
AA 10 11 9/22/2012
BB 30 11 5/11/2012
CC 10 55 12/13/2012
DD 20 11 6/4/2012
EE 30 55 12/3/2012
The observations where pat_stat = 30, where charge_code <> 11 and Discharge after 12/1/2012 are to be selected. However, an observation could have one, two or all three of those criteria (Pat_ID EE). What is the best way to find the exact records that have these criteria but also determine if they overlap? Such as, datasets that include ones with just pat_stat = 30, charge_code <> 11, or discharge > 12/1/2012 but also can pinpoint which ones have two of those criteria or three. For example, if I specify if charge_code <> 11, EE is included but it also has pat_stat = 30. But I want to select it such that I can tell EE also has pat_stat = 30 and it has multiple criteria.
Code to read the data, create the flag and then print the results:
data have;
informat discharge mmddyy10.;
input Pat_ID $ Pat_stat Charge_code Discharge ;
flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );
format discharge mmddyy10.;
datalines;
AA 10 11 9/22/2012
BB 30 11 5/11/2012
CC 10 55 12/13/2012
DD 20 11 6/4/2012
EE 30 55 12/3/2012
;
run;
proc print data=have;
var Pat_ID Pat_stat Charge_code Discharge flag;
run;
Results: (minus the column headers cut by the forum)
1 | AA | 10 | 11 | 09/22/2012 | 000 |
---|---|---|---|---|---|
2 | BB | 30 | 11 | 05/11/2012 | 100 |
3 | CC | 10 | 55 | 12/13/2012 | 011 |
4 | DD | 20 | 11 | 06/04/2012 | 000 |
5 | EE | 30 | 55 | 12/03/2012 | 111 |
So the flag on the first row: Pat_stat is not 30 so the first position of the flag is 0 for false
charge_code IS 11 (not <>) so is also 0 and the date is not greate than 12/01/2012 so 0.
In the second row the Pat_stat is 30 so the first position 1 for true and the remainder are 0 as both are false
In the third row charge_code <> 11 so the 2nd position is 1 and the date is larger so it is true.
This type of flag requires that you constantly be aware of the meaning for each position.
An additional exercise for the interested reader is to build a custom format for each value of flag and then use that format for display to describe what is going one.
An advantage AND weakness to this approach: You can have a lot of comparisons represented, 2^n where n is the number of comparisons. So you can build something hard to interpret quickly. Also if you don't specify a length for the flag variable you'll get a 200 character default.
A third approach would be three separate flags coded 0/1.Proc summary with the flags as class variables could get you a count of the combinations of 1,2 or 3 taken at a time.
Sorry, not following your logic, could you provide an example required output?
Ballard got the gist of what I was aiming for, though I was hoping there'd be an easier way around the logic but I suppose there isn't a shortcut. But to give an example:
Dataset pat_stat = 30 ONLY
Pat_ID
BB
EE
Dataset charge_code <> 11 ONLY
Pat_ID
CC
EE
Dataset Pat_stat = 30 AND charge_code <> 11
Pat_ID
EE
Basically, EE would appear in each three of these specifications, but it's redundant in the pat_stat = 30 only and charge_code <> 11 only outputs. Since there are quite a few observations, I suppose I'll just have to do successive and/or statements to narrow it down.
As I mentioned, the approach could depend on what you are doing next. How are you going to use the inclusion/exclusion?
Another approach is to create a text flag variable that would contain a series of 0 and 1 to summarize the conditions met.
Something like:
flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );
if all are true then Flag looks like 111, none 000, only the first true 100. You could then run frequencies on Flag for distributions and counts, use the Flag to select subsets such as:
where flag in ('010','011') .
Yes, thank you I was just clarifying my question to answer RW9.
For my purposes, nothing further with data analysis but creating a report where I can quantify how many have 1, 2 or all 3 criteria. Right now I can only say x have 1, y have 2 and z have 3 without being able to actually give detail on how many have 1 or 2 or 3.
The flag approach is just what I was looking for, I will try it out thanks!
I was wondering if you could give an example on how to actually set/call create the flag variable for those conditions? I did use the snippet you gave but for some reason I don't think I'm using it correctly.
Code to read the data, create the flag and then print the results:
data have;
informat discharge mmddyy10.;
input Pat_ID $ Pat_stat Charge_code Discharge ;
flag = cats( (pat_stat=30), (charge_code NE 11),(discharge > '01DEC2012'd) );
format discharge mmddyy10.;
datalines;
AA 10 11 9/22/2012
BB 30 11 5/11/2012
CC 10 55 12/13/2012
DD 20 11 6/4/2012
EE 30 55 12/3/2012
;
run;
proc print data=have;
var Pat_ID Pat_stat Charge_code Discharge flag;
run;
Results: (minus the column headers cut by the forum)
1 | AA | 10 | 11 | 09/22/2012 | 000 |
---|---|---|---|---|---|
2 | BB | 30 | 11 | 05/11/2012 | 100 |
3 | CC | 10 | 55 | 12/13/2012 | 011 |
4 | DD | 20 | 11 | 06/04/2012 | 000 |
5 | EE | 30 | 55 | 12/03/2012 | 111 |
So the flag on the first row: Pat_stat is not 30 so the first position of the flag is 0 for false
charge_code IS 11 (not <>) so is also 0 and the date is not greate than 12/01/2012 so 0.
In the second row the Pat_stat is 30 so the first position 1 for true and the remainder are 0 as both are false
In the third row charge_code <> 11 so the 2nd position is 1 and the date is larger so it is true.
This type of flag requires that you constantly be aware of the meaning for each position.
An additional exercise for the interested reader is to build a custom format for each value of flag and then use that format for display to describe what is going one.
An advantage AND weakness to this approach: You can have a lot of comparisons represented, 2^n where n is the number of comparisons. So you can build something hard to interpret quickly. Also if you don't specify a length for the flag variable you'll get a 200 character default.
A third approach would be three separate flags coded 0/1.Proc summary with the flags as class variables could get you a count of the combinations of 1,2 or 3 taken at a time.
Sometimes this could depend on what you are doing next. The basic bit will be understanding the logical OR and AND constructions.
When working with dates though it really helps if your dates are actually SAS date values and not some text value. Otherwise comparisons get real ugly.
If you want all of a condition to be true use AND. You can select using WHERE statements in most procedures or a WHERE dataset option.
For example to print selected records:
proc print data=have;
where pat_stat=30 AND charge_code NE 11 and discharge > '01DEC2012'd; /*NOTE the SAS construct for a date literal. Also if the charge_code is a character variable then in needs quotes NE '11'. Comparisons with text values other than = also have some tricks as '3' is likely to be treated as greater than '11' but '100' wouldn't.*/
/* this would print just the records where all 3 conditions are true I use NE for "not equal" as there a multiple character options but this is easier for me to type*/
proc print data=have;
where pat_stat=30 OR charge_code NE 11 OR discharge > '01DEC2012'd; /* would print the records where any of the conditions is true*/
You can group using () to control order of comparison
proc print data=have;
where (pat_stat=30 OR charge_code NE 11) AND discharge > '01DEC2012'd; /* would print the records where either of the first two conditions is true and the 3rd must be true.
Also with SAS there is a comparison that lets you select a specific list for comparison(values not variables)
where pat_stat in (30, 10)
The logic for negating is a bit trickier and a good logic reference is in order.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.