Hello. I need help with producing 2 separate tables from the original table, where one table contains observations with specific variables=Yes, but only those variables='Yes.'
For example, I only want patients who have HIV='Yes' or cancer='Yes', and nothing else='Yes'.
data have;
patient | diabetes | cholesterol | HIV | cancer | Hep B |
patient_1 | Yes | Yes | No | Yes | Yes |
patient_2 | Yes | No | Yes | No | No |
patient_3 | No | No | No | No | Yes |
patient_4 | No | No | Yes | No | No |
patient_5 | No | Yes | No | No | Yes |
patient_6 | No | No | Yes | Yes | No |
patient_7 | No | No | No | Yes | No |
/*only patients where HIV='Yes' or cancer='Yes' and everything else='No'*/
data want1;
set have;
patient | diabetes | cholesterol | HIV | cancer | Hep B |
patient_4 | No | No | Yes | No | No |
patient_6 | No | No | Yes | Yes | No |
patient_7 | No | No | No | Yes | No |
/*everyone else*/
data want2;
set have;
patient | diabetes | cholesterol | HIV | cancer | Hep B |
patient_1 | Yes | Yes | No | Yes | Yes |
patient_2 | Yes | No | Yes | No | No |
patient_3 | No | No | No | No | Yes |
patient_5 | No | Yes | No | No | Yes |
Is there a way to do it in SQL or SAS datastep? Thank you!
Here is how you could do it using a data step:
data want1 want2; set have; if (HIV eq 'Yes' or cancer eq 'Yes') and (diabetes NE 'Yes' and cholesterol NE 'Yes' and Hep_B ne 'Yes') then output want1; else output want2; run;
Art, CEO, AnalystFinder.com
why is patient1 in want2 when cancer='Yes' ?
and for that matter patient2 in want2 when HIV='Yes'
because diabetes='Yes', cholesterol='Yes', and Hep B='Yes.'
Only patients where either HIV='Yes' or cancer='Yes' and everything else='No.'
data have;
infile datalines truncover;
input (patient diabetes cholesterol HIV cancer Hep_B) (:$10.);
datalines;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No No
patient_3 No No No No Yes
patient_4 No No Yes No No
patient_5 No Yes No No Yes
patient_6 No No Yes Yes No
patient_7 No No No Yes No
;
data want1 want2;
set have;
array a(*) diabetes cholesterol Hep_B;
if cancer='Yes' and (cancer not in a) then _flag=1;
if HIV='Yes' and (hiv not in a) then __flag=1;
if _flag or __flag then output want1;
else if not(_flag) and not(__flag) then output want2;
drop _:;
run;
@novinosrin: Actually, I like your use of an array in solving this problem, but didn't like your creating two flag variables.
Here is an alternative that uses arrays and, methinks, makes the code both more extensible and easier to read:
data want1 want2; set have; array musthaveone(*) HIV cancer; array canthaveany(*) diabetes cholesterol Hep_B; if 'Yes' in musthaveone and 'Yes' not in canthaveany then output want1; else output want2; run;
Art, CEO, AnalystFinder.com
Thank you Art @art297 Wish the thought of using a constant operand check struck my mind earlier. I often have this approach to avoid using constant operand as opposed to variable operand thinking to make it more squeaking clean. Nonetheless your demonstration proves otherwise. Well, I have learned now. Cheers! and Thanks again
@art297Art, How about still sticking to one array and making it all compound expression?
data want1 want2;
set have;
array a(*) diabetes cholesterol Hep_B;
if (cancer='Yes' and (cancer not in a)) or (HIV='Yes' and (hiv not in a)) then output want1;
else if not((cancer='Yes' and (cancer not in a)) or (HIV='Yes' and (hiv not in a))) then output want2;
run;
@novinosrin: No! Takes longer to run, is less extensible (IMHO), and is (IMHO) harder to read.
Here is how you could do it using a data step:
data want1 want2; set have; if (HIV eq 'Yes' or cancer eq 'Yes') and (diabetes NE 'Yes' and cholesterol NE 'Yes' and Hep_B ne 'Yes') then output want1; else output want2; run;
Art, CEO, AnalystFinder.com
Are your values actually character 'Yes' and 'No' or are they numeric with a format applied?
In many cases splitting data yields less efficient use than having a variable to indicate group membership and using BY processing on that variable.
For completeness since SQL was mentioned:
data patients;
input patient $9. diabetes:$3. cholesterol:$3. hiv:$3. cancer:$3. hep_b:$3.;
datalines;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No No
patient_3 No No No No Yes
patient_4 No No Yes No No
patient_5 No Yes No No Yes
patient_6 No No Yes Yes No
patient_7 No No No Yes No
;
run;
proc sql;
create table want1 as
select *
from patients
where (diabetes = 'No' and cholesterol = 'No' and hep_b = 'No') and (hiv = 'Yes' or cancer = 'Yes');
quit;
proc sql;
create table want2 as
select *
from patients
where (diabetes = 'Yes' or cholesterol = 'Yes' or hep_b = 'Yes');
quit;
Hi,
The DATA step is an excellent way to achieve what you want as it allows you to create one or more datasets in a single DATA step.
In this case you have a dataset, WORK.HAVE that you want to split out into two datasets, WORK.WANT1 and WORK.WANT2.
Whilst the method shown below works perfectly well, you may want to review things if the number of ailments (diabetes, cholestorol etc.) increases and/or your selection criteria gets more complex.
Having said that, try this out and see if it does what you need. Any record that does not meet the first criterion for inclusion in WORK.WANT1 will be swallowed up by the catch-all ELSE statement and get sent to WORK.WANT2.
Notice that the HIV Or Cancer check is inside a pair of parentheses so that this is evaluated separately from the remainder of the IF statement.
****************************************************************;
*Read in your Patient dataa. ;
****************************************************************;
Data HAVE;
Input patient $9.
diabetes $
cholesterol $
HIV $
cancer $
Hep_B $
;
DATALINES;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No No
patient_3 No No No No Yes
patient_4 No No Yes No No
patient_5 No Yes No No Yes
patient_6 No No Yes Yes No
patient_7 No No No Yes No
;
Run;
****************************************************************;
*Decide which of the WANTn datasets a record should go to. ;
****************************************************************;
Data WANT1
WANT2
;
Set HAVE;
If (HIV="Yes" Or Cancer="Yes")
And
Diabetes="No"
And
Cholesterol="No"
And
Hep_B="No"
Then Output WANT1;
Else Output WANT2;
Run;
Cheers,
Downunder Dave
Wellington.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.