BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jcapua2
Fluorite | Level 6

Hello. I need help with producing 2 separate tables from the original table, where one table contains observations with specific variables=Yes, but only those variables='Yes.'

 

For example, I only want patients who have HIV='Yes' or cancer='Yes', and nothing else='Yes'.

 

data have;

patientdiabetescholesterolHIVcancerHep B
patient_1 YesYesNoYesYes
patient_2YesNoYesNoNo
patient_3NoNoNoNoYes
patient_4NoNoYesNoNo
patient_5NoYesNoNoYes
patient_6NoNoYesYesNo
patient_7NoNoNoYesNo

 

 

/*only patients where HIV='Yes' or cancer='Yes' and everything else='No'*/

data want1;

set have;

patientdiabetescholesterolHIVcancerHep B
patient_4NoNoYesNoNo
patient_6NoNoYesYesNo
patient_7NoNoNoYesNo

 

 

/*everyone else*/

data want2;

set have;

patientdiabetescholesterolHIVcancerHep B
patient_1 YesYesNoYesYes
patient_2YesNoYesNoNo
patient_3NoNoNoNoYes
patient_5NoYesNoNoYes

 

 

Is there a way to do it in SQL or SAS datastep? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Here is how you could do it using a data step:

data want1 want2;
  set have;
  if (HIV eq 'Yes' or cancer eq 'Yes') and
    (diabetes NE 'Yes' and cholesterol NE 'Yes' and
     Hep_B ne 'Yes') then output want1;
  else output want2;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20

why is patient1 in want2 when cancer='Yes' ? 

and for that matter patient2 in want2 when HIV='Yes'

jcapua2
Fluorite | Level 6

because diabetes='Yes', cholesterol='Yes', and Hep B='Yes.'

 

Only patients where either HIV='Yes' or cancer='Yes' and everything else='No.'

novinosrin
Tourmaline | Level 20


data have;
infile datalines truncover;
input (patient diabetes cholesterol HIV cancer Hep_B) (:$10.);
datalines;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No No
patient_3 No No No No Yes
patient_4 No No Yes No No
patient_5 No Yes No No Yes
patient_6 No No Yes Yes No
patient_7 No No No Yes No
;

 

data want1 want2;
set have;
array a(*) diabetes cholesterol Hep_B;
if cancer='Yes' and (cancer not in a) then _flag=1;
if HIV='Yes' and (hiv not in a) then __flag=1;
if _flag or __flag then output want1;
else if not(_flag) and not(__flag) then output want2;
drop _:;
run;

art297
Opal | Level 21

@novinosrin: Actually, I like your use of an array in solving this problem, but didn't like your creating two flag variables.

Here is an alternative that uses arrays and, methinks, makes the code both more extensible and easier to read:

data want1 want2;
  set have;
  array musthaveone(*) HIV cancer;
  array canthaveany(*) diabetes cholesterol Hep_B;
  if 'Yes' in musthaveone  and
    'Yes' not in canthaveany then output want1;
  else output want2;
run;

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

Thank you Art @art297   Wish the thought of using a constant operand check struck my mind earlier.  I often have this approach to avoid using constant operand as opposed to variable operand thinking to make it more squeaking clean. Nonetheless your demonstration proves otherwise. Well, I have learned now. Cheers! and Thanks again

novinosrin
Tourmaline | Level 20

@art297Art, How about still sticking to one array and making it all compound expression?

 

data want1 want2;
set have;
array a(*) diabetes cholesterol Hep_B;
if (cancer='Yes' and (cancer not in a)) or (HIV='Yes' and (hiv not in a)) then output want1;
else if not((cancer='Yes' and (cancer not in a)) or (HIV='Yes' and (hiv not in a))) then output want2;
run;

art297
Opal | Level 21

@novinosrin: No! Takes longer to run, is less extensible (IMHO), and is (IMHO) harder to read.

 

art297
Opal | Level 21

Here is how you could do it using a data step:

data want1 want2;
  set have;
  if (HIV eq 'Yes' or cancer eq 'Yes') and
    (diabetes NE 'Yes' and cholesterol NE 'Yes' and
     Hep_B ne 'Yes') then output want1;
  else output want2;
run;

Art, CEO, AnalystFinder.com

 

ballardw
Super User

Are your values actually character 'Yes' and 'No' or are they numeric with a format applied?

 

In many cases splitting data yields less efficient use than having a variable to indicate group membership and using BY processing on that variable.

HB
Barite | Level 11 HB
Barite | Level 11

For completeness since SQL was mentioned:

 


data patients;
  input patient $9. diabetes:$3. cholesterol:$3. hiv:$3. cancer:$3. hep_b:$3.;
  datalines;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No	No
patient_3 No No No No Yes
patient_4 No No	Yes	No	No
patient_5 No Yes No No Yes
patient_6 No No	Yes	Yes	No
patient_7 No No	No Yes No
;
run;

proc sql;
	create table want1 as 
		select * 
		from patients
		where (diabetes = 'No' and cholesterol = 'No' and hep_b = 'No') and (hiv = 'Yes' or cancer = 'Yes');
quit;

proc sql;
	create table want2 as 
		select * 
		from patients
		where (diabetes = 'Yes' or cholesterol = 'Yes' or hep_b = 'Yes');
quit;

DaveShea
Lapis Lazuli | Level 10

Hi,

 

The DATA step is an excellent way to achieve what you want as it allows you to create one or more datasets in a single DATA step.

 

In this case you have a dataset, WORK.HAVE that you want to split out into two datasets, WORK.WANT1 and WORK.WANT2.

 

Whilst the method shown below works perfectly well, you may want to review things if the number of ailments (diabetes, cholestorol etc.) increases and/or your selection criteria gets more complex.

 

Having said that, try this out and see if it does what you need. Any record that does not meet the first criterion for inclusion in WORK.WANT1 will be swallowed up by the catch-all ELSE statement and get sent to WORK.WANT2.

 

Notice that the HIV Or Cancer check is inside a pair of parentheses so that this is evaluated separately from the remainder of the IF statement.

 

****************************************************************;
*Read in your Patient dataa.                                    ;
****************************************************************;
Data HAVE;
 Input  patient	$9.
        diabetes $	
        cholesterol	$
        HIV	$
        cancer	$
        Hep_B $
        ;
DATALINES;
patient_1 Yes Yes No Yes Yes
patient_2 Yes No Yes No No
patient_3 No No No No Yes
patient_4 No No Yes No No
patient_5 No Yes No No Yes
patient_6 No No Yes Yes No
patient_7 No No No Yes No
;
Run;

****************************************************************;
*Decide which of the WANTn datasets a record should go to.      ;
****************************************************************;
Data WANT1
     WANT2
     ;

 Set HAVE;

 If (HIV="Yes" Or Cancer="Yes") 
    And
    Diabetes="No"
    And
    Cholesterol="No"
    And
    Hep_B="No"
    Then Output WANT1;

 Else Output WANT2;

Run;

 

Cheers,

 

Downunder Dave

Wellington.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4475 views
  • 13 likes
  • 6 in conversation