BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Pivot_e
Calcite | Level 5

Kindly help with the appropriate SAS code.
Please I want to Create 3 new disease group summary flag variables Dx_G1 to Dx_G3for each patient where:
Dx_G1 = 1 if patient has any disease code value from 000-100 inclusive, otherwise 0,
Dx_G2 = 1 if patient has any disease code value from 101-200 inclusive, otherwise 0,
Dx_GP3= 1 if patient has any disease code value from 201-300 inclusive, otherwise 0.

Note we have :
Disease code1
Disease code2
Disease code 3

1 ACCEPTED SOLUTION

Accepted Solutions
MayurJadhav
Quartz | Level 8

You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you. 

 

Just pass the input data set name and you'll have a new data set with flag variables.

 

proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3

from input_dataset;
quit;

 

Mayur Jadhav
BI Developer. Writer. Creative Educator.

SAS Blog → https://learnsascode.com
YouTube Channel: → https://www.youtube.com/@imayurj

View solution in original post

13 REPLIES 13
ballardw
Super User

Are your disease codes numeric or character? This is extremely important as character values and "inclusive" for 000-100.

 

It helps to provide a clearer example of your actual variable names and types. If you are not familiar with that run Proc Contents on your data set and share that information.

Pivot_e
Calcite | Level 5
Yes the disease codes are numeric.
Please also note: for instance the Dx_G1 should check three variables (disease_code1, disease_code2, diseases_code3) if there are codes 000 to 100 and then generate Dx_G1 as 1
MayurJadhav
Quartz | Level 8

You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you. 

 

Just pass the input data set name and you'll have a new data set with flag variables.

 

proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3

from input_dataset;
quit;

 

Mayur Jadhav
BI Developer. Writer. Creative Educator.

SAS Blog → https://learnsascode.com
YouTube Channel: → https://www.youtube.com/@imayurj
Pivot_e
Calcite | Level 5
You quite understood my query correctly. Many thanks.
Pivot_e
Calcite | Level 5
@MayurJadhav. Thank you for the response, it was helpful. Please can you educate me on the use of "," i.e comma to end the statements as against the ";" semicolon commonly used
Tom
Super User Tom
Super User

@Pivot_e wrote:
@MayurJadhav. Thank you for the response, it was helpful. Please can you educate me on the use of "," i.e comma to end the statements as against the ";" semicolon commonly used

Commas do not end statements.  But commas are used in many places. And SQL in particular is EXTREMELY fond of using commas.  In the example code the commas are used to separate the individual variables that are being "selected" by the SELECT statement.

 

Formatting hint:  I find it is MUCH EASIER for humans to see (especially when scanning) such "continuation" characters if you place that at the start of the line that the code is continuing onto instead of the end of the previous line.  That way they can be aligned and scanned without having to move you eyes' focus back and forth across the page.

Also when you have single statement (like many SQL SELECT statements) that spans multiple lines it helps to locate the semicolon that ends the statement on a new line.  Just like you would locate and END to closes a DO block on a new line.

create table new_dataset as 
  select *
	,CASE WHEN (disease_code1 between 000 and 100) OR
	          (disease_code2 between 000 and 100) OR
	          (disease_code3 between 000 and 100) THEN 1
			ELSE 0 
     END as Dx_G1
	,CASE WHEN (disease_code1 between 101 and 200) OR 
	          (disease_code2 between 101 and 200) OR 
	          (disease_code3 between 101 and 200) THEN 1
			ELSE 0 
     END as Dx_G2			
	,CASE WHEN (disease_code1 between 201 and 300) OR
			  (disease_code1 between 201 and 300) OR
			  (disease_code1 between 201 and 300) THEN 1
			ELSE 0 
      END as Dx_G3 
from input_dataset
;

 

 

 

Pivot_e
Calcite | Level 5
This is quite helpful.
I really appreciate your timely response.
Please I will like to follow you (to be mentored further). I would appreciate it if you can share your handle or WhatsApp contact.
Regards,
Pivot_e
Calcite | Level 5
Please I have another question.
I like to generate and report the distribution of a variable waiting time. Please How can I do this on SAS?
Pivot_e
Calcite | Level 5
Please I will like to follow you (to be mentored further). I would appreciate it if you can share your handle or WhatsApp contact.
Regards,
ballardw
Super User

Data step:

 

data want;
   set input_dataset;
   Dx_g1 = ( (0 le disease_code1 le 100) or 
             (0 le disease_code2 le 100) or 
             (0 le disease_code3 le 100) 
           );
   Dx_g2 = ( (101 le disease_code1 le 200) or 
             (101 le disease_code2 le 200) or 
             (101 le disease_code3 le 200) 
            );
   Dx_g3 = ( (201 le disease_code1 le 300) or 
             (201 le disease_code2 le 300) or 
             (201 le disease_code3 le 300) 
           );
run;

You may ask how 1/0 get assigned with the above code. SAS treats the results of logical comparisons as numeric 1/0 for true or false. So the result of the OR comparisons will result in true or false or 1/0 results.

The SQL could also have been written

,( (201 le disease_code1 le 300) or 
   (201 le disease_code2 le 300) or 
   (201 le disease_code3 le 300) 
  ) as Dx_g3

instead of using a case/when construct. This comparison typically executes faster than a CASE/WHEN or data step IF/THEN/ELSE block of code. You may not notice the time difference with only a few thousand records but when you get to billions the time can add up.

 

Why use a data step instead of SQL? Depending on the exact code involved SQL may result in reordering the data set and if you have any operation that requires specific order of the data then you are not guaranteed to get the same results as SQL does not process sets sequentially. Also unless one of the SQL specific features such as joins between multiple sets are involved the Data step often executes quicker.

 

This forum prefers that you start a new thread when asking different questions not closely related to this current one.

Pivot_e
Calcite | Level 5
Wow!... I am really learning here.
Many thanks.
Pivot_e
Calcite | Level 5

Please I have another question. I like to generate and report the distribution of a variable waiting time.

Please How can I do this on SAS?

Pivot_e
Calcite | Level 5
@MayurJadhav please what will the codes be for disease_code1 to disease code_3 [character (3) format ‘nnn’]

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1067 views
  • 3 likes
  • 4 in conversation