Kindly help with the appropriate SAS code.
Please I want to Create 3 new disease group summary flag variables Dx_G1 to Dx_G3for each patient where:
Dx_G1 = 1 if patient has any disease code value from 000-100 inclusive, otherwise 0,
Dx_G2 = 1 if patient has any disease code value from 101-200 inclusive, otherwise 0,
Dx_GP3= 1 if patient has any disease code value from 201-300 inclusive, otherwise 0.
Note we have :
Disease code1
Disease code2
Disease code 3
You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you.
Just pass the input data set name and you'll have a new data set with flag variables.
proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3
from input_dataset;
quit;
Are your disease codes numeric or character? This is extremely important as character values and "inclusive" for 000-100.
It helps to provide a clearer example of your actual variable names and types. If you are not familiar with that run Proc Contents on your data set and share that information.
You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you.
Just pass the input data set name and you'll have a new data set with flag variables.
proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3
from input_dataset;
quit;
@Pivot_e wrote:
@MayurJadhav. Thank you for the response, it was helpful. Please can you educate me on the use of "," i.e comma to end the statements as against the ";" semicolon commonly used
Commas do not end statements. But commas are used in many places. And SQL in particular is EXTREMELY fond of using commas. In the example code the commas are used to separate the individual variables that are being "selected" by the SELECT statement.
Formatting hint: I find it is MUCH EASIER for humans to see (especially when scanning) such "continuation" characters if you place that at the start of the line that the code is continuing onto instead of the end of the previous line. That way they can be aligned and scanned without having to move you eyes' focus back and forth across the page.
Also when you have single statement (like many SQL SELECT statements) that spans multiple lines it helps to locate the semicolon that ends the statement on a new line. Just like you would locate and END to closes a DO block on a new line.
create table new_dataset as
select *
,CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0
END as Dx_G1
,CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0
END as Dx_G2
,CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0
END as Dx_G3
from input_dataset
;
Data step:
data want; set input_dataset; Dx_g1 = ( (0 le disease_code1 le 100) or (0 le disease_code2 le 100) or (0 le disease_code3 le 100) ); Dx_g2 = ( (101 le disease_code1 le 200) or (101 le disease_code2 le 200) or (101 le disease_code3 le 200) ); Dx_g3 = ( (201 le disease_code1 le 300) or (201 le disease_code2 le 300) or (201 le disease_code3 le 300) ); run;
You may ask how 1/0 get assigned with the above code. SAS treats the results of logical comparisons as numeric 1/0 for true or false. So the result of the OR comparisons will result in true or false or 1/0 results.
The SQL could also have been written
,( (201 le disease_code1 le 300) or (201 le disease_code2 le 300) or (201 le disease_code3 le 300) ) as Dx_g3
instead of using a case/when construct. This comparison typically executes faster than a CASE/WHEN or data step IF/THEN/ELSE block of code. You may not notice the time difference with only a few thousand records but when you get to billions the time can add up.
Why use a data step instead of SQL? Depending on the exact code involved SQL may result in reordering the data set and if you have any operation that requires specific order of the data then you are not guaranteed to get the same results as SQL does not process sets sequentially. Also unless one of the SQL specific features such as joins between multiple sets are involved the Data step often executes quicker.
This forum prefers that you start a new thread when asking different questions not closely related to this current one.
Please I have another question. I like to generate and report the distribution of a variable waiting time.
Please How can I do this on SAS?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.