- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Kindly help with the appropriate SAS code.
Please I want to Create 3 new disease group summary flag variables Dx_G1 to Dx_G3for each patient where:
Dx_G1 = 1 if patient has any disease code value from 000-100 inclusive, otherwise 0,
Dx_G2 = 1 if patient has any disease code value from 101-200 inclusive, otherwise 0,
Dx_GP3= 1 if patient has any disease code value from 201-300 inclusive, otherwise 0.
Note we have :
Disease code1
Disease code2
Disease code 3
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you.
Just pass the input data set name and you'll have a new data set with flag variables.
proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3
from input_dataset;
quit;
BI Developer. Writer. Creative Educator.
SAS Blog → https://learnsascode.com
YouTube Channel: → https://www.youtube.com/@imayurj
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are your disease codes numeric or character? This is extremely important as character values and "inclusive" for 000-100.
It helps to provide a clearer example of your actual variable names and types. If you are not familiar with that run Proc Contents on your data set and share that information.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please also note: for instance the Dx_G1 should check three variables (disease_code1, disease_code2, diseases_code3) if there are codes 000 to 100 and then generate Dx_G1 as 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You try this one. If I understood your query correctly, you want to create flag by looking at variables disease code 1 ,2,3 for mentioned range of disease codes then below code might work for you.
Just pass the input data set name and you'll have a new data set with flag variables.
proc sql;
create table new_dataset as select *,
CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0 END as Dx_G1 ,
CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0 END as Dx_G2 ,
CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0 END as Dx_G3
from input_dataset;
quit;
BI Developer. Writer. Creative Educator.
SAS Blog → https://learnsascode.com
YouTube Channel: → https://www.youtube.com/@imayurj
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Pivot_e wrote:
@MayurJadhav. Thank you for the response, it was helpful. Please can you educate me on the use of "," i.e comma to end the statements as against the ";" semicolon commonly used
Commas do not end statements. But commas are used in many places. And SQL in particular is EXTREMELY fond of using commas. In the example code the commas are used to separate the individual variables that are being "selected" by the SELECT statement.
Formatting hint: I find it is MUCH EASIER for humans to see (especially when scanning) such "continuation" characters if you place that at the start of the line that the code is continuing onto instead of the end of the previous line. That way they can be aligned and scanned without having to move you eyes' focus back and forth across the page.
Also when you have single statement (like many SQL SELECT statements) that spans multiple lines it helps to locate the semicolon that ends the statement on a new line. Just like you would locate and END to closes a DO block on a new line.
create table new_dataset as
select *
,CASE WHEN (disease_code1 between 000 and 100) OR
(disease_code2 between 000 and 100) OR
(disease_code3 between 000 and 100) THEN 1
ELSE 0
END as Dx_G1
,CASE WHEN (disease_code1 between 101 and 200) OR
(disease_code2 between 101 and 200) OR
(disease_code3 between 101 and 200) THEN 1
ELSE 0
END as Dx_G2
,CASE WHEN (disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) OR
(disease_code1 between 201 and 300) THEN 1
ELSE 0
END as Dx_G3
from input_dataset
;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I really appreciate your timely response.
Please I will like to follow you (to be mentored further). I would appreciate it if you can share your handle or WhatsApp contact.
Regards,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I like to generate and report the distribution of a variable waiting time. Please How can I do this on SAS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Regards,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data step:
data want; set input_dataset; Dx_g1 = ( (0 le disease_code1 le 100) or (0 le disease_code2 le 100) or (0 le disease_code3 le 100) ); Dx_g2 = ( (101 le disease_code1 le 200) or (101 le disease_code2 le 200) or (101 le disease_code3 le 200) ); Dx_g3 = ( (201 le disease_code1 le 300) or (201 le disease_code2 le 300) or (201 le disease_code3 le 300) ); run;
You may ask how 1/0 get assigned with the above code. SAS treats the results of logical comparisons as numeric 1/0 for true or false. So the result of the OR comparisons will result in true or false or 1/0 results.
The SQL could also have been written
,( (201 le disease_code1 le 300) or (201 le disease_code2 le 300) or (201 le disease_code3 le 300) ) as Dx_g3
instead of using a case/when construct. This comparison typically executes faster than a CASE/WHEN or data step IF/THEN/ELSE block of code. You may not notice the time difference with only a few thousand records but when you get to billions the time can add up.
Why use a data step instead of SQL? Depending on the exact code involved SQL may result in reordering the data set and if you have any operation that requires specific order of the data then you are not guaranteed to get the same results as SQL does not process sets sequentially. Also unless one of the SQL specific features such as joins between multiple sets are involved the Data step often executes quicker.
This forum prefers that you start a new thread when asking different questions not closely related to this current one.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please I have another question. I like to generate and report the distribution of a variable waiting time.
Please How can I do this on SAS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content