BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Halps
Obsidian | Level 7

I am trying to look at an 18 month range of dates and label them properly. However, some of the dates overlap.

 

For example, some of the date ranges will be "7/1/02 - 12/31/03", "7/1/03 - 12/31/04", and "7/1/04 - 12/31/05". I want to a make a new column to show if a specific date is in that date range or not. The only issue I keep having is the overlapping dates. I want to duplicate a row if it is in both date ranges so that new column I created will say each date range. For example, 11/1/03 is in both "7/1/02 - 12/31/03" and "7/1/03 - 12/31/04". So I want the row of data where exam_dt = 11/1/03 to be duplicated so new_exam_date will have one row that says "7/1/02 - 12/31/03" and another row that says "7/1/03 - 12/31/04".

 

Right now, I have the following code:

 

data symptoms_new_date;
    set symptoms;
        if exam_dt ge '01jul2001'd and exam_dt le '31dec2002'd
            then new_exam_date="7/1/01 - 12/31/02";
        else if exam_dt ge '01jul2002'd and exam_dt le '31dec2003'd
            then new_exam_date="7/1/02 - 12/31/03";
        else if exam_dt ge '01jul2003'd and exam_dt le '31dec2004'd
            then new_exam_date="7/1/03 - 12/31/04";
        else if exam_dt ge '01jul2004'd and exam_dt le '31dec2005'd
            then new_exam_date="7/1/04 - 12/31/05";
        else new_exam_date=0;
run;

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Your program is almost there, but:

 

  1. You are using 
        if ... then ...
        else if ... then ...
    sequences.  This means that a date that could satisfy 2 if tests will only be subject to 1 of the THEN actions,  This is because, once a condition is satisfied the ELSE usage prevents consideration of subsequent conditions.

  2. Once you've solved the above, you should issue an OUTPUT statement for each satisfied condition.  This will allow you to duplicate rows when a date falls into 2 date ranges:

 

data symptoms_new_date;
    set symptoms;
        if exam_dt ge '01jul2001'd and exam_dt le '31dec2002'd then do;
          new_exam_date="7/1/01 - 12/31/02";
          output;
        end;
        if exam_dt ge '01jul2002'd and exam_dt le '31dec2003'd then do;
          new_exam_date="7/1/02 - 12/31/03";
          output;
        end;
        if exam_dt ge '01jul2003'd and exam_dt le '31dec2004'd then do;
          new_exam_date="7/1/03 - 12/31/04";
          output;
        end;
        if exam_dt ge '01jul2004'd and exam_dt le '31dec2005'd then do;
          new_exam_date="7/1/04 - 12/31/05";
          output;
        end;
		if exam_dt lt '01jul2001'd or exam_dt>'31dec2005'd then do;
          new_exam_date=0;
          output;
        end;
run;


    

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
SuryaKiran
Meteorite | Level 14

Hello,

 

Use multiple IF-THEN conditions to create multiple columns based on the date range, later you can use these new columns created to identify if the date falls in more than one group.

 

data symptoms_new_date;
    set symptoms;
        if exam_dt ge '01jul2001'd and exam_dt le '31dec2002'd
            then date1="7/1/01 - 12/31/02";
        if exam_dt ge '01jul2002'd and exam_dt le '31dec2003'd
            then date2="7/1/02 - 12/31/03";
        if exam_dt ge '01jul2003'd and exam_dt le '31dec2004'd
            then date3="7/1/03 - 12/31/04";
        if exam_dt ge '01jul2004'd and exam_dt le '31dec2005'd
            then date4="7/1/04 - 12/31/05";			
run;
Thanks,
Suryakiran
A_Halps
Obsidian | Level 7

Thank you, but how would I continue with the following if I used your method:

 


proc sql;
    create table symptoms_18_mo as
        select b.*
            from arep.pop_2018 a
        left join Symptoms_new_date b
            on a.study_id=b.study_id

            group by a.study_id, b.new_exam_date;

    create table check_symptoms as
        select distinct new_exam_date,
            count(*) as total,
                sum(cough) as cough_abs,
                sum(gerd) as gerd_abs,
        from symptoms_18_mo
            group by new_exam_date;
quit;

proc sort data=symptoms_18_mo;
    by new_exam_date;
run;

proc freq data=symptoms_18_mo;
    tables new_exam_date* (cough gerd)  / missing nopercent;
run;

SuryaKiran
Meteorite | Level 14

Create the column you require at the end of the if condition.

 

new_exam_date=catx(" ",date1,date2,date3,date4);

This will show the two groups if the date exists in both.

 

Please provide some sample data and your required output for more specific solution.

Thanks,
Suryakiran
mkeintz
PROC Star

Your program is almost there, but:

 

  1. You are using 
        if ... then ...
        else if ... then ...
    sequences.  This means that a date that could satisfy 2 if tests will only be subject to 1 of the THEN actions,  This is because, once a condition is satisfied the ELSE usage prevents consideration of subsequent conditions.

  2. Once you've solved the above, you should issue an OUTPUT statement for each satisfied condition.  This will allow you to duplicate rows when a date falls into 2 date ranges:

 

data symptoms_new_date;
    set symptoms;
        if exam_dt ge '01jul2001'd and exam_dt le '31dec2002'd then do;
          new_exam_date="7/1/01 - 12/31/02";
          output;
        end;
        if exam_dt ge '01jul2002'd and exam_dt le '31dec2003'd then do;
          new_exam_date="7/1/02 - 12/31/03";
          output;
        end;
        if exam_dt ge '01jul2003'd and exam_dt le '31dec2004'd then do;
          new_exam_date="7/1/03 - 12/31/04";
          output;
        end;
        if exam_dt ge '01jul2004'd and exam_dt le '31dec2005'd then do;
          new_exam_date="7/1/04 - 12/31/05";
          output;
        end;
		if exam_dt lt '01jul2001'd or exam_dt>'31dec2005'd then do;
          new_exam_date=0;
          output;
        end;
run;


    

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
A_Halps
Obsidian | Level 7

This worked! Thank you!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1041 views
  • 1 like
  • 3 in conversation