Using the FirstDtSrvc and Lastdtsrvc, I want to create 3 new variables in this data set.
Stay Begin
Stay End
Stay Count= number of stays per clients
I specifically want these new date variables for Clinic_Type=1 (see sas data set below) .
FirstDtSrcv-Lastdtsrvc is the client's stay duration, and most clients have multiple stays. I want to roll up all stays that are <=7 days apart into 1 stay for that client. The stay begins variable will equal the first data of service (FirstDtSrcv)for that new stay duration and the stay end will equal the last date of service (Lastdtsrvc) of that new stay duration.
Here is an Example :
Here is my data set:
Data have20;
informat FirstDtSrvc;
informat Lastdtsrvc;
format FirstDtSrvc mmddyy10. ;
format Lastdtsrvc mmddyy10. ;
input Client_ID$ FirstDtSrvc Lastdtsrvc Clinic_Type Clinic_name$ ;
datalines;
A 43412 43415 1 AA
A 43416 43422 1 AA
A 43423 43429 1 AA
A 43430 43434 1 AA
A 43435 43436 1 AA
A 43437 43438 1 AA
B 43031 43039 1 AC
B 43040 43044 1 AC
B 43049 43065 1 AC
B 43066 43069 1 AC
C 42717 42717 1 AB
C 42718 42718 1 AB
C 42719 42735 1 AB
C 42736 42750 1 AB
C 42751 42766 1 AB
C 42767 42781 1 AB
C 42782 42794 1 AB
C 42795 42809 1 AB
C 42810 42825 1 AB
C 42826 42840 1 AG
C 42841 42855 1 AG
C 42856 42870 1 AG
C 42871 42886 1 AG
C 42887 42901 1 AG
C 42902 42916 1 AG
C 42917 42924 1 AG
D 45117 45120 2 AP
E 45012 45033 2 AR
F 45058 45059 3 AF
F 45065 45065 3 AD
;
run;
Thank you!!!
You data step is apparently using some other source of numeric values instead of your actual SAS dates shown in the picture the first line has dates of 11/20/2078 and 11/26/2078. Plus there is no gap large enough for the C client to generate two sequences.
And what ever offset is applied is not consistent. So you may want use this code to generate a new data step:
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
What is supposed to happen with clinic type other than 1? No mention is made. You also so do not provide any rule for if a client has multiple (interleaved) clinic types.
Does the clinic_name have any role in this? As in what would happen with something like
A 43412 43415 1 AA A 43416 43422 1 AB A 43423 43429 1 AA A 43430 43434 1 AB
I would start with something like this.
/* assumes data is sorted by Client id FirstDtSrvc */ Data temp; set have20 ; by client_id; retain seqstart; l_last = lag(Lastdtsrvc); if first.client_id then Seqstart=FirstDtSrvc; else if FirstDtSrvc - l_last >7 then do; seqstart=firstdtsrvc; end; run; proc summary data=temp nway; class client_id seqstart; var FirstDtSrvc Lastdtsrvc; output out=want (drop=_type_ _freq_) min(FirstDtSrvc)=Span_Begin max(Lastdtsrvc)=Span_end ; run;
It is very confusing, potentially quite fatal in terms of code, to discuss a problem in one term and use variables of different names. Your picture shows FirstDOS as does your narrative. But the example data step code uses FirstDtSrvc. A couple more similar themed variables and we might not have a clear understanding.
You data step is apparently using some other source of numeric values instead of your actual SAS dates shown in the picture the first line has dates of 11/20/2078 and 11/26/2078. Plus there is no gap large enough for the C client to generate two sequences.
And what ever offset is applied is not consistent. So you may want use this code to generate a new data step:
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
What is supposed to happen with clinic type other than 1? No mention is made. You also so do not provide any rule for if a client has multiple (interleaved) clinic types.
Does the clinic_name have any role in this? As in what would happen with something like
A 43412 43415 1 AA A 43416 43422 1 AB A 43423 43429 1 AA A 43430 43434 1 AB
I would start with something like this.
/* assumes data is sorted by Client id FirstDtSrvc */ Data temp; set have20 ; by client_id; retain seqstart; l_last = lag(Lastdtsrvc); if first.client_id then Seqstart=FirstDtSrvc; else if FirstDtSrvc - l_last >7 then do; seqstart=firstdtsrvc; end; run; proc summary data=temp nway; class client_id seqstart; var FirstDtSrvc Lastdtsrvc; output out=want (drop=_type_ _freq_) min(FirstDtSrvc)=Span_Begin max(Lastdtsrvc)=Span_end ; run;
It is very confusing, potentially quite fatal in terms of code, to discuss a problem in one term and use variables of different names. Your picture shows FirstDOS as does your narrative. But the example data step code uses FirstDtSrvc. A couple more similar themed variables and we might not have a clear understanding.
Thank you!! This rolls my dates up exactly how I wanted them. Is there any way I can keep all the variables and only have the Span_Begin and Span_End populate for Clinic_Type =1? Span_Begin and Span_End can be . or blank for all other clinic types.
Thank you!
@Whitlea wrote:
Thank you!! This rolls my dates up exactly how I wanted them. Is there any way I can keep all the variables and only have the Span_Begin and Span_End populate for Clinic_Type =1? Span_Begin and Span_End can be . or blank for all other clinic types.
Thank you!
You still have not answered @ballardw's question namely what if clinic types are interleaved? Do you just ignore the interspersed records for other clinic types? Or does a change in clinic type force a boundary on spans for clinic type 1?
And if there are two spans for clinic type 1, but separated by more than 7 days, should they be output as two separate output records?
What if you have a series of clinic type 2 followed by a series of clinic type 3? Is that two output records with missing span_begin and span_end?
Help us help you.
I need all variables (Client_ID ,FirstDtSrvc,Lastdtsrvc, Clinic_Type, Clinic_name, Span Begin, Span End, Span count) to remain in the data set that will also include the span variables. I only need these span variables for clinic type 1, the span variables can be blank for other clinic types as actual date variables that already exists in the data will be used for clinic types <> 1. If there are 2 spans (stays separated by 7 days), that would be 2 separate spans for that client or clinic name. A client and clinic can have more than 1 span.
Ideally, when I am finished adding these new variables, I would like to be able to export the data to Excel and be able to get span and client count by clinic name with a pivot table. That might help understand how I want this data set up.
Hope this answers all of your questions.
Thank you for your help.
@Whitlea wrote:
Thank you!! This rolls my dates up exactly how I wanted them. Is there any way I can keep all the variables and only have the Span_Begin and Span_End populate for Clinic_Type =1? Span_Begin and Span_End can be . or blank for all other clinic types.
Thank you!
If you only want clinic_type=1 you have a couple of choices:
1) if you don't want the clinics in the output at all then use a Where Clinic_type=1; which would typically be either a data set option or a statement shortly after the SET in the data set or in the Proc Summary code.
2) If the clinic_type is not 1 then skip the calculations in the data step (and use the Missing option on the class statement in Summary )or
3) set the variables to missing in a data step that uses the output from Proc Summary as the input.
The last may be the easiest depending on what you actually want the see the clinic types but not the span ranges:
Data final; set want; if clinic_type ne 1 then call missing (Span_begin, Span_end); run;
Call missing will take a list of variables of any type and set them all to missing as appropriate for the variable type.
Those are not date values, unless you a planning something that is 60 years in the future.
147 data _null_; 148 date1=43412; 149 date2=43415; 150 put date1= date9. date2= date9.; 151 run; date1=09NOV2078 date2=12NOV2078
From the first photograph you appear to have an XSLX worksheet. You probably incorrectly imported it such that the date columns ended up with mixed numeric and character values. In that case SAS would store the numbers that EXCEL would use to represent the dates as digit strings like you show in your data step.
To fix EXCEL dates into SAS dates you need to add '30DEC1899'd to adjust for the different offset date used (and other issues with EXCEL date numbering).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.