BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
boom3196
Fluorite | Level 6

So I'm in a little conundrum, and I hope I can get some help here. On a basic level, I'm trying to create a file that has a student record for each day that each student was enrolled between two dates, 08/25/21 and 09/27/21. I have three datasets at play here. I have a dataset with all of the students and their enrollment dates and withdrawal dates:

 

sashelpdemofile.PNG

 

 

 

 

 

 

 

 

 

I also have a dataset with all calendar dates between these two dates, and whether that calendar date was a school day or not, defined in binary in the column 'In Session'.

sashelpcalfile.PNG

 

 

 

 

 

 

 

 

 

 

 

 

 

 

And I have a third dataset that keeps track, by day, of whether a student missed school or not. It does not include students who attended the day in full or student were not enrolled on that school day.

boom3196_0-1633031615033.png

 

I was thinking of using the first two datasets to create a student record for each individual date that was in session AND that each student was enrolled, so that I could merge that file with my third dataset. Using that final dataset, I could then count the number of days students were enrolled (by how many records each student has) and the days that they missed (using the variable 'Amount of School Missed'n) all in one (or two) step(s). What has me stuck is how to do the first step, creating a student record for each date in session and that each student was enrolled. I'm assuming that it is some type of loop with an array or list (of my calendar dates) of some kind?

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Since it is hard to program to read pictures I am going to ask that you either provide the data example in the form of data step code or text.

 

Second, are your "dates" actually SAS date values? If you run Proc Contents on your data set what does the result show for variable name (you don't show typical acceptable SAS variable names), type and format?

If the "date" variables are actually character then the first step would be to create SAS date values. Then this is trivial for creating the interval.

 

data have;
   input studentid $ enroll : date11. withdraw date11.;
   format enroll withdraw date9.;
datalines;
A 25-Aug-21  22-Jun-22
;

data expand;
   set have;
   do studentdate =enroll to withdraw;
      output;
   end;
   format studentdate date9.;
run;

The Do loop works because at heart SAS date values are numbers of days. So when you add one, such as with do loop, the next value is the next date in sequence.

The Output statement tells SAS to write to the output data set at that point in the code instead of the default end of the data step.

View solution in original post

4 REPLIES 4
ballardw
Super User

Since it is hard to program to read pictures I am going to ask that you either provide the data example in the form of data step code or text.

 

Second, are your "dates" actually SAS date values? If you run Proc Contents on your data set what does the result show for variable name (you don't show typical acceptable SAS variable names), type and format?

If the "date" variables are actually character then the first step would be to create SAS date values. Then this is trivial for creating the interval.

 

data have;
   input studentid $ enroll : date11. withdraw date11.;
   format enroll withdraw date9.;
datalines;
A 25-Aug-21  22-Jun-22
;

data expand;
   set have;
   do studentdate =enroll to withdraw;
      output;
   end;
   format studentdate date9.;
run;

The Do loop works because at heart SAS date values are numbers of days. So when you add one, such as with do loop, the next value is the next date in sequence.

The Output statement tells SAS to write to the output data set at that point in the code instead of the default end of the data step.

boom3196
Fluorite | Level 6

Yes, they are date values! And this worked perfectly! I didn't know you could create a loop using two variables already in your table that way! I'm so used to syntax using things like

i=x i to n by y

that I couldn't think of doing things this way. I also was forgetting to think about the date values as numbers rather than character values entirely and didn't know that I could just put the dates in a loop. Thanks so much for your help!

ballardw
Super User

@boom3196 wrote:

Yes, they are date values! And this worked perfectly! I didn't know you could create a loop using two variables already in your table that way! I'm so used to syntax using things like

i=x i to n by y

that I couldn't think of doing things this way. I also was forgetting to think about the date values as numbers rather than character values entirely and didn't know that I could just put the dates in a loop. Thanks so much for your help!


Lets parse that (incomplete) do loop. Better: Do i=x to n by y;

1) i= means set i to a starting value of x (typically these are numeric values. Dates are numeric values.)

2) "to n" means you intend to increment i until it is at least as large (or possibly small if decrementing the values) as the numeric value of n

3) "by y" means to increment in numeric steps of size n .

So once you get the idea that dates are numbers and that the values are integer number of days the do loop is easy.

 

The obnoxious bit is that so many of our date intervals are not constant so "By month" , "by year", "by quarter" don't have nice number values to use in a DO loop. [Aside, note that the graphics procedures where you are controlling which values to display as tick marks will allow value with "by month" and several common intervals though they will generally want a literal date: xaxis values=('01Jan2020'd to '01Jan2021'd by month) will get a tick for each month. The appearance of the tick label would depend on the format assigned to the xaxis variable).

 

Incrementing dates ( and usually datetimes ) often involves use of Do Until or Do While and use of the INTNX function to increment the value of the index variable inside the loop.

 

Simple time works relatively well as long as you remember the measure is seconds. So increment by 1 for one second, 60 for one minute, 3600 for one hour (or combinations/multiples as desired).

boom3196
Fluorite | Level 6
That all makes sense! Thanks for adding in the extra context and mentioning the INTNX function, I'll go look it up.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1147 views
  • 2 likes
  • 2 in conversation