BookmarkSubscribeRSS Feed
Jordan88
Obsidian | Level 7

So usually I try to solve something first before asking a question, but I really have no idea where to start with this one.

 

Basically, I am working with an aggregated data set of the number of students who are absent from school on a given day. The data is aggregated by school name, sex, age, grade, absence reason, and day. When a school doesn't have any absenteeism for a given sex, age, grade, absence reason, and day combination, there is simply no entry. 

 

I need to create an entry for each missing combination of school name, sex, age, grade, absence reason, and day, where the number of students absent = 0. 

 

It needs to be automated (this code would be run daily), and thus be able to identify which combinations are not present and then create them. All fields would be populated from existing fields, except for the number of absent students which would always be equal to 0. 

 

I have no idea where to begin.

 

(Using SAS 9.4, if needed I can supply a sample dataset, it would just take some)

4 REPLIES 4
ballardw
Super User

Since you are dealing with aggregate data and presumably should only have one record for each combination a possible relatively simple approach would be to make a base data set with all zero values and use either a merge or update to create a new combined data set.

 

But your "day" variable may or may not make that easy. Is it supposed to be the day of the week or a date? Or possibly the DAY could be provided differently. Example data would help.

 

Here is a rough example:

data base;
   do sname = 'Washington','Lincoln','Jefferson';
   do grade = 1 to 6;
   do age   = 5 to 10;
      absentee = 0;
      output;
   end;
   end;
   end;
run;
proc sort data=base;
  by sname grade age;
run;

data transaction;
   informat sname $10.;
   input sname grade age absentee;
datalines;
Washington 1   5  3 
Washington 2   7  4 
Washington 5   10 1 
Lincoln 2   5  9 
Lincoln 2   7  18 
Lincoln 3   10 6
Jefferson 3 8 99
;
run;
proc sort data=transaction;
  by sname grade age;
run;


data daily;
   update base
          transaction
   ;
  by sname grade age;
run;

   

Note that the sets have to be sorted the same for update to work.

 

I appreciate that your combinations of age and grade are likely to be school dependent. Ensure your rules are enforced in the base data. One thing to be aware of is that if the school name spelling is not exactly the same in the transaction data set you have new schools in the results. Similarly new combinations of the values of other variables will be added to the result.

 

You could probably start on your BASE data set with a good enrollment count data set.

Jordan88
Obsidian | Level 7

Hmm, that is very helpful and a great place to start! Thank you. 

 

The "day" variable is in DATETIME20. format, so mm/dd/yyyy 00:00:00 AM, but the time portion is always 12:00:00 AM. (Why they use datetime instead of just date, I'm not sure). 

 

I wonder if there is a way to make a reference file for the school names instaed of listing them each out like that, because there are currently over 500 schools and more will be added in the future. 

 

And yes, you are right, I will need to see which schools have which grades because they vary. I'm assuming that the 

do sname = 'Washington','Lincoln','Jefferson';
   do grade = 1 to 6;
   do age   = 5 to 10;
      absentee = 0;

portion will make a combination of each school, age, and grade. I can play around with this, because tyically a given grade will have a range of 3 years of ages for example grade 6 should have age 5,6,7 etc.), so this may produce many entries/combinations that wouldn't exist in the actual data. 

 

I would also need to incorporate the date in there. Of course I woulnd't need it for weekends so that may be a pain. 

 

 

Reeza
Super User

If you have a large dataset, which it sounds like you do, you can look at the SPARSE option in PROC FREQ which will generate all possible combinations including your 0's. It may be too many or too little but you can filter it out.

 

 

ballardw
Super User

I work with my state's public school data a little bit. As far as the School and grade issues I would really hope that there is an ENROLLMENT data set somewhere, even last year's would help as a start. That should let you pull School and grade information with students actually enrolled.

 

Gender is no probem.

Age is likely grade dependent but I would hesitate to recommend a limits an ages. I would expect that there might be a guideline about that. I also wonder about age as component of this report as children's ages change during the year.

 

The date time value they send you can be easily reduced to just the date with the DATEPART function. The reason I consider most likely for the datetime is lazy database admin or programmer. If they aren't doing something like absent between 10am and 1Pm then the time component is useless. You don't need it, get rid of it.

 

If you can get separate datasets of some of these elements, such as your date you can combine them relatively easily. Here's an example:

data school;
   informat sname $10.;
   input sname grade;
datalines;
Washington 1
Washington 2
Washington 3
Jefferson  3
Jefferson  4
Jefferson  5
Jefferson  6
Lincoln 5
Lincoln 6
Lincoln 7
Lincoln 8
Adams  9
Adams  10
Adams  11
Adams  12
;

data sex;
   input sex $;
datalines;
Male
Female
;
run;



proc sql;
   create table base as
   select sname,sex,grade
   from school, sex
   order by sname,grade,sex;
quit;

You get a message from the Proc Sql along these lines:

 

NOTE: The execution of this query involves performing one or more Cartesian product joins that

can not be optimized.

That is basically what you are requesting combine everything and there really isn't a way to optimize that for this example.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1394 views
  • 1 like
  • 3 in conversation