BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
urban58
Quartz | Level 8

Hello SAS Community,

 

I need to create an actual vs. expected enrollment graph, like the following

mclancy_0-1606320499875.png

In my dataset, dates (coming from a few different sources) have been condensed down to year/month - this is an example of the actual data 

 

idyear_monthhave_data
12020101
22019121
32019111
42019111
52020101
62019121
72020011
82020011

 

Target enrollment is 1900 participants (~4 participants/day), I do not have a date when the targeted enrollment is expected to be met

I started on trying to work on the expected numbers (excluding holidays, institution days off) which I can do looking back but as no final enrollment date I am stumped.
proc freq data=have;
table year_month / out=counts (keep=year_month count) nopercent nocum;
run;
 
For the graph, I think it's probably proc sgplot but how do I
 
1. include the counts from proc contents above and
2. how to include the expected of 1900 so the 2 lines are on the same graph
 
proc sgplot data=have
(where=(date <= "31OCT2020"d and have_data = 1));
title "Enrollment: Actual vs. Expected";
series x=year_month y=count;
series ???
run;
 
TIA,
Margaret
1 ACCEPTED SOLUTION

Accepted Solutions
snoopy369
Barite | Level 11

Given your exact requirements, it seems like you're going to have to figure out your expected day by day, or else be satisfied with an estimate.  Here's one example of day by day.  Note the use of the HOLIDAY function - you'll need to customize that to your specific needs, look up the function's documentation to see what the possible holidays are.  I don't know what you consider holidays in your situation.

 

Here I make a dummy dataset with respondent level data, then an expected dataset with day-level data, then combine them, fix the yearmonth variable to be consistent between the two, and then use PROC FREQ to get the CUM_FREQ variable which is what we want - cumulative enrollment.  I added the MARKERS option to show you how that works, but you don't have to use it.

 

data have;
  call streaminit(7);  
  do year_month = 202001 to 202011;
    do _x = 1 to floor(rand('Uniform',40,120));
      output;
    end;
  end;
  drop _x;
run;

data expected;
  do dt = '02JAN2020'd to '30NOV2020'd;
    if weekday(dt) in (2:6) then do;
      if ( dt eq holiday('mlk',2020)
      	or dt eq holiday('uspresidents',2020)
      	or dt eq holiday('memorial',2020)
      	or dt eq holiday('usindependence',2020)
      	or dt eq holiday('labor',2020)
      	or dt eq holiday('columbus',2020)
      	or dt eq holiday('veteransusg',2020)
      	or dt eq holiday('thanksgiving',2020)
      	or dt eq holiday('christmas',2020)
      ) then continue;
      do _i = 1 to 4;
        output;
      end;
    end;
  end;
  keep dt;
  format dt date9.;
run;

data for_plot;
  set have(in=_have) expected(in=_exp);
  if _have then group='A';
  else group='E';
  if group='A' then do;
  	year_month_dt = input(cats(year_month,'01'),yymmdd8.);
  end;
  else year_month_dt = dt;
  format year_month_dt yymmn6.;  	
run;

proc freq data=for_plot;
  by group;
  tables year_month_dt/out=freqs_for_plot outcum; 
run;

proc sgplot data=freqs_for_plot;
  vline year_month_dt /group=group response=cum_freq markers;
run;

And the chart it produces:

 

snoopy369_0-1606327895399.png

 

View solution in original post

9 REPLIES 9
ballardw
Super User

So what exactly in your data represents "actual" or "expected"?? I do not see anything to differentiate between the values.

 

Here is an example of what you date might look like to accomplish your desired graph using a generic x and y value.

data example;
   input x y z $;
datalines;
1 2 A
2 2 A
3 4 A
5 4 A
6 3 A
1 1 E
2 2 E
3 3 E
4 4 E
5 5 E
6 6 E
;

proc sgplot data=example;
   series x=x y=y /group=z;
run;

A group variable lets you display different properties such as A(ctual) and E(xpected) with different appearance and pretty simple code. BUT you have to include the group values for a variable.

snoopy369
Barite | Level 11

You could do this two ways - have Actual and Expected in two rows, or two variables; both have advantages.  Here's the two variables solution.  I also show how to generate your Expected - I assume "have" is a dataset you already have (your PROC FREQ output); the for_plot dataset adds the expected (it just adds 4 every time), and makes the year _month variable a proper date variable.

 

**have is just me making dummy data, this is output from PROC FREQ above;
data have;
  count=0;
  call streaminit(7);  
  do year_month = 202001 to 202011;
    count + floor(rand('Uniform',1,8));
    output;
  end;
run;

*here we add the Expected variable and reformat year_month;
data for_plot;
  set have;
  expected+4;
  year_month_dt = input(cats(year_month,'01'),yymmdd8.);
  format year_month_dt yymmn6.;
run;

*and here we plot, using series (or vline also is okay);
proc sgplot data=for_plot;
  series x=year_month_dt y=count /lineattrs=(color=red);
  series x=year_month_dt y=expected /lineattrs=(color=blue);
run;

Here is the output from that, from here you can change things to add formatting however you want.snoopy369_0-1606324880446.png

 

 

As I mentioned in your SASL post, you could also see this paper for more details on how to make this kind of plot:

https://www.pharmasug.org/proceedings/china2018/DV/Pharmasug-China-2018-DV34.pdf

urban58
Quartz | Level 8
Hello ballardw,



Unfortunately, my data is not so simple as your example solution.

For expected enrollment, I only a value of 1900 participants expected (~4/day*) - nothing more

*Mon-Fri, excluding institution holidays



For actual enrollment, this is how the data looks

data have;

input id year_month have_data ;

datalines;

1 202010 1

2 201912 1

3 201911 1

4 201911 1

5 202010 1

6 201912 1

7 202001 1

8 202001 1

;

run;



proc sgplot data=have

(where=( have_data = 1));

title "Enrollment: Actual vs. Expected";

series x=id y=year_month;

run;



graph looks strange and does not include expected.



Thanks,

Margaret




Reeza
Super User
Where is your expected data in the data set? How is SAS supposed to know about the 4 per day?
snoopy369
Barite | Level 11

Given your exact requirements, it seems like you're going to have to figure out your expected day by day, or else be satisfied with an estimate.  Here's one example of day by day.  Note the use of the HOLIDAY function - you'll need to customize that to your specific needs, look up the function's documentation to see what the possible holidays are.  I don't know what you consider holidays in your situation.

 

Here I make a dummy dataset with respondent level data, then an expected dataset with day-level data, then combine them, fix the yearmonth variable to be consistent between the two, and then use PROC FREQ to get the CUM_FREQ variable which is what we want - cumulative enrollment.  I added the MARKERS option to show you how that works, but you don't have to use it.

 

data have;
  call streaminit(7);  
  do year_month = 202001 to 202011;
    do _x = 1 to floor(rand('Uniform',40,120));
      output;
    end;
  end;
  drop _x;
run;

data expected;
  do dt = '02JAN2020'd to '30NOV2020'd;
    if weekday(dt) in (2:6) then do;
      if ( dt eq holiday('mlk',2020)
      	or dt eq holiday('uspresidents',2020)
      	or dt eq holiday('memorial',2020)
      	or dt eq holiday('usindependence',2020)
      	or dt eq holiday('labor',2020)
      	or dt eq holiday('columbus',2020)
      	or dt eq holiday('veteransusg',2020)
      	or dt eq holiday('thanksgiving',2020)
      	or dt eq holiday('christmas',2020)
      ) then continue;
      do _i = 1 to 4;
        output;
      end;
    end;
  end;
  keep dt;
  format dt date9.;
run;

data for_plot;
  set have(in=_have) expected(in=_exp);
  if _have then group='A';
  else group='E';
  if group='A' then do;
  	year_month_dt = input(cats(year_month,'01'),yymmdd8.);
  end;
  else year_month_dt = dt;
  format year_month_dt yymmn6.;  	
run;

proc freq data=for_plot;
  by group;
  tables year_month_dt/out=freqs_for_plot outcum; 
run;

proc sgplot data=freqs_for_plot;
  vline year_month_dt /group=group response=cum_freq markers;
run;

And the chart it produces:

 

snoopy369_0-1606327895399.png

 

urban58
Quartz | Level 8

Wow, I ran your code Snoopy and loved the graphs – need to digest but thanks so much.

While you were doing that, I was working on coming up with my expected – not totally finished – need to exclude a few more working days – put it at the end.

 

In your code

do _i = 1 to 4; - is this where you assigned 4 participants for each valid date?

If I change do dt = '02JAN2020'd to '30NOV2020'd; to a later date (I’ve calculated that at the rate of 4 ppts/day, enrolled would be expected to be done by September 2021) will the expected line keep going while the actual will stop?


What does this mean? if weekday(dt) in (2:6) then do;

Many thanks to all for your help, I need to read ALL the answers carefully but realize the holiday is here and you will be off to better things.

Margaret

 

data target (keep=begin);

format begin date9.;

array holidays(10);

do date = '22OCT2019'd to '31OCT2021'd ;

if date eq intnx('year',date,0,'b') then do;

call missing(of holidays(*));

i=0;

end;

if date eq holiday('NEWYEAR', year(date)) or

date eq holiday('MLK',year(date)) or

date eq holiday('USPRESIDENTS',year(date)) or

date eq holiday('MEMORIAL', year(date)) or

date eq holiday('USINDEPENDENCE',year(date)) or

date eq holiday('LABOR',year(date)) or

date eq holiday('COLUMBUS',year(date)) or

date eq holiday('VETERANSUSG',year(date)) or

date eq holiday('THANKSGIVING', year(date)) or

date eq holiday('CHRISTMAS', year(date)) then do;

i+1;

holidays(i)=date;

end;

if not(date in holidays or weekday(date) in (1,7)) then do;

season=1;

begin=date;

output;

end;

end;

run;

data target2; *n=254 days;

set target;

if begin in('23DEC2019'd,'24DEC2019'd,'26DEC2019'd,'27DEC2019'd,'30DEC2019'd,'31DEC2019'd) then DELETE; *need to also add days to remove for 2020;

run;

 

options intervalds=(workdays=target2);

 

data expected;

days=intck('workdays','22OCT2019'd,'30SEP2021'd);

run;

ballardw
Super User

@urban58 wrote:

 


What does this mean? if weekday(dt) in (2:6) then do;

 


WEEKDAY is a SAS function that returns a number of the day of a week from a date value with 1= Sunday through 7=Saturday. The IN operator compares the value before IN to a list of items in parentheses. The 2:6 is a shorthand list of all integers 2 through 6. So checking to see if the date is Monday through Friday.

 

 

Reeza
Super User
You can use the LINEPARM statement to graph the expected line if you know the intercept and slope (~4/day or ~120/month).
If you want it to be daily to account for the different number of days in a month you'll need to pre-create the data that includes the expected value so you can graph it as well.
snoopy369
Barite | Level 11
LINEPARM is a good suggestion. Another possibility is, in addition to the expected line, to graph a REFLINE at exactly 1900 (straight horizontal), to see what the final target is since you say you don't have a specific day you expect to hit it at.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2672 views
  • 2 likes
  • 4 in conversation