Hi,
I'm trying to get if someone is enrolled in a program for each month from Jan 2020 to Feb 2021. So I'm interested in an array with month_1 to month_14.
The data is something like this:
ID Start_date end_date
011 1/1/1996 5/3/2020
003 2/1/2020 12/1/2020
So typically the output I'm looking for is
ID Start_date end_date month_1 month_2 month_3 month_4...............................
011 1/1/1996 5/3/2020 1 1 1 0 0 0
003 2/1/2020 12/1/2020 0 1 1 1 1 1
Would love some help on this!
@rdum96 wrote:
Hi,
I'm trying to get if someone is enrolled in a program for each month from Jan 2020 to Feb 2021. So I'm interested in an array with month_1 to month_14.
The data is something like this:
ID Start_date end_date
011 1/1/1996 5/3/2020
003 2/1/2020 12/1/2020
So typically the output I'm looking for is
ID Start_date end_date month_1 month_2 month_3 month_4...............................
011 1/1/1996 5/3/2020 1 1 1 0 0 0
003 2/1/2020 12/1/2020 0 1 1 1 1 1
Would love some help on this!
This is an incredibly poor choice of data layout, that will make your programming more difficult; and almost never necessary. No array is necessary to determine if "someone is enrolled in a program for each month from Jan 2020 to Feb 2021".
To answer that qestion, how about this:
data want;
set have;
if start_date<='01JAN2020'd and end_date>='01FEB2021'd then result=1;
else result=0
run;
Add an actual DATE variable. Then use a proper date format to make nice appearing values.
This creates a Have data that looks like your "example" (Hint: data steps are the best way to show values so we can code against them.)
Then adds a date that is the first day of the month in the interval you asked for.
Since your "month" values contained 0/1 coded values I have added a random value that takes 0/1 values.
Then show use of Proc Summary to create a data set for the sum of 1's which I think you imply as the desired result.
Proc Report or Tabulate could make nice tables of either the sums or the counts for each of 1 and 0.
data have; input ID $ Start_date :mmddyy10. end_date :mmddyy10.; format start_date end_date mmddyy10.; datalines; 011 1/1/1996 5/3/2020 003 2/1/2020 12/1/2020 ; data want; set have; do date = '01Jan2020'd to '01Feb2021'd ; randomvalue = rand('bernoulli',.5); output; date=intnx('month',date,1,'B'); end; format date yymon7.; run; /* how to sum (count) 1/0 variable by calendar month*/ proc summary data=want nway; class date; format date yymon7.; var randomvalue; output out=summary (drop=_:) sum=sumrandvalues; run;
When dealing with dates it is almost always a good idea to use actual SAS date values. Formats can create groups that will be honored by analysis, reporting or graphing procedures. The functions INTNX and INTCK can increment or determine intervals between dates (and times or datetimes).
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.
@rdum96 wrote:
I'm trying to get a total count for each month. I'm new to this so I'm prolly not thinking the right way. Would appreciate your help
Could you please state clearly the desired output? Could you please give a realistic example of the input data set (not just two records) that covers the real-world examples that might occur in your data, and the output from that example data?
Again, arrays are not the way to get total counts by month.
@rdum96 wrote:
I'm trying to get a total count for each month. I'm new to this so I'm prolly not thinking the right way. Would appreciate your help
That's why you want a long dataset (Maxim 19):
data want;
set have;
month = start_date;
format month yymmd7.;
do while (month le end_date);
output;
month = intnx('month',month,1,'s');
end;
keep id month;
run;
From that, it is now easy to get a count of enrolled ID's for a given time period:
proc sql;
create table enrolled as
select
month,
count(id) as count
from want
where '01jan2020'd le month le '01feb2021'd
group by month
;
quit;
or
proc freq data=want;
where '01jan2020'd le month le '01feb2021'd;
tables month / nopercent nocum out=enrolled;
run;
@rdum96 wrote:
So I would need to include people who joined somewhere in the middle of the year or left in the middle too. I think this would include only people enrolled during the entire spell?
I don't know what this means, or how it relates to the data you have already provided. Please provide example data that covers the real world situations that you are likely to have, and the desired output from this data.
What do expect as final result?
data have;
input ID Start_date : ddmmyy10. end_date : ddmmyy10.;
format Start_date end_date ddmmyy10.;
cards;
011 1/1/1996 5/3/2020
003 2/1/2020 12/1/2020
;
data want;
set have;
array x{*} Jan2020 Feb2020 Mar2020 Apr2020;
do j=1 to dim(x);
x{j}=0;
end;
do i= Start_date to end_date;
do j=1 to dim(x);
if upcase(put(i,monyy7.))=upcase(vname(x{j})) then x{j}=1;
end;
end;
drop i j;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.