10-29-2013 12:47 PM
I have a quick question. I need a sas code for this analysis and looking for your help. Hope you would help me.
I have a monthly data of people receiving some service starting July of 2010 and forward until October 2013. As some are coming, some are leaving too. What I want to find is how many exactly are new comers to the program? My assumption is:
For new entry: Anyone who has used the service in any of last 12 months, he is not a new comer. IF he has not used the service for last 12 months but seen to be using service this month, he is a new comer for this month. So, my analysis will start from 2011 July since I have a lag time of 12 months (and my data starts from 2010 July). So, I want how many are new comers every month to the program.
For exit: Anyone who has used the service in last 12 months, but we do not see any service for next 12 months, then in the 13th month, we claim him as exit. This also has a lg time of 12 months. So, I want how many exit every month.
Then I can add new comers and subtract exit to come up with some exact number of new recipients in the program. I hope you understand what I mean. Let me know if you can help me with coding for this issue. Also, let me know if you do not understand what I mean. I will be waiting for your reply. Thanks.
10-29-2013 01:21 PM
My data is monthly data...and its not only males....i just referred to 'he' for an ease and like u said to refer a single client...My data has ID number for each recipient and ID number is same whenever he appears..no matter what. Hope you understand. Pls help. Thanks.
10-29-2013 01:30 PM
I don't understand. You need to provide your data structure.
I have a monthly file and get a file every month so I'll be working with 24 files for this example but it will change every month, adding in a new file.
The file names are along the lines of Monthly2010_01, Monthly2010_02...etc
The file has 3 fields as follows:
client id - unique client identifier
Date_enrolled - Field to indicate the date the client enrolled
Date_exited -date field to indicate the date the client exited the program.
10-29-2013 01:31 PM
PS. You may not be in North America, but I believe the preference, and definitely my preference, is to use Client or Customer instead of s/he, more professional and courteous.
10-29-2013 01:34 PM
I have a quick question.
I hope you realize this won't be a quick answer
What I would do is a cartesian join of all months in the study, and all ID numbers, probably using PROC SQL.
From there, you would need to assign a zero if the user did not use the service in a given month, and a 1 if the user did use the service in a given month. Then, determine the sum of these 0/1 variables over the previous 12 months, probably using the LAG function. If the previous 12 months sum to zero, and the current month is a 1, then the person is a new entry. If the previous 12 months sum to zero and the current month is also a zero, then the person has exited the service. Take care so that you aren't summing the months of a previous user ID.
10-29-2013 01:45 PM
Thanks Reeza and Paige for your comments and suggestions. Yeah, I will be using customer and customer ID in my further correspondence.
My data structure is:
Date Customer ID
Hope I am able to clraify this. Please let me know. Thanks.
10-29-2013 02:02 PM
PaigeMiller: What do you want me to join? This is a single table with 2 columns. One with date and another with customer ID.
Suppose there are 20 customers in July 2010. In August, 2010, if there are 24 customers, out of which 16 are the same customers that were in July, 14 new came in and 6 left (16+14-6=24). So, i need to know how to find this 14 new and 6 who left. New and exit criteria are based on assumptions i have stated in my first question. Thanks in advance for your help.
10-29-2013 02:18 PM
You perform a Cartesian join of the date column with the ID column, so you now have all possible combinations of ID and date ... then merge it with the original data set (like the one shown in reply #6), so you can now count how many times in the previous 12 months the service was used for a given user ID.
10-29-2013 02:18 PM
Use proc transpose on your data, so it becomes:
Client Month1 month2 month3 month4
Then use a data step with an array to calculate what you you need.
You've posted what your data looks like, but not how you want your output to look like...are you calculating the stats per month for the 12 months?
10-29-2013 02:47 PM
Just to be 100% clear, Reeza and I have two different and mutually exclusive solutions here. Both should work. Both require a bit of programming. Pick one and go with it.
P.S.: I like my solution better
10-29-2013 03:27 PM
First, Thanks that you did understand my problem and also got better idea what I am expecting as an output. However, I could not figure out yet how to perform a Cartesian join of the date column with the ID column, so that I have all possible combinations of ID and date. I know you might be thinking that you replied a dumb student's question (hey i am a college student)....but it is what it is...please help me how to code the join as well. Thanks again.
Reeza, I will follow your step as well later. First, let me work with Paige's approach. Thanks.
10-29-2013 03:36 PM
create table distinct_date as select distinct date from whatever;
create table distinct_id as select distinct customerID from whatever;
create table cartesian_join as select distinct_date.date,distinct_ID.customerID from distinct_date,distinct_id
order by distinct_ID.customerID,distinct_date.date;
This assumes that the dates are going to sort properly (i.e. if they are text strings they will not sort properly, they will sort alphabetically and APR will come before MAR and so on; if they are numerical as shown in reply #6 this would not be a problem)
10-29-2013 03:42 PM
I think my solutions better.
@adhi2828 I would recommend learning how to properly ask a question, both on here and to potential colleagues/employers. There are people on here who ask questions that I would NEVER want to work with, based on their communication skills. Its also a better use of your time and ours. If you take the time to formulate your question properly you're more likely to get better responses faster and the same is true in a work environment.
Speaking of, you still haven't fully explained what you want your result to be, so Paige is still assuming things and that his solution will work. I am assuming different things as you can see from my initial assumptions about your data. I generally don't like to post any code until I see what the solution should be, otherwise its pretty much a waste of typing