Hi
I have huge datasets (7 mil) and I need to find out how many times a client was placed.
The client _id in this case is REFER_ID:
Placement times being with BEG_RMVL_DT and end with END_RMVL_DT (these are datetime variables / numbers)
Every time there is a new Placement there is a new Placement number: RMVL_ID
So I could either work woth the placement beginning and end times, or simply with the PLacement number, bound or grouped by REFER_ID.
What I want to know is
How many times was a client (REFER_ID ) placed, and that could have happened several times per client, once or not at all.
My thinking was to create a new variable to say there was a new placement, more so as this data will later on be used for machine learning purposes. Yes, I could use RMVL_ID to cout the incidents with standard proceedures, but I really want a new column indicating when a new incident happenened, grouped by refer_id.
How would I do that? And / Or are there better ideas.
Please find a test dataset attached.
In a different, but similiar case, I only have the datetimes and need to find out how often a new incident takes place, again grouped by Refer_ID or Cl_ID or even a specific Role. How does it work there?
Many thanks
REFER_ID | RMVL_ID | BEG_RMVL_DT | END_RMVL_DT |
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 54497 | 15Nov1989 0:00:00 | 13Jan1990 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 54498 | 31Jul1991 0:00:00 | 31Jul1991 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800222 | |||
1800222 | |||
1800222 | |||
1800222 | |||
1800222 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
I think this post should answer your questions:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
For BY group processing, in general, a data step is much easier to code than SQL.
The SAS documentation has some good diagrams that illustrate BY group processing.
@AnnaNZ wrote:
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
I know that,
but it has to be grouped by Refer_ID.
IN a diferent case and I wanted to add this to my question above, I only have datetimes and need to find out how often a new incident takes place.
So how can I get tehse two case working?
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
You're asking about fairly standard programming techniques. For example:
proc sort data=have;
by refer_id beg_rmvl_dt;
run;
data want;
set have;
by refer_id beg_rmvl_dt;
incident = first.beg_rmvl_dt;
run;
It's up to you how you want to treat missing values for BEG_RMVL_DT.
If all you want is the counts of INCIDENT, then SQL might come into play. But if you want to keep all the original observations, SQL becomes more complex.
I think this post should answer your questions:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
For BY group processing, in general, a data step is much easier to code than SQL.
The SAS documentation has some good diagrams that illustrate BY group processing.
@AnnaNZ wrote:
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
THat´s it thank you
Proc sort data= Kea.REFER_DT; by RMVL_ID ; run; Data Kea.REFERRAL; Set Kea.REFER_DT; count + 1; by RMVL_ID; if first.RMVL_ID then count = 1; If RMVL_ID = . then count = 0; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.