Hi
I have huge datasets (7 mil) and I need to find out how many times a client was placed.
The client _id in this case is REFER_ID:
Placement times being with BEG_RMVL_DT and end with END_RMVL_DT (these are datetime variables / numbers)
Every time there is a new Placement there is a new Placement number: RMVL_ID
So I could either work woth the placement beginning and end times, or simply with the PLacement number, bound or grouped by REFER_ID.
What I want to know is
How many times was a client (REFER_ID ) placed, and that could have happened several times per client, once or not at all.
My thinking was to create a new variable to say there was a new placement, more so as this data will later on be used for machine learning purposes. Yes, I could use RMVL_ID to cout the incidents with standard proceedures, but I really want a new column indicating when a new incident happenened, grouped by refer_id.
How would I do that? And / Or are there better ideas.
Please find a test dataset attached.
In a different, but similiar case, I only have the datetimes and need to find out how often a new incident takes place, again grouped by Refer_ID or Cl_ID or even a specific Role. How does it work there?
Many thanks
REFER_ID | RMVL_ID | BEG_RMVL_DT | END_RMVL_DT |
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800217 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800219 | |||
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 54497 | 15Nov1989 0:00:00 | 13Jan1990 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 54498 | 31Jul1991 0:00:00 | 31Jul1991 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800221 | 1559766 | 25Mar2007 0:00:00 | 13Sep2007 0:00:00 |
1800222 | |||
1800222 | |||
1800222 | |||
1800222 | |||
1800222 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800223 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800226 | |||
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
1800227 | 1548424 | 03Feb2006 0:00:00 | 09Feb2006 0:00:00 |
I think this post should answer your questions:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
For BY group processing, in general, a data step is much easier to code than SQL.
The SAS documentation has some good diagrams that illustrate BY group processing.
@AnnaNZ wrote:
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
I know that,
but it has to be grouped by Refer_ID.
IN a diferent case and I wanted to add this to my question above, I only have datetimes and need to find out how often a new incident takes place.
So how can I get tehse two case working?
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
You're asking about fairly standard programming techniques. For example:
proc sort data=have;
by refer_id beg_rmvl_dt;
run;
data want;
set have;
by refer_id beg_rmvl_dt;
incident = first.beg_rmvl_dt;
run;
It's up to you how you want to treat missing values for BEG_RMVL_DT.
If all you want is the counts of INCIDENT, then SQL might come into play. But if you want to keep all the original observations, SQL becomes more complex.
I think this post should answer your questions:
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
For BY group processing, in general, a data step is much easier to code than SQL.
The SAS documentation has some good diagrams that illustrate BY group processing.
@AnnaNZ wrote:
I think PROC SQL could be the way to go
http://www2.sas.com/proceedings/sugi27/p191-27.pdf
CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable
That would be marvellous, thanks,
THat´s it thank you
Proc sort data= Kea.REFER_DT; by RMVL_ID ; run; Data Kea.REFERRAL; Set Kea.REFER_DT; count + 1; by RMVL_ID; if first.RMVL_ID then count = 1; If RMVL_ID = . then count = 0; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.