BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AnnaNZ
Quartz | Level 8

Hi

 

I have huge datasets (7 mil) and I need to find out how many times a client was placed. 

The client _id in this case is REFER_ID: 

Placement times being with BEG_RMVL_DT and end with END_RMVL_DT (these are datetime variables / numbers)

Every time there is a new Placement there is a new Placement number: RMVL_ID

 

So I could either work woth the placement beginning and end times, or simply with the PLacement number, bound or grouped by REFER_ID.

What I want to know is 

How many times was a client (REFER_ID ) placed, and that could have happened several times per client, once or not at all. 

 

My thinking was to create a new variable to say there was a new placement, more so as this data will later on be used for machine learning purposes. Yes, I could use RMVL_ID to cout the incidents with standard proceedures, but I really want a new column indicating when a new incident happenened, grouped by refer_id.

How would I do that? And / Or are there better ideas. 

Please find a test dataset attached. 

 

In a different, but similiar  case, I only have the datetimes and need to find out how often a new incident takes place, again grouped by Refer_ID or Cl_ID or even a specific Role.  How does it work there? 

 

Many thanks

 

 

REFER_IDRMVL_IDBEG_RMVL_DTEND_RMVL_DT
1800217   
1800217   
1800217   
1800217   
1800217   
1800217   
1800217   
1800219   
1800219   
1800219   
1800219   
1800219   
1800219   
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
18002215449715Nov1989 0:00:0013Jan1990 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
18002215449831Jul1991 0:00:0031Jul1991 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800221155976625Mar2007 0:00:0013Sep2007 0:00:00
1800222   
1800222   
1800222   
1800222   
1800222   
1800223   
1800223   
1800223   
1800223   
1800223   
1800223   
1800223   
1800223   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800226   
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
1800227154842403Feb2006 0:00:0009Feb2006 0:00:00
 
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I think this post should answer your questions:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

For BY group processing, in general, a data step is much easier to code than SQL. 

The SAS documentation has some good diagrams that illustrate BY group processing.

 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n01a08zkzy5igbn173zj...

 


@AnnaNZ wrote:

I think PROC SQL could be the way to go 

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

 

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT)   pair occurs within the same REFER_ID , there is a 1 (opposed to a 0)  in the  INCIDENT variable

 

That would be marvellous, thanks, 

 


 

View solution in original post

6 REPLIES 6
LinusH
Tourmaline | Level 20
Why do you need to add something to the detail data?
There are tons of ways in SAS to do counts: SQL, PROC MEANS/SUMMARY, PROC FREQ.. .
Data never sleeps
AnnaNZ
Quartz | Level 8

I know that,

but it has to be grouped by Refer_ID.

IN a diferent case and I wanted to add this to my question above, I only have datetimes and need to find out how often a new incident takes place.

So how can I get tehse two case working?

 

AnnaNZ
Quartz | Level 8

I think PROC SQL could be the way to go 

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

 

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT)   pair occurs within the same REFER_ID , there is a 1 (opposed to a 0)  in the  INCIDENT variable

 

That would be marvellous, thanks, 

 

Astounding
PROC Star

You're asking about fairly standard programming techniques.  For example:

 

proc sort data=have;

by refer_id beg_rmvl_dt;

run;

 

data want;

set have;

by refer_id beg_rmvl_dt;

incident = first.beg_rmvl_dt;

run;

 

It's up to you how you want to treat missing values for BEG_RMVL_DT.

 

If all you want is the counts of INCIDENT, then SQL might come into play.  But if you want to keep all the original observations, SQL becomes more complex.

Reeza
Super User

I think this post should answer your questions:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

For BY group processing, in general, a data step is much easier to code than SQL. 

The SAS documentation has some good diagrams that illustrate BY group processing.

 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n01a08zkzy5igbn173zj...

 


@AnnaNZ wrote:

I think PROC SQL could be the way to go 

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

 

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT)   pair occurs within the same REFER_ID , there is a 1 (opposed to a 0)  in the  INCIDENT variable

 

That would be marvellous, thanks, 

 


 

AnnaNZ
Quartz | Level 8

THat´s it thank you

 

  Proc sort data=   Kea.REFER_DT; by  RMVL_ID ; run; 
  Data Kea.REFERRAL; 
Set  Kea.REFER_DT; 
 count + 1;
by RMVL_ID;
if first.RMVL_ID then count = 1;
If RMVL_ID = . then count = 0;
run; 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2069 views
  • 1 like
  • 4 in conversation