Solved: create new variable when group-statement met

AnnaNZ · Posted 07-08-2017 12:23 AM

Hi

I have huge datasets (7 mil) and I need to find out how many times a client was placed.

The client _id in this case is REFER_ID:

Placement times being with BEG_RMVL_DT and end with END_RMVL_DT (these are datetime variables / numbers)

Every time there is a new Placement there is a new Placement number: RMVL_ID

So I could either work woth the placement beginning and end times, or simply with the PLacement number, bound or grouped by REFER_ID.

What I want to know is

How many times was a client (REFER_ID ) placed, and that could have happened several times per client, once or not at all.

My thinking was to create a new variable to say there was a new placement, more so as this data will later on be used for machine learning purposes. Yes, I could use RMVL_ID to cout the incidents with standard proceedures, but I really want a new column indicating when a new incident happenened, grouped by refer_id.

How would I do that? And / Or are there better ideas.

Please find a test dataset attached.

In a different, but similiar case, I only have the datetimes and need to find out how often a new incident takes place, again grouped by Refer_ID or Cl_ID or even a specific Role. How does it work there?

Many thanks

REFER_ID	RMVL_ID	BEG_RMVL_DT	END_RMVL_DT
1800217
1800217
1800217
1800217
1800217
1800217
1800217
1800219
1800219
1800219
1800219
1800219
1800219
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	54497	15Nov1989 0:00:00	13Jan1990 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	54498	31Jul1991 0:00:00	31Jul1991 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800221	1559766	25Mar2007 0:00:00	13Sep2007 0:00:00
1800222
1800222
1800222
1800222
1800222
1800223
1800223
1800223
1800223
1800223
1800223
1800223
1800223
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800226
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00
1800227	1548424	03Feb2006 0:00:00	09Feb2006 0:00:00

Reeza · Posted 07-08-2017 01:09 PM

I think this post should answer your questions:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

For BY group processing, in general, a data step is much easier to code than SQL.

The SAS documentation has some good diagrams that illustrate BY group processing.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n01a08zkzy5igbn173zj...

@AnnaNZ wrote:

I think PROC SQL could be the way to go

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable

That would be marvellous, thanks,

View solution in original post

LinusH · Posted 07-08-2017 12:44 AM

Why do you need to add something to the detail data?
There are tons of ways in SAS to do counts: SQL, PROC MEANS/SUMMARY, PROC FREQ.. .

Data never sleeps

AnnaNZ · Posted 07-08-2017 01:01 AM

I know that,

but it has to be grouped by Refer_ID.

IN a diferent case and I wanted to add this to my question above, I only have datetimes and need to find out how often a new incident takes place.

So how can I get tehse two case working?

AnnaNZ · Posted 07-08-2017 04:00 AM

I think PROC SQL could be the way to go

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable

That would be marvellous, thanks,

Astounding · Posted 07-08-2017 09:17 AM

You're asking about fairly standard programming techniques. For example:

proc sort data=have;

by refer_id beg_rmvl_dt;

run;

data want;

set have;

by refer_id beg_rmvl_dt;

incident = first.beg_rmvl_dt;

run;

It's up to you how you want to treat missing values for BEG_RMVL_DT.

If all you want is the counts of INCIDENT, then SQL might come into play. But if you want to keep all the original observations, SQL becomes more complex.

Reeza · Posted 07-08-2017 01:09 PM

I think this post should answer your questions:

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

For BY group processing, in general, a data step is much easier to code than SQL.

The SAS documentation has some good diagrams that illustrate BY group processing.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n01a08zkzy5igbn173zj...

@AnnaNZ wrote:

I think PROC SQL could be the way to go

http://www2.sas.com/proceedings/sugi27/p191-27.pdf

CAn somebody help me though how to group by REFER_ID , create a variable called INCIDENT and create a condition, whereby every time that a new (BEG_RMVL_DT and END_RMVL_DT) pair occurs within the same REFER_ID , there is a 1 (opposed to a 0) in the INCIDENT variable

That would be marvellous, thanks,

AnnaNZ · Posted 07-08-2017 10:56 PM

THat´s it thank you

  Proc sort data=   Kea.REFER_DT; by  RMVL_ID ; run; 
  Data Kea.REFERRAL; 
Set  Kea.REFER_DT; 
 count + 1;
by RMVL_ID;
if first.RMVL_ID then count = 1;
If RMVL_ID = . then count = 0;
run;

create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Re: create new variable when group-statement met

Registration is open