SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Group data and time difference between groups of interest

Accepted Solution Solved
Reply
Highlighted
Contributor
Posts: 40
Accepted Solution

Group data and time difference between groups of interest

Hi

I have a case I am needed help with and I show the dataas a picture first to make my question better understandable. Data is below.

 

I have a dataset with repeated measurements

For each grouped Client_ID : CL_ID, I need the time-difference(in form of a new time variable) from the second to the firsr, then the third to the second then the fourth to third (and so on) referal Time: Refer_DT.

 

I have showcased this in excel as in the picture below: timedifference between referral 1 and 2 in CLient_ID 7 would be d5 - d4: 0 days; timedifference between referral 2 and 3 in CLient_ID 7 would be d6 - d7: 462 days; timedifference between referral 3 and 4 in CLient_ID 7 would be d7 - d6: 0 days. that is all for them , no time difference for Client_ID 12 and 19 as they occur only once, but then the same thing for Client_ID 21

 

Also, as can be seen in this dataset, names don't always agree with CLient_ID and I would need to knw how ofen and where this occurs: like CL_ID 21, which has the names Alessandra and Yoda. In this case Allessandra cannot be right.

 

The way I thought about this was using First.byvariable and Last.byvariable, would it would not resolve any of the calculation inbetween first and last.

Where do I start with sometihng like this?

Many thanks for any help and time .

 

test.JPG

 

 

REFER_IDCL_IDPUPIL_NAMEREFER_DT
15324034Sandra16/10/2001
16475947Allessandra4/10/2003
16475947Victor4/10/2003
17285127Allessandra8/01/2005
17285127Sandra8/01/2005
161992212Sandra21/04/2003
164822419Victor8/10/2003
178622121Allessandra15/11/2005
178622121Yoda15/11/2005
184635521Yoda2/09/2006
189213021Yoda11/05/2007
162733027Sandra30/05/2003
162733028Sandra30/05/2003
177665929Victor23/09/2005
194304729Allessandra17/01/2008
194304729Yoda17/01/2008
175385330Allessandra16/05/2005
184772030Allessandra22/09/2006

Accepted Solutions
Solution
yesterday
Esteemed Advisor
Posts: 7,052

Re: Group data and time difference between groups of interest

ifn and ifc are if-then-else functions. the if, then and else are separated by commas. Thus, the the ,. at the end of the call to the function is setting the value to missing if it is the first in a series of CL_IDs.

 

Something like the following would be a good start for your question regarding names:

proc sql noprint;
  create table test as
    select pupil_name,refer_id
      from have
        group by refer_id
          having count(*) gt 1 and
                 count(distinct pupil_name) gt 1
  ;
quit;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Esteemed Advisor
Posts: 7,052

Re: Group data and time difference between groups of interest

[ Edited ]

Regarding the calculations, the following should work:

data want;
  set have;
  by CL_ID;
  days_between_referal=ifn(first.cl_id eq 0,refer_dt-lag(REFER_DT),.);
run;

However, about the name discrepancies, you'd have to specify what you want to achieve.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 40

Re: Group data and time difference between groups of interest

Thank you  ! That works.

I'll post this as SOLUTION, but wanted to replay first

 

What does the lag statemenet do exactly?

refer_dt-lag(REFER_DT),.) How would this part translate in words?

Lag =  the previous record in ()REFER_DT ?

, .  - what dooes this stand for - eventually I want to be able to program somethying like this myself, therefore I would like to learn more about this formula :-)

 

 

Now with the names. I would like to see a variable that tells me where for each CL_ID there is a mismatch in names

Then I could filter for this mismatch and find out more about why there are different values in there.

Many thanks, Anna

 

 

Esteemed Advisor
Posts: 7,052

Re: Group data and time difference between groups of interest

[ Edited ]

Yes, lag(refer_dt) gets the previous value of refer_dt. A full description of the function can be found at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm

 

I used it with the ifn function, starting with the condition first.cl_id eq 0 so that two things would occur

1. it would always execute the lag statement

but

2. only use it if it wasn't the first cl_id in a set of cl_ids.

 

As for the names, are you thinking about resetting cl_id to only contain the same names?

 

If so, would you consider Sandra and Allessandra to be the same or different persons?

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 40

Re: Group data and time difference between groups of interest

Many thanks for the replay! Awesome. What does the comma  dot   ,.  at the end of your formaula archieve?

 

With the names, No CL_ID and names would have to remain intact, as it would be difficult to go through 7 mio records to assess which ones would be similiar. I would only need to know, which CL_ID have different names to them - and to be more specifoc

Which CL_ID with the same Refer_ID and the same Refer_DT have different names to them, and if so, where and which ones are they.

 

So looking at the example table I have provided: CL_ID 7, has two different Refer_ID:

 

Refer_ID 647594 with the names Allessandra and Victor on the same date: 4/10/2003

Refer_ID 1728512 with the names Allessandra and Sandra on the same date : 8/1/2005

 

So I would want to know that there is a discrepancy in names in Refer_ID 647594 on the same date: 4/10/2003 - and if possible I would want to know what that discrepancy is, unless i then can filter for that

 

I would tihink a lot of grouping then data and then requesting an output based on same groups. But I am not sure how to start.

 

many thanks for your help

 

 

 

Solution
yesterday
Esteemed Advisor
Posts: 7,052

Re: Group data and time difference between groups of interest

ifn and ifc are if-then-else functions. the if, then and else are separated by commas. Thus, the the ,. at the end of the call to the function is setting the value to missing if it is the first in a series of CL_IDs.

 

Something like the following would be a good start for your question regarding names:

proc sql noprint;
  create table test as
    select pupil_name,refer_id
      from have
        group by refer_id
          having count(*) gt 1 and
                 count(distinct pupil_name) gt 1
  ;
quit;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 40

Re: Group data and time difference between groups of interest

Hi Art, Thank you so much. I tweaked the code to me needs and it works! Fantastic!!

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 108 views
  • 2 likes
  • 2 in conversation