Help using Base SAS procedures

In search of an efficient way to deal with ID<>ID records

Reply
Occasional Contributor
Posts: 9

In search of an efficient way to deal with ID<>ID records

Hey SAS folks,

I will have to work with Telco CDR data,that is Call Data Record ; basically, one phone number associated with one phone number and details on communication (such as operator).

The table is sorted according to time (as one can expect -- this is tracking).


From those data, I have to extract a list of distinct phone numbers and their latests status (say phone operator).

I can think of reading data, outputing per CDR two records (phone numbers) with the date and perform a PROC SORT nodedupkey on it.

I was wondering wether there is a way to perform the task reading only once the source data.

In particular, is there the place where hashkey could help?

Sample code and what needed:

DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;


* should already be sorted;
proc sort data=cdr; by date; run;
* ;

data phones_dup(keep=phone status date);
set cdr;
phone=phone1; status=phone1_status; output;
phone=phone2; status=phone2_status; output;
run;

proc sort data=phones_dup out=phones_dup;
by phone date;
run;
data phones;
set phones_dup; by phone; if last.phone;
run;
proc print; run;

Thank you for any insight,

Eric

Frequent Contributor
Posts: 104

In search of an efficient way to deal with ID<>ID records

Your code is:  input -> sort -> split -> sort -> find last

The split operation is not dependent on the sort.  Maybe this will work better:

input+split all in one step -> sort -> find last

DATA CDR1(keep=phone1 phone2 status1 status2 date)

          CDR2(keep=phone status date)

     ;


     INPUT phone1 phone2 date:date9. status1:$1. status2:$1.;
     format date date9.;

     

     output cdr1;

     phone=phone1; status=status1; output cdr2;

     phone=phone2; status=status2; output cdr2;

CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;


You don't have to create cdr1 if it is no longer required.

Then you can sort CDR2 and find last, or use PROC SQL.

Super User
Posts: 9,687

In search of an efficient way to deal with ID<>ID records

Yes. there is a way for Hash Table. But why do you want to do this so.

Your code is very good.

DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
;
RUN;
data want(keep=phone status date);
 declare hash ha(hashexp: 16,ordered: 'D',multidata: 'Y');
 declare hiter hi('ha');
  ha.definekey('phone','date');
  ha.definedata('phone','date','status');
  ha.definedone();

 do until(last);
  set cdr end=last;
  phone=phone1;status=phone1_status;ha.add();
  phone=phone2;status=phone2_status;ha.add();
 end;

 rc=hi.first(); pre_phone=.;
 do while(rc=0);
  if phone ne pre_phone then output;
  pre_phone=phone;
  rc=hi.next();
 end;
stop;
run;

Ksharp

Valued Guide
Posts: 765

Re: In search of an efficient way to deal with ID<>ID records

Hi ... here's another idea ...

DATA CDR;

INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;

format date date9.;

CARDS;

1 2 01JAN2007 A B

1 3 02JAN2007 A B

2 3 03JAN2007 A B

2 4 04JAN2007 C A

4 1 05JAN2007 C B

RUN;

proc sql;

create table new as

select *

from (select phone1 as phone, phone1_status as status, date

      from cdr

      union

      select phone2 as phone, phone2_status as status, date

      from cdr)

group phone

having date eq max(date);

quit;

proc print data=new noobs;

run;

phone    status         date

  1        B       05JAN2007

  2        C       04JAN2007

  3        B       03JAN2007

  4        C       05JAN2007

Occasional Contributor
Posts: 9

In search of an efficient way to deal with ID<>ID records

Thanks for all your input;

I will try the different versions to test performance.

In particular I was thinking (hoping)  hashkey could be more performant.

With regards to the high number of records, I fear a big SQL may not be the most performant but there again I have to test.

Eric

Ask a Question
Discussion stats
  • 4 replies
  • 124 views
  • 0 likes
  • 4 in conversation