BookmarkSubscribeRSS Feed

Hey SAS folks,

I will have to work with Telco CDR data,that is Call Data Record ; basically, one phone number associated with one phone number and details on communication (such as operator).

The table is sorted according to time (as one can expect -- this is tracking).


From those data, I have to extract a list of distinct phone numbers and their latests status (say phone operator).

I can think of reading data, outputing per CDR two records (phone numbers) with the date and perform a PROC SORT nodedupkey on it.

I was wondering wether there is a way to perform the task reading only once the source data.

In particular, is there the place where hashkey could help?

Sample code and what needed:

DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;


* should already be sorted;
proc sort data=cdr; by date; run;
* ;

data phones_dup(keep=phone status date);
set cdr;
phone=phone1; status=phone1_status; output;
phone=phone2; status=phone2_status; output;
run;

proc sort data=phones_dup out=phones_dup;
by phone date;
run;
data phones;
set phones_dup; by phone; if last.phone;
run;
proc print; run;

Thank you for any insight,

Eric

4 REPLIES 4
DLing
Obsidian | Level 7

Your code is:  input -> sort -> split -> sort -> find last

The split operation is not dependent on the sort.  Maybe this will work better:

input+split all in one step -> sort -> find last

DATA CDR1(keep=phone1 phone2 status1 status2 date)

          CDR2(keep=phone status date)

     ;


     INPUT phone1 phone2 date:date9. status1:$1. status2:$1.;
     format date date9.;

     

     output cdr1;

     phone=phone1; status=status1; output cdr2;

     phone=phone2; status=status2; output cdr2;

CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;


You don't have to create cdr1 if it is no longer required.

Then you can sort CDR2 and find last, or use PROC SQL.

Ksharp
Super User

Yes. there is a way for Hash Table. But why do you want to do this so.

Your code is very good.

DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
;
RUN;
data want(keep=phone status date);
 declare hash ha(hashexp: 16,ordered: 'D',multidata: 'Y');
 declare hiter hi('ha');
  ha.definekey('phone','date');
  ha.definedata('phone','date','status');
  ha.definedone();

 do until(last);
  set cdr end=last;
  phone=phone1;status=phone1_status;ha.add();
  phone=phone2;status=phone2_status;ha.add();
 end;

 rc=hi.first(); pre_phone=.;
 do while(rc=0);
  if phone ne pre_phone then output;
  pre_phone=phone;
  rc=hi.next();
 end;
stop;
run;

Ksharp

MikeZdeb
Rhodochrosite | Level 12

Hi ... here's another idea ...

DATA CDR;

INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;

format date date9.;

CARDS;

1 2 01JAN2007 A B

1 3 02JAN2007 A B

2 3 03JAN2007 A B

2 4 04JAN2007 C A

4 1 05JAN2007 C B

RUN;

proc sql;

create table new as

select *

from (select phone1 as phone, phone1_status as status, date

      from cdr

      union

      select phone2 as phone, phone2_status as status, date

      from cdr)

group phone

having date eq max(date);

quit;

proc print data=new noobs;

run;

phone    status         date

  1        B       05JAN2007

  2        C       04JAN2007

  3        B       03JAN2007

  4        C       05JAN2007

Thanks for all your input;

I will try the different versions to test performance.

In particular I was thinking (hoping)  hashkey could be more performant.

With regards to the high number of records, I fear a big SQL may not be the most performant but there again I have to test.

Eric

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 724 views
  • 0 likes
  • 4 in conversation