Hey SAS folks,
I will have to work with Telco CDR data,that is Call Data Record ; basically, one phone number associated with one phone number and details on communication (such as operator).
The table is sorted according to time (as one can expect -- this is tracking).
From those data, I have to extract a list of distinct phone numbers and their latests status (say phone operator).
I can think of reading data, outputing per CDR two records (phone numbers) with the date and perform a PROC SORT nodedupkey on it.
I was wondering wether there is a way to perform the task reading only once the source data.
In particular, is there the place where hashkey could help?
Sample code and what needed:
DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;
* should already be sorted;
proc sort data=cdr; by date; run;
* ;
data phones_dup(keep=phone status date);
set cdr;
phone=phone1; status=phone1_status; output;
phone=phone2; status=phone2_status; output;
run;
proc sort data=phones_dup out=phones_dup;
by phone date;
run;
data phones;
set phones_dup; by phone; if last.phone;
run;
proc print; run;
Thank you for any insight,
Eric
Your code is: input -> sort -> split -> sort -> find last
The split operation is not dependent on the sort. Maybe this will work better:
input+split all in one step -> sort -> find last
DATA CDR1(keep=phone1 phone2 status1 status2 date)
CDR2(keep=phone status date)
;
INPUT phone1 phone2 date:date9. status1:$1. status2:$1.;
format date date9.;
output cdr1;
phone=phone1; status=status1; output cdr2;
phone=phone2; status=status2; output cdr2;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;
You don't have to create cdr1 if it is no longer required.
Then you can sort CDR2 and find last, or use PROC SQL.
Yes. there is a way for Hash Table. But why do you want to do this so.
Your code is very good.
DATA CDR; INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.; format date date9.; CARDS; 1 2 01JAN2007 A B 1 3 02JAN2007 A B 2 3 03JAN2007 A B 2 4 04JAN2007 C A 4 1 05JAN2007 C B ; RUN; data want(keep=phone status date); declare hash ha(hashexp: 16,ordered: 'D',multidata: 'Y'); declare hiter hi('ha'); ha.definekey('phone','date'); ha.definedata('phone','date','status'); ha.definedone(); do until(last); set cdr end=last; phone=phone1;status=phone1_status;ha.add(); phone=phone2;status=phone2_status;ha.add(); end; rc=hi.first(); pre_phone=.; do while(rc=0); if phone ne pre_phone then output; pre_phone=phone; rc=hi.next(); end; stop; run;
Ksharp
Hi ... here's another idea ...
DATA CDR;
INPUT phone1 phone2 date:date9. phone1_status:$1. phone2_status:$1.;
format date date9.;
CARDS;
1 2 01JAN2007 A B
1 3 02JAN2007 A B
2 3 03JAN2007 A B
2 4 04JAN2007 C A
4 1 05JAN2007 C B
RUN;
proc sql;
create table new as
select *
from (select phone1 as phone, phone1_status as status, date
from cdr
union
select phone2 as phone, phone2_status as status, date
from cdr)
group phone
having date eq max(date);
quit;
proc print data=new noobs;
run;
phone status date
1 B 05JAN2007
2 C 04JAN2007
3 B 03JAN2007
4 C 05JAN2007
Thanks for all your input;
I will try the different versions to test performance.
In particular I was thinking (hoping) hashkey could be more performant.
With regards to the high number of records, I fear a big SQL may not be the most performant but there again I have to test.
Eric
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.