BookmarkSubscribeRSS Feed
Demoxe
Calcite | Level 5
I have a problem with joining two tables by index. The code below works successfully, but in log I get a lot of errors. Not "ERROR:", "NOTE:" or "WARNING:" type, SAS simply put all variables, _N_, _ERROR_ and _IORC_ in log, with __ERROR_=1 and _IORC_=1230015, and then continues to process the left part of the code. By debugging I was able to determine, that these errors were produced, when "set tarifs key=id_tarif;" was processed and the value of key didn't change from previous step. I really can't understand, why these conditions lead to errors, but I'm also not so good with SAS yet. So I will be grateful for any tips, how to deal with this problem.

Code (three tables are created in macro, then they are merged in data step):

%macro initialize();
data tarifs (index=(id_tarif));
length id_tarif 8 price 8;
do i=1 to 10;
id_tarif=i;
price=floor(ranuni(10)*10+1);
output;
end;
drop i;
run;

data abonents;
length id_abonent 8 id_tarif 8 date_begin 8 date_end 8;
do i=1 to 10;
id_abonent=i;
id_tarif=floor(ranuni(10000)*10+1);
date_begin=17000;
date_end=17200;
output;
end;
do i=1 to 10;
id_abonent=i;
id_tarif=floor(ranuni(10000)*10+1);
date_begin=17201;
date_end=17400;
output;
end;
drop i;
run;

data calls;
length id_abonent 8 dateCall 8 timeCall 8;
do i=1 to 25;
id_abonent=floor(ranuni(10)*10+1);
datecall=17000+floor(ranuni(300)*300);
timeCall=floor(ranuni(300)*10+1);
output;
end;
drop i;
run;

proc sort data=calls;
by id_abonent datecall;
run;

proc sort data=abonents;
by id_abonent ;
run;

%mend;
%initialize; Message was edited by: Demoxe
10 REPLIES 10
Demoxe
Calcite | Level 5
data timecost (drop=date_begin date_end id_abonents rename=(id_calls=id_abonent));
retain Total_Cost 0;
set calls (rename=(id_abonent=id_calls)) end=last;
if (not((id_calls=id_abonents) and (date_begin do until ((id_calls=id_abonents) and (date_begin set abonents (rename=(id_abonent=id_abonents));
end;
set tarifs key=id_tarif;
Total_Cost=Total_Cost+timeCall*price;
run;

--------------------------------------------
And log:

287 data timecost (drop=date_begin date_end id_abonents
287! rename=(id_calls=id_abonent));
288 retain Total_Cost 0;
289 set calls (rename=(id_abonent=id_calls));
290 if (not((id_calls=id_abonents) and (date_begin<=dateCall 291 do until ((id_calls=id_abonents) and (date_begin<=dateCall 292 set abonents (rename=(id_abonent=id_abonents));
293 end;
294 set tarifs key=id_tarif;
295 Total_Cost=Total_Cost+timeCall*price;
296 run;

Total_Cost=65 id_calls=2 dateCall=17111 timeCall=5 id_abonents=2 date_begin=17000
date_end=17200 id_tarif=9 price=5 _ERROR_=1 _IORC_=1230015 _N_=2
Total_Cost=110 id_calls=2 dateCall=17175 timeCall=9 id_abonents=2 date_begin=17000
date_end=17200 id_tarif=9 price=5 _ERROR_=1 _IORC_=1230015 _N_=3
Total_Cost=180 id_calls=3 dateCall=17130 timeCall=3 id_abonents=3 date_begin=17000
date_end=17200 id_tarif=3 price=10 _ERROR_=1 _IORC_=1230015 _N_=5
Total_Cost=292 id_calls=3 dateCall=17268 timeCall=10 id_abonents=3 date_begin=17201
date_end=17400 id_tarif=10 price=7 _ERROR_=1 _IORC_=1230015 _N_=7
Total_Cost=327 id_calls=4 dateCall=17119 timeCall=5 id_abonents=4 date_begin=17000
date_end=17200 id_tarif=10 price=7 _ERROR_=1 _IORC_=1230015 _N_=8
Total_Cost=381 id_calls=5 dateCall=17095 timeCall=5 id_abonents=5 date_begin=17000
date_end=17200 id_tarif=1 price=9 _ERROR_=1 _IORC_=1230015 _N_=10
Total_Cost=507 id_calls=6 dateCall=17298 timeCall=8 id_abonents=6 date_begin=17201
date_end=17400 id_tarif=10 price=7 _ERROR_=1 _IORC_=1230015 _N_=13
Total_Cost=531 id_calls=7 dateCall=17057 timeCall=3 id_abonents=7 date_begin=17000
date_end=17200 id_tarif=8 price=4 _ERROR_=1 _IORC_=1230015 _N_=15
Total_Cost=547 id_calls=7 dateCall=17148 timeCall=4 id_abonents=7 date_begin=17000
date_end=17200 id_tarif=8 price=4 _ERROR_=1 _IORC_=1230015 _N_=16
Total_Cost=579 id_calls=7 dateCall=17199 timeCall=8 id_abonents=7 date_begin=17000
date_end=17200 id_tarif=8 price=4 _ERROR_=1 _IORC_=1230015 _N_=17
Total_Cost=670 id_calls=8 dateCall=17134 timeCall=4 id_abonents=8 date_begin=17000
date_end=17200 id_tarif=10 price=7 _ERROR_=1 _IORC_=1230015 _N_=19
Total_Cost=745 id_calls=10 dateCall=17103 timeCall=1 id_abonents=10
date_begin=17000 date_end=17200 id_tarif=7 price=5 _ERROR_=1 _IORC_=1230015 _N_=22
Total_Cost=898 id_calls=10 dateCall=17217 timeCall=8 id_abonents=10
date_begin=17201 date_end=17400 id_tarif=1 price=9 _ERROR_=1 _IORC_=1230015 _N_=24
Total_Cost=970 id_calls=10 dateCall=17247 timeCall=8 id_abonents=10
date_begin=17201 date_end=17400 id_tarif=1 price=9 _ERROR_=1 _IORC_=1230015 _N_=25
NOTE: There were 25 observations read from the data set WORK.CALLS.
NOTE: There were 20 observations read from the data set WORK.ABONENTS.
NOTE: The data set WORK.TIMECOST has 25 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds Message was edited by: Demoxe
chang_y_chung_hotmail_com
Obsidian | Level 7
@Demoxe: Sorry, but your data step is just very difficult to read and probably incorrect.

Based on the variable names, I guess what you are trying to do is to calculate total cost of phone calls for each subscriber (abonent). The price of a phone call is calculated by multiplying the callTime and price. The price seems to be determined by the value of id_tarif, which in turn decided by the abonent and the date range within which the call date falls into. If this is a correct guess, then it is quite easy to do.

In the below, the total cost for the abonent 10 is 230, because abonent 10's first phone call is made on the date 17103, which is between 17000 and 17199, during which the tarif was 7 thus the price was 5, while the three remaining calls are made during the dates when the tarif was 1 thus the price was 5. Total cost thus was: 1*5 + 9*9 + 8*9 + 8*9 = 230.

[pre]
/* test data */
data tarifs;
input id_tarif price;
cards;
1 9
2 8
3 10
4 6
5 3
6 7
7 5
8 4
9 5
10 7
;
run;

data abonents;
input id_abonent id_tarif date_begin date_end;
cards;
1 7 17000 17200
1 2 17201 17400
2 9 17000 17200
2 8 17201 17400
3 3 17000 17200
3 10 17201 17400
4 10 17000 17200
4 9 17201 17400
5 1 17000 17200
5 3 17201 17400
6 8 17000 17200
6 10 17201 17400
7 8 17000 17200
7 2 17201 17400
8 10 17000 17200
8 3 17201 17400
9 10 17000 17200
9 7 17201 17400
10 7 17000 17200
10 1 17201 17400
;
run;

data calls;
input id_abonent dateCall timeCall;
cards;
2 17099 8
2 17111 5
2 17175 9
3 17019 4
3 17130 3
3 17215 6
3 17268 10
4 17119 5
5 17038 1
5 17095 5
6 17064 7
6 17201 6
6 17298 8
7 17014 3
7 17057 3
7 17148 4
7 17199 8
8 17079 9
8 17134 4
8 17206 2
9 17210 10
10 17103 1
10 17217 9
10 17217 8
10 17247 8
;
run;

proc sql;
/* attach price to abonents matching tarifs */
create view prices as
select a.*, t.price
from abonents as a left join tarifs as t
on a.id_tarif=t.id_tarif;

/* calc the cost of each call looking up the tarif (and thus the price)
based on the call date and multiplying the price with the call time.
assumes that there are no call dates that match multiple date ranges */
create view costs as
select c.*, c.timeCall*p.price as cost
from calls as c left join prices as p
on c.id_abonent=p.id_abonent and
p.date_begin <= c.dateCall and c.dateCall < p.date_end;

/* calculate total cost for each abonent */
create table totals as
select id_abonent
, count(*) as num_calls
, min(dateCall) as date_from
, max(dateCall) as date_to
, sum(cost) as total_cost
from costs
group by id_abonent
order by id_abonent;

/* check */
select * from totals;
quit;
/* on lst
id_abonent num_calls date_from date_to total_cost
------------------------------------------------------
2 3 17099 17175 110
3 4 17019 17268 182
4 1 17119 17119 35
5 2 17038 17095 54
6 3 17064 17298 126
7 4 17014 17199 72
8 3 17079 17206 111
9 1 17210 17210 50
10 4 17103 17247 230
*/
[/pre]
Demoxe
Calcite | Level 5
Well, thanks, but this doesn't solve my problem. My task was to find total cost of all phone calls and to do this in one data step without using sql. I solved it, using posted code (sorry for missing tabs), and I can't understand why these errors occurred. I also couldn't find any detailed explanation on how SET works, when KEY= is used, so I don't see any reasons for this 1230015 error (the KEY= value is not being found in
the master data set). Maybe someone can tell me, where I can find this detailed explanation? Message was edited by: Demoxe
AllenEBingham
Calcite | Level 5
As near as I could tell you had two issues. The first was that the SET statement for reading in the "abonents" data wasn't necessarily going all the way through the data set ... every time you issued the SET statement within your DO WHILE loop it would start off with the next observation after the previous loop ... so it didn't always start with the first record ... and as such some records of the call data set weren't matching with abonent at all ...

... the second issue is that you needed a UNIQUE option on the SET KEY= statement.

Interestingly with your original code and my version of SAS (9.2 TS2M3 on a Windows XP platform) ... I only got out 11 records from your original code ... it only read in the first 13 records from "call" ... that's about the point it went to the end of the Tarif file (id_tarif=10).

In order to 'solve' the first problem I had to create code that would use the POINT= option for the SET ABONENTS statement ... so it would go through that data set one record at a time until it found a match. This is probably not the most efficient code if your real ABONENTS data set is large.

In order to do that I grabbed some code I found for creating a macro variiable with the number of observations in ABONENTS ...

Also I didn't actually check the results to see if I get what you wanted ... but I listed out the end result after the code below.

So here's the end result of the code (starting after you've created your test data sets):
___________________
%macro numobs(dsn);
%global num;
%let dsid=%sysfunc(open(&dsn));
%let num=%sysfunc(attrn(&dsid,nobs));
%let rc=%sysfunc(close(&dsid));
%mend numobs;

%numobs(abonents)

data timecost (drop=date_begin date_end id_abonents rename=(id_calls=id_abonent));
retain Total_Cost 0;

set calls (rename=(id_abonent=id_calls)) end=last;

abo_key=1;
do while(abo_key le &num);
set abonents (rename=(id_abonent=id_abonents)) point=abo_key;
if (id_calls=id_abonents) and (date_begin else abo_key=abo_key+1;
end;

set tarifs key=id_tarif/unique;

Total_Cost=Total_Cost+timeCall*price;
run;
________________
Here's the end result (TIMECOST):
Total_ date time
Obs Cost id_abonent Call Call id_tarif price

1 40 2 17099 8 9 5
2 65 2 17111 5 9 5
3 110 2 17175 9 9 5
4 150 3 17019 4 3 10
5 180 3 17130 3 3 10
6 222 3 17215 6 10 7
7 292 3 17268 10 10 7
8 327 4 17119 5 10 7
9 336 5 17038 1 1 9
10 381 5 17095 5 1 9
11 409 6 17064 7 8 4
12 463 6 17201 6 1 9
13 519 6 17298 8 10 7
14 531 7 17014 3 8 4
15 543 7 17057 3 8 4
16 559 7 17148 4 8 4
17 591 7 17199 8 8 4
18 654 8 17079 9 10 7
19 682 8 17134 4 10 7
20 702 8 17206 2 3 10
21 752 9 17210 10 7 5
22 757 10 17103 1 7 5

(First time posted the reply got truncated)


Message was edited by: AllenEBingham Message was edited by: AllenEBingham
Demoxe
Calcite | Level 5
Thanks a lot, UNIQUE actually helped. Quite strange, because I tried to use it before and only one observation was read from calls set. Now it works just fine.

As for the first issue, abonents set is generated in such way, that it is sorted by id_abonent and date_begin (or date_end). Calls set is sorted by ad_abonent and timeCall. So there is no need to look for next observation from the beginning of abonents set, we can just continue from the previous observation.

Thanks again.
AllenEBingham
Calcite | Level 5
... when I just add the "UNIQUE" option to your orginal code. I get rid of the error statements as well.

BUT --- I only get 11 records in the resulting file --- only 13 of the records of CALL are read ... all 20 of ABONENTS are read.

Here's what I think is happening with your code (at least on my machine):

IF the last read record of ABONENTS is the one the current record of CALL matches --- as in there are two calls records in a row that match with the same record in ABONENTS ... and it does for record number 12 and 13 in my generated dataset from your INITIALIZE macro (since you're doing some RANUNI calls to generate the data ... I think the generated data on your machine and/or others may not always have this happen)

THEN that record of ABONENTS is already gone and the DO WHILE loops through the rest of ABONENTS ... never finds a match and quits the data step.

My solution fixes that --- but now that I understand things a bit better, my code could be rewritten to be more efficient by RETAINING 'temporary' values from the ABONENTS data file --- so that when there are records in a row in the CALL data set that match the same record in the ABONENTS dataset the SET ABONENTS doesn't happen until a record of CALL is read that doesn't match the previous record.

Make sense?
AllenEBingham
Calcite | Level 5
Deleted text of this reply, as I mistakenly replied to my own post. I have re-posted the earlier text as a reply to the orginal poster. (Sorry ...) Message was edited by: AllenEBingham
AllenEBingham
Calcite | Level 5
(Reposted text from earlier reply to my own message --- so this would go to the original poster ... )

I re-read your latest reply ... and note that you probably did run into this
problem the first time you used UNIQUE ... since you only got one record read
from CALLS.


Anyway I re-worked your original code, in order to RETAIN the values on ABONENTS
that are used for the matching criteria ... and so that after the first match in
CALLS is done the remaining records in CALLS are first compared to the last
matching ABONENTS record ... and then goes through ABONENTS if no match is found
... etc.


Here's the code:

 



data timecost (drop=datebegin dateend id_abonents
rename=(id_calls=id_abonent));

   retain Total_Cost 0 datebegin dateend id_abonents;


   set calls (rename=(id_abonent=id_calls))
end=last;


   if _n_ eq 1 then set abonents (rename=(id_abonent=id_abonents


                                         
date_begin=datebegin

                                         
date_end=dateend));

   else;



   if (not((id_calls=id_abonents) and (datebegin<=dateCall<dateend)))
then

   do until ((id_calls=id_abonents) and (datebegin<=dateCall<dateend));

      set abonents (rename=(id_abonent=id_abonents

                           
date_begin=datebegin

                           
date_end=dateend));

   end;

   else;


   set tarifs key=id_tarif/unique;

   Total_Cost=Total_Cost+timeCall*price;

run;

 


Demoxe
Calcite | Level 5
Thanks for your help.
About reading only some observations from calls set - the problem was in condition (date_begin le dateCall lt date_end). I forgot, that intervals of tarifs don't have common values of date. So when dateCall=date_end, DO-loop repeats until the end of abonents set and then no more observations can be added to timecost. With (date_begin le dateCall le date_end) everything works fine. Message was edited by: Demoxe
Doc_Duke
Rhodochrosite | Level 12
This behavior is normal. See
http://support.sas.com/kb/9/371.html

Google search
_IORC_=1230015 site:sas.com

You probably need to do some error checking in your code. See
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001104667.htm

Doc Muhlbaier
Duke

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2269 views
  • 0 likes
  • 4 in conversation