BookmarkSubscribeRSS Feed
ShufeGuoding
Obsidian | Level 7

I have a data set p2p as following

 

sellid buyid id date
1000023 1144058 292173 2016/11/7
1000023 477211 292178 2015/10/27
1000023 1569350 292178 2015/10/27
1000023 318100 289648 2015/4/7
1000023 350832 289652 2015/4/7
1000023 560400 289648 2015/4/7
1000023 259210 292173 2016/11/7
1000023 1466757 289652 2015/4/7
1000023 10626 292213 2016/11/7
1000023 259210 292221 2016/11/7
1000023 328802 292221 2016/11/7
1000023 827855 289648 2015/4/7
1000023 10626 292173 2016/11/7
1000023 823271 292173 2016/11/7
1000023 1597616 292202 2016/11/7
1000023 5068647 292202 2016/11/7
1000023 823271 292213 2016/11/7
1000023 10626 292173 2016/11/7
1000023 2282389 292221 2016/11/7
1000023 1451751 289648 2015/4/7
1000023 823271 292202 2016/11/7
1000023 1661710 289652 2015/4/7
1000023 1086373 292213 2016/11/7
1000023 1854328 292173 2016/11/7
1000023 963867 290497 2015/4/7
1000023 406023 289648 2015/4/7
1000023 211396 290802 2015/4/7
1000023 719501 289652 2015/4/7
1000023 1509387 290496 2015/4/7
1000023 1086373 292221 2016/11/7
1000023 963867 290497 2015/4/7
1000023 1597616 292213 2016/11/7
1000023 1536356 289652 2015/4/7
1000023 2282389 292213 2016/11/7
1000023 1536356 290497 2015/4/7
1000023 1867463 292202 2016/11/7
1000023 2356403 292178 2015/10/27
1000023 196966 290497 2015/4/7
1000023 1144058 292202 2016/11/7
1000023 259210 292202 2016/11/7
1000023 1536356 289648 2015/4/7
1000041 744402 107950 2015/4/7
1000041 1166120 107950 2015/4/7
1000041 1514527 107950 2015/4/7
1000041 793936 357766 2015/4/7
1000041 1617063 357766 2015/4/7
1000041 793936 107950 2015/4/7
1000043 2414987 661415 2016/1/26
1000043 1878751 563862 2016/1/26
1000043 2527958 415597 2016/1/12
1000043 1288675 347600 2016/1/26
1000043 1668672 537277 2016/1/26
1000043 5973865 2084280 2017/1/9
1000043 1012668 299174 2015/10/13
1000043 1301405 415765 2016/1/26
1000043 2252908 669278 2016/1/26
1000043 1910747 133016 2016/10/15
1000043 760352 620837 2015/12/27
1000043 1789216 578613 2016/1/26

that is the trade record peer to  peer.

I want to find whether the seller buy again in the next 30 days,and ran the following code

 

data want;
  set p2p curobs=obs1;
  obs2=obs1+1;
    do while (date2-date<30);
    set p2p(
        rename=(
        date=date2 
        sellid=sellid2   
        buyid=buyid2
        id=id2
        )
      ) point=obs2
    ;
    If buyid2=sellid  Then
      Flag = 1; 
    Else Flag = 0; 
    output;
    obs2+1; 
 end;
   
run;

 

The data step runing in a dead loop!! 

Please give me your help hand, thanks.

11 REPLIES 11
Ksharp
Super User

Better post the output you want.

Here could give you a start.

 

data have;
infile cards expandtabs;
input sellid	buyid	id	date : yymmdd10.;
format date yymmdd10.;
cards;
1000023	1144058	292173	2016/11/7
1000023	477211	292178	2015/10/27
1000023	1569350	292178	2015/10/27
1000023	318100	289648	2015/4/7
1000023	350832	289652	2015/4/7
1000023	560400	289648	2015/4/7
1000023	259210	292173	2016/11/7
1000023	1466757	289652	2015/4/7
1000023	10626	292213	2016/11/7
1000023	259210	292221	2016/11/7
1000023	328802	292221	2016/11/7
1000023	827855	289648	2015/4/7
1000023	10626	292173	2016/11/7
1000023	823271	292173	2016/11/7
1000023	1597616	292202	2016/11/7
1000023	5068647	292202	2016/11/7
1000023	823271	292213	2016/11/7
1000023	10626	292173	2016/11/7
1000023	2282389	292221	2016/11/7
1000023	1451751	289648	2015/4/7
1000023	823271	292202	2016/11/7
1000023	1661710	289652	2015/4/7
1000023	1086373	292213	2016/11/7
1000023	1854328	292173	2016/11/7
1000023	963867	290497	2015/4/7
1000023	406023	289648	2015/4/7
1000023	211396	290802	2015/4/7
1000023	719501	289652	2015/4/7
1000023	1509387	290496	2015/4/7
1000023	1086373	292221	2016/11/7
1000023	963867	290497	2015/4/7
1000023	1597616	292213	2016/11/7
1000023	1536356	289652	2015/4/7
1000023	2282389	292213	2016/11/7
1000023	1536356	290497	2015/4/7
1000023	1867463	292202	2016/11/7
1000023	2356403	292178	2015/10/27
1000023	196966	290497	2015/4/7
1000023	1144058	292202	2016/11/7
1000023	259210	292202	2016/11/7
1000023	1536356	289648	2015/4/7
1000041	744402	107950	2015/4/7
1000041	1166120	107950	2015/4/7
1000041	1514527	107950	2015/4/7
1000041	793936	357766	2015/4/7
1000041	1617063	357766	2015/4/7
1000041	793936	107950	2015/4/7
1000043	2414987	661415	2016/1/26
1000043	1878751	563862	2016/1/26
1000043	2527958	415597	2016/1/12
1000043	1288675	347600	2016/1/26
1000043	1668672	537277	2016/1/26
1000043	5973865	2084280	2017/1/9
1000043	1012668	299174	2015/10/13
1000043	1301405	415765	2016/1/26
1000043	2252908	669278	2016/1/26
1000043	1910747	133016	2016/10/15
1000043	760352	620837	2015/12/27
1000043	1789216	578613	2016/1/26
;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h(dataset:'have',hashexp:20);
  h.definekey('buyid','date');
  h.definedone();
 end;
set have;
do i=date to date+30;
if h.check(key:sellid,key:i)=0 then buy_in_next_30=1;
end;
drop i;
run;
ShufeGuoding
Obsidian | Level 7

Thanks for your  solution. I don't understand the necessity of  the following statement

if 0 then set have;

and what the 0 stands for in that statements.

Ksharp
Super User

Here initialize the PDV and make PDV have variables in dataset HAVE, 

so there would not be an error(variable is not initialize) information when build a Hash Table in the next code .

mkeintz
PROC Star

The "if 0" is an if test that will always fail, so the statement

   if 0 then set have;

means that this statement will NEVER actually read data from have.

 

But even though data is not actually transcribed from have via this statement, the sas compiler still sees the "set have", and consequently expands the program data vector (PDV) to include all variables in the have dataset.  The reason this is needed is because the subsequent "declare hash h (dataset:'have',hashexp:20)" is NOT similarly used by the sas compiler to expand the pdv.

 

Consider this program, which looks up the data for "Alfred" in the hash object h.

 

data _null_;
  *if 0 then set sashelp.class;
  declare hash h (dataset:'sashelp.class');
    h.definekey('name');
    h.definedata(all:'Y');
    h.definedone();
  name='Alfred';
  rc=h.find();
  put (_all_) (=);
run;

 

 

It generates an error message in the log:

 

ERROR: Undeclared data symbol Sex for hash object at line 290 column 5.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.

which simply means that even though there is data for a variable named SEX in the hash object, there is no variable named SEX in the pdv,  So the FIND method, which is intended to transfer data from the hash object to the pdv, fails. 

 

 

Now try the same program, but de-comment the "if 0 then ..." statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ShufeGuoding
Obsidian | Level 7

When readding an observation from data set  using set statement in each data step iteration, Can we only do the lookup in a subset containing the observations within a date interval of 30 days from current date. In other words, is there a way to load a subset rather than the whole dataset into hash object and remove it after lookup finish, and load another subset into hash dynamically?  

Ksharp
Super User

You can do that, But that need a GROUP variable and lookup an obs within a group .

Ksharp
Super User

The following cod could give a new start. But I am not sure if it would get what you need.

 

 

data have;
infile cards expandtabs;
input sellid	buyid	id	date : yymmdd10.;
format date yymmdd10.;
cards;
1000023	1144058	292173	2016/11/7
1000023	477211	292178	2015/10/27
1000023	1000023	292178	2015/10/28
1000023	318100	289648	2015/4/7
1000023	350832	289652	2015/4/7
1000023	560400	289648	2015/4/7
1000023	259210	292173	2016/11/7
1000023	1466757	289652	2015/4/7
1000023	10626	292213	2016/11/7
1000023	259210	292221	2016/11/7
1000023	328802	292221	2016/11/7
1000023	827855	289648	2015/4/7
1000023	10626	292173	2016/11/7
1000023	823271	292173	2016/11/7
1000023	1597616	292202	2016/11/7
1000023	5068647	292202	2016/11/7
1000023	823271	292213	2016/11/7
1000023	10626	292173	2016/11/7
1000023	2282389	292221	2016/11/7
1000023	1451751	289648	2015/4/7
1000023	823271	292202	2016/11/7
1000023	1661710	289652	2015/4/7
1000023	1086373	292213	2016/11/7
1000023	1854328	292173	2016/11/7
1000023	963867	290497	2015/4/7
1000023	406023	289648	2015/4/7
1000023	211396	290802	2015/4/7
1000023	719501	289652	2015/4/7
1000023	1509387	290496	2015/4/7
1000023	1086373	292221	2016/11/7
1000023	963867	290497	2015/4/7
1000023	1597616	292213	2016/11/7
1000023	1536356	289652	2015/4/7
1000023	2282389	292213	2016/11/7
1000023	1536356	290497	2015/4/7
1000023	1867463	292202	2016/11/7
1000023	2356403	292178	2015/10/27
1000023	196966	290497	2015/4/7
1000023	1144058	292202	2016/11/7
1000023	259210	292202	2016/11/7
1000023	1536356	289648	2015/4/7
1000041	744402	107950	2015/4/7
1000041	1166120	107950	2015/4/7
1000041	1514527	107950	2015/4/7
1000041	793936	357766	2015/4/7
1000041	1617063	357766	2015/4/7
1000041	793936	107950	2015/4/7
1000043	2414987	661415	2016/1/26
1000043	1878751	563862	2016/1/26
1000043	2527958	415597	2016/1/12
1000043	1288675	347600	2016/1/26
1000043	1668672	537277	2016/1/26
1000043	5973865	2084280	2017/1/9
1000043	1012668	299174	2015/10/13
1000043	1301405	415765	2016/1/26
1000043	2252908	669278	2016/1/26
1000043	1910747	133016	2016/10/15
1000043	760352	620837	2015/12/27
1000043	1789216	578613	2016/1/26
;

data temp;
 set have(keep=sellid date rename=(sellid=buyid date=_date)) ;
 do i=_date to _date+30;
  date=i; if i=_date then buy_in_next_30=1;else buy_in_next_30=0;  output;
 end;
 format date yymmdd10.;
 drop _date i ;
run;

proc sql;
create table found as
 select buyid,date
  from have
 intersect
 select buyid,date
  from temp;

create table want as
 select *
  from temp
   where buyid in (select buyid from found) and buy_in_next_30=1;
quit;
mkeintz
PROC Star

 

What is the ID variable?  Is it a product or stock id?  When you say you are looking for a seller id as a buyer id in some later transaction within 30 days, does it also require ID of the sell transaction equal to ID of the buy transaction?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ShufeGuoding
Obsidian | Level 7
only use sellid and buyid to identify the
person who buys in next 30 days after sell.
ShufeGuoding
Obsidian | Level 7
my data set has 20 millions observation and the lookup costs time very much.Is there any other solution?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1599 views
  • 1 like
  • 4 in conversation