I have a data set p2p as following
sellid | buyid | id | date |
1000023 | 1144058 | 292173 | 2016/11/7 |
1000023 | 477211 | 292178 | 2015/10/27 |
1000023 | 1569350 | 292178 | 2015/10/27 |
1000023 | 318100 | 289648 | 2015/4/7 |
1000023 | 350832 | 289652 | 2015/4/7 |
1000023 | 560400 | 289648 | 2015/4/7 |
1000023 | 259210 | 292173 | 2016/11/7 |
1000023 | 1466757 | 289652 | 2015/4/7 |
1000023 | 10626 | 292213 | 2016/11/7 |
1000023 | 259210 | 292221 | 2016/11/7 |
1000023 | 328802 | 292221 | 2016/11/7 |
1000023 | 827855 | 289648 | 2015/4/7 |
1000023 | 10626 | 292173 | 2016/11/7 |
1000023 | 823271 | 292173 | 2016/11/7 |
1000023 | 1597616 | 292202 | 2016/11/7 |
1000023 | 5068647 | 292202 | 2016/11/7 |
1000023 | 823271 | 292213 | 2016/11/7 |
1000023 | 10626 | 292173 | 2016/11/7 |
1000023 | 2282389 | 292221 | 2016/11/7 |
1000023 | 1451751 | 289648 | 2015/4/7 |
1000023 | 823271 | 292202 | 2016/11/7 |
1000023 | 1661710 | 289652 | 2015/4/7 |
1000023 | 1086373 | 292213 | 2016/11/7 |
1000023 | 1854328 | 292173 | 2016/11/7 |
1000023 | 963867 | 290497 | 2015/4/7 |
1000023 | 406023 | 289648 | 2015/4/7 |
1000023 | 211396 | 290802 | 2015/4/7 |
1000023 | 719501 | 289652 | 2015/4/7 |
1000023 | 1509387 | 290496 | 2015/4/7 |
1000023 | 1086373 | 292221 | 2016/11/7 |
1000023 | 963867 | 290497 | 2015/4/7 |
1000023 | 1597616 | 292213 | 2016/11/7 |
1000023 | 1536356 | 289652 | 2015/4/7 |
1000023 | 2282389 | 292213 | 2016/11/7 |
1000023 | 1536356 | 290497 | 2015/4/7 |
1000023 | 1867463 | 292202 | 2016/11/7 |
1000023 | 2356403 | 292178 | 2015/10/27 |
1000023 | 196966 | 290497 | 2015/4/7 |
1000023 | 1144058 | 292202 | 2016/11/7 |
1000023 | 259210 | 292202 | 2016/11/7 |
1000023 | 1536356 | 289648 | 2015/4/7 |
1000041 | 744402 | 107950 | 2015/4/7 |
1000041 | 1166120 | 107950 | 2015/4/7 |
1000041 | 1514527 | 107950 | 2015/4/7 |
1000041 | 793936 | 357766 | 2015/4/7 |
1000041 | 1617063 | 357766 | 2015/4/7 |
1000041 | 793936 | 107950 | 2015/4/7 |
1000043 | 2414987 | 661415 | 2016/1/26 |
1000043 | 1878751 | 563862 | 2016/1/26 |
1000043 | 2527958 | 415597 | 2016/1/12 |
1000043 | 1288675 | 347600 | 2016/1/26 |
1000043 | 1668672 | 537277 | 2016/1/26 |
1000043 | 5973865 | 2084280 | 2017/1/9 |
1000043 | 1012668 | 299174 | 2015/10/13 |
1000043 | 1301405 | 415765 | 2016/1/26 |
1000043 | 2252908 | 669278 | 2016/1/26 |
1000043 | 1910747 | 133016 | 2016/10/15 |
1000043 | 760352 | 620837 | 2015/12/27 |
1000043 | 1789216 | 578613 | 2016/1/26 |
that is the trade record peer to peer.
I want to find whether the seller buy again in the next 30 days,and ran the following code
data want;
set p2p curobs=obs1;
obs2=obs1+1;
do while (date2-date<30);
set p2p(
rename=(
date=date2
sellid=sellid2
buyid=buyid2
id=id2
)
) point=obs2
;
If buyid2=sellid Then
Flag = 1;
Else Flag = 0;
output;
obs2+1;
end;
run;
The data step runing in a dead loop!!
Please give me your help hand, thanks.
Better post the output you want.
Here could give you a start.
data have;
infile cards expandtabs;
input sellid buyid id date : yymmdd10.;
format date yymmdd10.;
cards;
1000023 1144058 292173 2016/11/7
1000023 477211 292178 2015/10/27
1000023 1569350 292178 2015/10/27
1000023 318100 289648 2015/4/7
1000023 350832 289652 2015/4/7
1000023 560400 289648 2015/4/7
1000023 259210 292173 2016/11/7
1000023 1466757 289652 2015/4/7
1000023 10626 292213 2016/11/7
1000023 259210 292221 2016/11/7
1000023 328802 292221 2016/11/7
1000023 827855 289648 2015/4/7
1000023 10626 292173 2016/11/7
1000023 823271 292173 2016/11/7
1000023 1597616 292202 2016/11/7
1000023 5068647 292202 2016/11/7
1000023 823271 292213 2016/11/7
1000023 10626 292173 2016/11/7
1000023 2282389 292221 2016/11/7
1000023 1451751 289648 2015/4/7
1000023 823271 292202 2016/11/7
1000023 1661710 289652 2015/4/7
1000023 1086373 292213 2016/11/7
1000023 1854328 292173 2016/11/7
1000023 963867 290497 2015/4/7
1000023 406023 289648 2015/4/7
1000023 211396 290802 2015/4/7
1000023 719501 289652 2015/4/7
1000023 1509387 290496 2015/4/7
1000023 1086373 292221 2016/11/7
1000023 963867 290497 2015/4/7
1000023 1597616 292213 2016/11/7
1000023 1536356 289652 2015/4/7
1000023 2282389 292213 2016/11/7
1000023 1536356 290497 2015/4/7
1000023 1867463 292202 2016/11/7
1000023 2356403 292178 2015/10/27
1000023 196966 290497 2015/4/7
1000023 1144058 292202 2016/11/7
1000023 259210 292202 2016/11/7
1000023 1536356 289648 2015/4/7
1000041 744402 107950 2015/4/7
1000041 1166120 107950 2015/4/7
1000041 1514527 107950 2015/4/7
1000041 793936 357766 2015/4/7
1000041 1617063 357766 2015/4/7
1000041 793936 107950 2015/4/7
1000043 2414987 661415 2016/1/26
1000043 1878751 563862 2016/1/26
1000043 2527958 415597 2016/1/12
1000043 1288675 347600 2016/1/26
1000043 1668672 537277 2016/1/26
1000043 5973865 2084280 2017/1/9
1000043 1012668 299174 2015/10/13
1000043 1301405 415765 2016/1/26
1000043 2252908 669278 2016/1/26
1000043 1910747 133016 2016/10/15
1000043 760352 620837 2015/12/27
1000043 1789216 578613 2016/1/26
;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have',hashexp:20);
h.definekey('buyid','date');
h.definedone();
end;
set have;
do i=date to date+30;
if h.check(key:sellid,key:i)=0 then buy_in_next_30=1;
end;
drop i;
run;
Thanks for your solution. I don't understand the necessity of the following statement
if 0 then set have;
and what the 0 stands for in that statements.
Here initialize the PDV and make PDV have variables in dataset HAVE,
so there would not be an error(variable is not initialize) information when build a Hash Table in the next code .
The "if 0" is an if test that will always fail, so the statement
if 0 then set have;
means that this statement will NEVER actually read data from have.
But even though data is not actually transcribed from have via this statement, the sas compiler still sees the "set have", and consequently expands the program data vector (PDV) to include all variables in the have dataset. The reason this is needed is because the subsequent "declare hash h (dataset:'have',hashexp:20)" is NOT similarly used by the sas compiler to expand the pdv.
Consider this program, which looks up the data for "Alfred" in the hash object h.
data _null_;
*if 0 then set sashelp.class;
declare hash h (dataset:'sashelp.class');
h.definekey('name');
h.definedata(all:'Y');
h.definedone();
name='Alfred';
rc=h.find();
put (_all_) (=);
run;
It generates an error message in the log:
ERROR: Undeclared data symbol Sex for hash object at line 290 column 5.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
which simply means that even though there is data for a variable named SEX in the hash object, there is no variable named SEX in the pdv, So the FIND method, which is intended to transfer data from the hash object to the pdv, fails.
Now try the same program, but de-comment the "if 0 then ..." statement.
When readding an observation from data set using set statement in each data step iteration, Can we only do the lookup in a subset containing the observations within a date interval of 30 days from current date. In other words, is there a way to load a subset rather than the whole dataset into hash object and remove it after lookup finish, and load another subset into hash dynamically?
You can do that, But that need a GROUP variable and lookup an obs within a group .
The following cod could give a new start. But I am not sure if it would get what you need.
data have;
infile cards expandtabs;
input sellid buyid id date : yymmdd10.;
format date yymmdd10.;
cards;
1000023 1144058 292173 2016/11/7
1000023 477211 292178 2015/10/27
1000023 1000023 292178 2015/10/28
1000023 318100 289648 2015/4/7
1000023 350832 289652 2015/4/7
1000023 560400 289648 2015/4/7
1000023 259210 292173 2016/11/7
1000023 1466757 289652 2015/4/7
1000023 10626 292213 2016/11/7
1000023 259210 292221 2016/11/7
1000023 328802 292221 2016/11/7
1000023 827855 289648 2015/4/7
1000023 10626 292173 2016/11/7
1000023 823271 292173 2016/11/7
1000023 1597616 292202 2016/11/7
1000023 5068647 292202 2016/11/7
1000023 823271 292213 2016/11/7
1000023 10626 292173 2016/11/7
1000023 2282389 292221 2016/11/7
1000023 1451751 289648 2015/4/7
1000023 823271 292202 2016/11/7
1000023 1661710 289652 2015/4/7
1000023 1086373 292213 2016/11/7
1000023 1854328 292173 2016/11/7
1000023 963867 290497 2015/4/7
1000023 406023 289648 2015/4/7
1000023 211396 290802 2015/4/7
1000023 719501 289652 2015/4/7
1000023 1509387 290496 2015/4/7
1000023 1086373 292221 2016/11/7
1000023 963867 290497 2015/4/7
1000023 1597616 292213 2016/11/7
1000023 1536356 289652 2015/4/7
1000023 2282389 292213 2016/11/7
1000023 1536356 290497 2015/4/7
1000023 1867463 292202 2016/11/7
1000023 2356403 292178 2015/10/27
1000023 196966 290497 2015/4/7
1000023 1144058 292202 2016/11/7
1000023 259210 292202 2016/11/7
1000023 1536356 289648 2015/4/7
1000041 744402 107950 2015/4/7
1000041 1166120 107950 2015/4/7
1000041 1514527 107950 2015/4/7
1000041 793936 357766 2015/4/7
1000041 1617063 357766 2015/4/7
1000041 793936 107950 2015/4/7
1000043 2414987 661415 2016/1/26
1000043 1878751 563862 2016/1/26
1000043 2527958 415597 2016/1/12
1000043 1288675 347600 2016/1/26
1000043 1668672 537277 2016/1/26
1000043 5973865 2084280 2017/1/9
1000043 1012668 299174 2015/10/13
1000043 1301405 415765 2016/1/26
1000043 2252908 669278 2016/1/26
1000043 1910747 133016 2016/10/15
1000043 760352 620837 2015/12/27
1000043 1789216 578613 2016/1/26
;
data temp;
set have(keep=sellid date rename=(sellid=buyid date=_date)) ;
do i=_date to _date+30;
date=i; if i=_date then buy_in_next_30=1;else buy_in_next_30=0; output;
end;
format date yymmdd10.;
drop _date i ;
run;
proc sql;
create table found as
select buyid,date
from have
intersect
select buyid,date
from temp;
create table want as
select *
from temp
where buyid in (select buyid from found) and buy_in_next_30=1;
quit;
What is the ID variable? Is it a product or stock id? When you say you are looking for a seller id as a buyer id in some later transaction within 30 days, does it also require ID of the sell transaction equal to ID of the buy transaction?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.