BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hannes101
Calcite | Level 5

I am trying to replicate data cleaing steps, which are only available as SAS code and I am trying to understand, what the code actually does. This is part of code, which was used to clean TRACE data and was published by Jens **bleep**-Nielsen on SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2337908

 

I am struggling to understand the following part of the code, which are on p. 15-16 in the pdf. It basically is a merge operation and I would like to know, what the commands (in=qqq) and (in=qq) do. Are they some kind of internal ranking system, like Row_Number() in TSQL or what do they do.

* Take out reversals into a dataset;
data reversal temp_raw3;
set temp_raw2;
N=_N_;
if asof_cd=’R’ then output reversal;
else output temp_raw3;
run;
* Sorting the data so that it can be merged;
proc sort data=reversal (drop = N) nodupkey; by trd_exctn_dt
cusip_id trd_exctn_tm rptd_pr entrd_vol_qt rpt_side_cd cntra_mp_id
trd_rpt_dt trd_rpt_tm MSG_SEQ_NB; run;
proc sort data=temp_raw3; by trd_exctn_dt cusip_id trd_exctn_tm
rptd_pr entrd_vol_qt rpt_side_cd cntra_mp_id; run;

* Merges reversals back on and selects matching observations;
data reversal2;
merge temp_raw3 (in=qqq) reversal (in=qq) ;
by trd_exctn_dt cusip_id trd_exctn_tm rptd_pr
entrd_vol_qt rpt_side_cd cntra_mp_id;
if qq=1;
if qqq=1;
* Reversal have to be on a later date
15
* (or else it would not be a reversal);
* i.e. we do not delete potential as_of trades
* from a later date that may match;
if trd_exctn_dt < trd_rpt_dt;
run;

* Selects only 1 matching reversal (and keeps the rest);
proc sort data=reversal2 nodupkey; by trd_exctn_dt bond_sym_id
trd_exctn_tm rptd_pr entrd_vol_qt; run;
proc sort data=reversal2; by N; run;
proc sort data=temp_raw3; by N; run;
* Deletes the macthing reversals;
data temp_raw4;
merge reversal2 (in=qq) temp_raw3;
by N;
if qq=0;
run;
* Ends the filter for PRE-change data;

 I hope this is pretty easy to solve for anyone who has a certain knowledge of SAS and sorry if this might be off-topic.

Thanks a lot.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

The dataset option in= creates an alias flag indicating whether data from that particular observation appeared in the alias dataset.  So a simple example:

data table1;
  id=1; val=2; output
  id=2; val=5; output;
run;

data table2;
  id=1; val2=7; output;
  id=3; val2=3; output;
run;

data table3;
  merge table1 (in=a) table2 (in=b);
  by id;
run;

Will give (and note that the alias, like any temporary variable is dropped before writing out the dataset):

id   val1   val2  a   b
1 2 7 1 1
2 5 1 0
3 3 0 1

So the second obs only appears in table 1, so the flag for alias b is set to 0 and a to 1, to indicate data only taken from the first table.  The reverse is true for the third observation.  In the first observation, data appears in both tables so both alias flags are set to 1.

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

The dataset option in= creates an alias flag indicating whether data from that particular observation appeared in the alias dataset.  So a simple example:

data table1;
  id=1; val=2; output
  id=2; val=5; output;
run;

data table2;
  id=1; val2=7; output;
  id=3; val2=3; output;
run;

data table3;
  merge table1 (in=a) table2 (in=b);
  by id;
run;

Will give (and note that the alias, like any temporary variable is dropped before writing out the dataset):

id   val1   val2  a   b
1 2 7 1 1
2 5 1 0
3 3 0 1

So the second obs only appears in table 1, so the flag for alias b is set to 0 and a to 1, to indicate data only taken from the first table.  The reverse is true for the third observation.  In the first observation, data appears in both tables so both alias flags are set to 1.

hannes101
Calcite | Level 5
* Merges reversals back on and selects matching observations;
data reversal2;
merge temp_raw3 (in=qqq) reversal (in=qq) ;
by trd_exctn_dt cusip_id trd_exctn_tm rptd_pr
entrd_vol_qt rpt_side_cd cntra_mp_id;
if qq=1;
if qqq=1;

So this is basically just an inner join, by only selecting rows, which are part of both datasets, correct?

Thanks a lot for your help.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, it will only keep when in both incoming datasets.

Kurt_Bremser
Super User

Instead of three subsetting ifs in the data reversal2 step, it can all be put into one, and you can make use of the fact that numerical missing and zero are false, everything else is true:

data
  reversal
  temp_raw3
;
set temp_raw2;
N = _N_;
if asof_cd = ’R’
then output reversal;
else output temp_raw3;
run;

* Sorting the data so that it can be merged;
proc sort data=reversal (drop = N) nodupkey;
by
  trd_exctn_dt cusip_id trd_exctn_tm rptd_pr
  entrd_vol_qt rpt_side_cd cntra_mp_id
  trd_rpt_dt trd_rpt_tm MSG_SEQ_NB
;
run;

proc sort data=temp_raw3;
by
  trd_exctn_dt cusip_id trd_exctn_tm rptd_pr
  entrd_vol_qt rpt_side_cd cntra_mp_id
;
run;

* Merges reversals back on and selects matching observations;
data reversal2;
merge
  temp_raw3 (in=qqq)
  reversal (in=qq)
;
by
  trd_exctn_dt cusip_id trd_exctn_tm rptd_pr
  entrd_vol_qt rpt_side_cd cntra_mp_id
;
if qq and qqq and trd_exctn_dt < trd_rpt_dt; /* see here! */
* Reversal have to be on a later date
* (or else it would not be a reversal);
* i.e. we do not delete potential as_of trades
* from a later date that may match;
run;

* Selects only 1 matching reversal (and keeps the rest);
proc sort data=reversal2 nodupkey;
by trd_exctn_dt bond_sym_id trd_exctn_tm rptd_pr entrd_vol_qt;
run;

proc sort data=reversal2;
by N;
run;

proc sort data=temp_raw3;
by N;
run;

* Deletes the macthing reversals;
data temp_raw4;
merge
  reversal2 (in=qq)
  temp_raw3
;
by N;
if not qq;
run;

Note how visual formatting goes a long way towards making code intelligible.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1641 views
  • 0 likes
  • 3 in conversation