BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
curiosity
Obsidian | Level 7

Hi,

 

I have a dataset of trading data where some of the data is shrunk if two trades happen quickly so the pricing is off.

The data skips the user going from stock TESLA to USD then USD to AAPL and instead goes straight from TSLA to AAPL.

 

data have1;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;
run;

 

 

 

 

I'd like to look up all cases where BuyPriceTodayUSD = . (missing) and check if the sellSymbol is a buySymbol in the same dataset and where the BuyPriceUSD = not missing.

Then I would like to have the BuyPriceTodayUSD calculated so that its = buyPricetoday * the value we have found earlier in the BuyPriceTodayUSD.

 

In the current example TSLA was bought at a price of 0.9 of an AAPL share.

So it should look to find AAPL and see that AAPL traded at BuyPriceTodayUSD of 20 and then calculate the BuyPriceTodayUSD of TSLA as 0.9 * 20 = 18.

 

In excel i guess this is similar to a vlookup.

 

 

 

Data Want should look like this:

 

Only difference is the 18 is filled in.

 

data want;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 5
FVRR DIS 0.1 3
TSLA AAPL 0.9 18
;
run;

 

 

I tried a proc sql + case when + a left join using the same dataset in both a and b but i am not doing it right.

 

The simplest solution would be ideal. Thank you in advance 🙂

 

Note: its a small dataset of only 100,000 observations so runtime is not an issue.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

If I grasped your descriptions correctly and in full then below should work.

data have;
  input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
  datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;

data desired;
  input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 5
FVRR DIS 0.1 3
TSLA AAPL 0.9 18
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(sellSymbol="USD"))');
      h1.defineKey('buySymbol');
      h1.defineData('BuyPriceTodayUSD');
      h1.defineDone();
    end;

  set have;

  if missing(BuyPriceTodayUSD) and sellSymbol ne 'USD' then
    do;
      if h1.find(key:sellSymbol) = 0 then
        BuyPriceTodayUSD=BuyPriceToday*BuyPriceTodayUSD;
    end;
run;

proc print data=desired;
run;
proc print data=want;
run;

View solution in original post

6 REPLIES 6
A_Kh
Lapis Lazuli | Level 10

For the start, think about using RETAIN statement. Depending on your data and conditions the code could be developed further. 

For your given data, the below code is the simplest way. 

data have1;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceUSD;
datalines;
AAPL USD 20 20
TSLA AAPL 0.9 .
AAPL USD 20 10
TSLA AAPL 0.5 .
;
run;

data want;
	set have1;
	retain nonmiss;
	if BuyPriceUSD ne . then nonmiss=BuyPriceUSD;
	if BuyPriceUSD eq . then BuyPriceUSD= BuyPriceToday*nonmiss;
	drop nonmiss; 
proc print;run; 

Capture.PNG

curiosity
Obsidian | Level 7

The RETAIN function doesn't work in a larger dataset for me.

 

I've expanded my have below to illustrate the issue.

data have1;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;
run;

the last three missing rows missing values should be:

0.25 * 20 = 5

0.1 *  30 = 3

0.9 * 20 = 18

Tom
Super User Tom
Super User

I don't follow what you are doing.  And you haven't provided enough examples to fully show the pattern. 

 

It seems like you are making an assumption that because the BuyPriceUSD column is missing that somehow you can derive its value from some other transaction?  Is your set of transactions limited to only one per BUY value?  If not then why did you pick that transaction and not some other one that might also have BUY='AAPL'?  Is there some time frame where the other transaction has to have occurred for it to be considered to be used? 

 

Is there an significance in the fact that the value 'USD" in the SELL variable of the observation you picked matches the suffix on the variable, BuyPriceUSD, with the missing value that you are updating?   Does the SELL have to be of USD for the observation to be considered for use in the transaction?  

 

And why did you multiply?  Does it make any difference that on the observations you picked to find a value that the two prices are identical?  What if they aren't?  How does that change your formula?

 

curiosity
Obsidian | Level 7

Sorry for the lack of clarity, was trying to avoid details for brevity but i realize now they were important details missing.

 

  1. I need the BuyPriceTodayUSD of each BuySymbol ticker and this is a once a day dataset.  I have many of these datasets that I'll do this analysis on.
  2. Some buySymbol are missing a USD value because they were traded with another security instead of against USD.
  3. Each buySymbol only shows up once in this daily dataset.
  4. Each sellSymbol can show up multiple times (as is the case where tickers are often bought with USD).
  5. If the sellsymbol is USD then BuyPriceToday = BuyPriceTodayUSD
  6. The buyPriceToday is the cost to buy a share of buySymbol using a share of sellSymbol.
    For example in the first row, we know the cost to buy one AAPL by selling one USD is equal to 20 and that is also the price today in USD.
    For example in the last row, the cost to buy TSLA was 0.9 shares of APPL.  By looking up we see the price of AAPL in BuyPriceTodayUSD is 20 so the USD price of TSLA must be 0.9 * 20 = 18.

Does that help make it more clear?

Patrick
Opal | Level 21

If I grasped your descriptions correctly and in full then below should work.

data have;
  input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
  datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;

data desired;
  input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 5
FVRR DIS 0.1 3
TSLA AAPL 0.9 18
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(sellSymbol="USD"))');
      h1.defineKey('buySymbol');
      h1.defineData('BuyPriceTodayUSD');
      h1.defineDone();
    end;

  set have;

  if missing(BuyPriceTodayUSD) and sellSymbol ne 'USD' then
    do;
      if h1.find(key:sellSymbol) = 0 then
        BuyPriceTodayUSD=BuyPriceToday*BuyPriceTodayUSD;
    end;
run;

proc print data=desired;
run;
proc print data=want;
run;
curiosity
Obsidian | Level 7

This is elegant and worked perfectly.


Thank you!

 

The amount of hours I spent trying to solve this ..

I'm glad I posted here for help because I would have likely still not solved it and never with this solution.

Appreciate it 🙏

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 655 views
  • 5 likes
  • 4 in conversation