Hi,
I have a dataset of trading data where some of the data is shrunk if two trades happen quickly so the pricing is off.
The data skips the user going from stock TESLA to USD then USD to AAPL and instead goes straight from TSLA to AAPL.
data have1; input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD; datalines; AAPL USD 20 20 NVDA USD 50 50 HAL USD 35 35 DIS USD 30 30 GOOG AAPL 0.25 . FVRR DIS 0.1 . TSLA AAPL 0.9 . ; run;
I'd like to look up all cases where BuyPriceTodayUSD = . (missing) and check if the sellSymbol is a buySymbol in the same dataset and where the BuyPriceUSD = not missing.
Then I would like to have the BuyPriceTodayUSD calculated so that its = buyPricetoday * the value we have found earlier in the BuyPriceTodayUSD.
In the current example TSLA was bought at a price of 0.9 of an AAPL share.
So it should look to find AAPL and see that AAPL traded at BuyPriceTodayUSD of 20 and then calculate the BuyPriceTodayUSD of TSLA as 0.9 * 20 = 18.
In excel i guess this is similar to a vlookup.
Data Want should look like this:
Only difference is the 18 is filled in.
data want; input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD; datalines; AAPL USD 20 20 NVDA USD 50 50 HAL USD 35 35 DIS USD 30 30 GOOG AAPL 0.25 5 FVRR DIS 0.1 3 TSLA AAPL 0.9 18 ; run;
I tried a proc sql + case when + a left join using the same dataset in both a and b but i am not doing it right.
The simplest solution would be ideal. Thank you in advance 🙂
Note: its a small dataset of only 100,000 observations so runtime is not an issue.
If I grasped your descriptions correctly and in full then below should work.
data have;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;
data desired;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 5
FVRR DIS 0.1 3
TSLA AAPL 0.9 18
;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(sellSymbol="USD"))');
h1.defineKey('buySymbol');
h1.defineData('BuyPriceTodayUSD');
h1.defineDone();
end;
set have;
if missing(BuyPriceTodayUSD) and sellSymbol ne 'USD' then
do;
if h1.find(key:sellSymbol) = 0 then
BuyPriceTodayUSD=BuyPriceToday*BuyPriceTodayUSD;
end;
run;
proc print data=desired;
run;
proc print data=want;
run;
For the start, think about using RETAIN statement. Depending on your data and conditions the code could be developed further.
For your given data, the below code is the simplest way.
data have1;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceUSD;
datalines;
AAPL USD 20 20
TSLA AAPL 0.9 .
AAPL USD 20 10
TSLA AAPL 0.5 .
;
run;
data want;
set have1;
retain nonmiss;
if BuyPriceUSD ne . then nonmiss=BuyPriceUSD;
if BuyPriceUSD eq . then BuyPriceUSD= BuyPriceToday*nonmiss;
drop nonmiss;
proc print;run;
The RETAIN function doesn't work in a larger dataset for me.
I've expanded my have below to illustrate the issue.
data have1; input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD; datalines; AAPL USD 20 20 NVDA USD 50 50 HAL USD 35 35 DIS USD 30 30 GOOG AAPL 0.25 . FVRR DIS 0.1 . TSLA AAPL 0.9 . ; run;
the last three missing rows missing values should be:
0.25 * 20 = 5
0.1 * 30 = 3
0.9 * 20 = 18
I don't follow what you are doing. And you haven't provided enough examples to fully show the pattern.
It seems like you are making an assumption that because the BuyPriceUSD column is missing that somehow you can derive its value from some other transaction? Is your set of transactions limited to only one per BUY value? If not then why did you pick that transaction and not some other one that might also have BUY='AAPL'? Is there some time frame where the other transaction has to have occurred for it to be considered to be used?
Is there an significance in the fact that the value 'USD" in the SELL variable of the observation you picked matches the suffix on the variable, BuyPriceUSD, with the missing value that you are updating? Does the SELL have to be of USD for the observation to be considered for use in the transaction?
And why did you multiply? Does it make any difference that on the observations you picked to find a value that the two prices are identical? What if they aren't? How does that change your formula?
Sorry for the lack of clarity, was trying to avoid details for brevity but i realize now they were important details missing.
Does that help make it more clear?
If I grasped your descriptions correctly and in full then below should work.
data have;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 .
FVRR DIS 0.1 .
TSLA AAPL 0.9 .
;
data desired;
input buySymbol $ sellSymbol $ BuyPriceToday BuyPriceTodayUSD;
datalines;
AAPL USD 20 20
NVDA USD 50 50
HAL USD 35 35
DIS USD 30 30
GOOG AAPL 0.25 5
FVRR DIS 0.1 3
TSLA AAPL 0.9 18
;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(sellSymbol="USD"))');
h1.defineKey('buySymbol');
h1.defineData('BuyPriceTodayUSD');
h1.defineDone();
end;
set have;
if missing(BuyPriceTodayUSD) and sellSymbol ne 'USD' then
do;
if h1.find(key:sellSymbol) = 0 then
BuyPriceTodayUSD=BuyPriceToday*BuyPriceTodayUSD;
end;
run;
proc print data=desired;
run;
proc print data=want;
run;
This is elegant and worked perfectly.
Thank you!
The amount of hours I spent trying to solve this ..
I'm glad I posted here for help because I would have likely still not solved it and never with this solution.
Appreciate it 🙏
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.