suppsoe that I have the following data
company | date diff | price |
A | -2 | 2 |
A | -1 | 5 |
A | 0 | 3 |
A | 1 | 1 |
A | 2 | 7 |
What I would like to do is to find the "date diff" = 0, and take the price of that "date diff" and subtract it from all the other prices. So here I need to subtract 3 from all the other prices and get the following:
company | date diff | price | price diff |
A | -2 | 2 | -1 |
A | -1 | 5 | 2 |
A | 0 | 3 | 0 |
A | 1 | 1 | -2 |
A | 2 | 7 | 4 |
Of course I can by inspection find the price of the date diff = 0 and then subtract it from the other prices, but I would like to have a dynamic model that can be used form many companies (A, B, C ....). That is why I imagine that somehow the price = 3 can be put into a macro variable and then subtracted from the other prices to create price diff, but I might be completely wrong, so please let me know if there is another way.
Thanks!
If your data set is already sorted by COMPANY, a single DATA step will do:
data want;
do until (last.company);
set have;
by company;
if date_diff=0 then max_price=price;
end;
do until (last.company);
set have;
by company;
price_diff = price - max_price;
output;
end;
drop max_price; /* optional, of course */
run;
The top loop finds the maximum PRICE value. The bottom loop reads the same observations, calculates, and outputs.
Do you have multiple companies or 0's that you need to find or a single one?
many companies, each one having their own date diff = 0
Then @LinusH solution is correct, a single data step to start may be easier IMO.
1. Calculate Diff - if 0 then output record to dataset (data step)
2. Merge back with main data
3. Next calculation required
That is actually what I did originally, but since my data is big I thought that maybe there is a simpler way, but if this is the best solution for the case I will just stick to it
@ilikesas wrote:
That is actually what I did originally, but since my data is big I thought that maybe there is a simpler way, but if this is the best solution for the case I will just stick to it
@ilikesas In the future, please note what you've tried in your initial question and include the code otherwise it's a waste of our time to suggest things that you already know.
The GROUP BY is not necessary for your sample data, but will account for new groups. This does depend on the full set of data being similar to what you provided.
data have;
input company $ date_diff price;
datalines;
A -2 2
A -1 5
A 0 3
A 1 1
A 2 7
;
proc sql;
CREATE TABLE want AS
SELECT *,
price - max((date_diff = 0) * price) AS price_diff
FROM have
GROUP BY company
ORDER BY company, date_diff;
quit;
thanks for the suggestion
price - max((date_diff = 0) * price) AS price_diff
becasue I thought of calculating the price_diff in several steps, but it can be done in one!
If your data set is already sorted by COMPANY, a single DATA step will do:
data want;
do until (last.company);
set have;
by company;
if date_diff=0 then max_price=price;
end;
do until (last.company);
set have;
by company;
price_diff = price - max_price;
output;
end;
drop max_price; /* optional, of course */
run;
The top loop finds the maximum PRICE value. The bottom loop reads the same observations, calculates, and outputs.
If you only have one record per COMPANY where DATE_DIFF=0 then you can just merge the file back with itself.
data have;
input company $ date_diff price @@ ;
cards;
A -2 2 A -1 5 A 0 3 A 1 1 A 2 7
;;;;
data want ;
merge have have(rename=(date_diff=x price=base_price) where=(x=0) );
by company ;
price_diff = price - base_price ;
drop x;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.