- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a long dataset (>10,000 ID's over 8 year period) with multiple dates and need to perform some calculations by 'ID'.
I already have ID, date and value but need to create 'New_value'' such that for the month of February and March the value is modified at ID level.
So New_Value: February is 0.5*Jan Value and March is 0.5*Feb Value. Otherwise New_Value=Value.
My date format is YYMMDD10.;
Example:
ID | Date | Value | New_Value |
1 | Jan-15 | 0.50 | 0.50 |
1 | Feb-15 | 1.00 | 0.25 |
1 | Mar-15 | 0.44 | 0.50 |
1 | Apr-15 | 0.06 | 0.06 |
1 | May-15 | 0.42 | 0.42 |
1 | Jun-15 | 0.75 | 0.75 |
1 | Jul-15 | 0.33 | 0.33 |
1 | Aug-15 | 0.11 | 0.11 |
1 | Sep-15 | 0.42 | 0.42 |
1 | Oct-15 | 0.03 | 0.03 |
1 | Nov-15 | 0.58 | 0.58 |
1 | Dec-15 | 0.45 | 0.45 |
1 | Jan-16 | 2.00 | 2.00 |
1 | Feb-16 | 3.00 | 1.00 |
1 | Mar-16 | 0.55 | 1.50 |
...
Any support would be much appreciated.
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do any of your ID's start in the month of Feb or Mar? If so, then the preceding month data would arise from the prior ID. You would need to protect against that, as in:
data want;
set have;
by id;
new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;
If no ID's start in Feb or Mar, then you can drop the BY statement and simplify the new_value= assignment:
data want;
set have;
new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;
Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....? Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel. You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE. The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1). So the LAG function underlying queue is always being updated and you are master of your destiny.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @PetePatel
You can try this, using the LAG() function:
- if date = a character variable:
data want;
set have;
_lag = lag(value);
if substr(Date,1,3) in ("Feb","Mar") then New_value = _lag * 0.5;
else New_Value = Value;
drop _lag;
run;
- if date = a SAS date:
data want;
set have;
_lag = lag(value);
if MONTH(Date) in (2,3) then New_value = _lag * 0.5;
else New_Value = Value;
drop _lag;
run;
Best,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your reply.
Please see ifn statement in latest reply that would help with ID's starting in Feb.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do any of your ID's start in the month of Feb or Mar? If so, then the preceding month data would arise from the prior ID. You would need to protect against that, as in:
data want;
set have;
by id;
new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;
If no ID's start in Feb or Mar, then you can drop the BY statement and simplify the new_value= assignment:
data want;
set have;
new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;
Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....? Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel. You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE. The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1). So the LAG function underlying queue is always being updated and you are master of your destiny.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you mkeintz.
I really like how the code protects against any ID's starting in Feb or Mar.