BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Quartz | Level 8

## Row calculations

Hi,

I have a long dataset (>10,000 ID's over 8 year period) with multiple dates and need to perform some calculations by 'ID'.

I already have ID, date and value but need to create 'New_value'' such that for the month of February and March the value is modified at ID level.

So New_Value:  February is 0.5*Jan Value and March is 0.5*Feb Value. Otherwise New_Value=Value.

My date format is YYMMDD10.;

Example:

 ID Date Value New_Value 1 Jan-15 0.50 0.50 1 Feb-15 1.00 0.25 1 Mar-15 0.44 0.50 1 Apr-15 0.06 0.06 1 May-15 0.42 0.42 1 Jun-15 0.75 0.75 1 Jul-15 0.33 0.33 1 Aug-15 0.11 0.11 1 Sep-15 0.42 0.42 1 Oct-15 0.03 0.03 1 Nov-15 0.58 0.58 1 Dec-15 0.45 0.45 1 Jan-16 2.00 2.00 1 Feb-16 3.00 1.00 1 Mar-16 0.55 1.50

...

Any support would be much appreciated.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Row calculations

Do any of your ID's start in the month of Feb or Mar?   If so, then the preceding month data would arise from the prior ID.  You would need to protect against that, as in:

``````data want;
set have;
by id;
new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;``````

If no ID's start in Feb or Mar, then you  can drop the BY  statement and simplify the new_value= assignment:

``````data want;
set have;
new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;``````

Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....?   Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel.  You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE.   The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1).  So the LAG function underlying queue is always being updated and you are master of your destiny.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
4 REPLIES 4
Meteorite | Level 14

## Re: Row calculations

You can try this, using the LAG() function:

- if date = a character variable:

``````data want;
set have;
_lag = lag(value);
if substr(Date,1,3) in ("Feb","Mar") then New_value = _lag * 0.5;
else New_Value = Value;
drop _lag;
run;``````

- if date = a SAS date:

``````data want;
set have;
_lag = lag(value);
if MONTH(Date) in (2,3) then New_value = _lag * 0.5;
else New_Value = Value;
drop _lag;
run;``````

Best,

Quartz | Level 8

## Re: Row calculations

Thank you for your reply.

Please see ifn statement in latest reply that would help with ID's starting in Feb.

PROC Star

## Re: Row calculations

Do any of your ID's start in the month of Feb or Mar?   If so, then the preceding month data would arise from the prior ID.  You would need to protect against that, as in:

``````data want;
set have;
by id;
new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;``````

If no ID's start in Feb or Mar, then you  can drop the BY  statement and simplify the new_value= assignment:

``````data want;
set have;
new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;``````

Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....?   Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel.  You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE.   The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1).  So the LAG function underlying queue is always being updated and you are master of your destiny.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Quartz | Level 8

## Re: Row calculations

Thank you mkeintz.

I really like how the code protects against any ID's starting in Feb or Mar.

Discussion stats
• 4 replies
• 630 views
• 2 likes
• 3 in conversation