BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PetePatel
Quartz | Level 8

Hi,

 

I have a long dataset (>10,000 ID's over 8 year period) with multiple dates and need to perform some calculations by 'ID'.

 

I already have ID, date and value but need to create 'New_value'' such that for the month of February and March the value is modified at ID level.

 

So New_Value:  February is 0.5*Jan Value and March is 0.5*Feb Value. Otherwise New_Value=Value.

 

My date format is YYMMDD10.;

 

Example:

 

IDDateValueNew_Value
1Jan-150.500.50
1Feb-151.000.25
1Mar-150.440.50
1Apr-150.060.06
1May-150.420.42
1Jun-150.750.75
1Jul-150.330.33
1Aug-150.110.11
1Sep-150.420.42
1Oct-150.030.03
1Nov-150.580.58
1Dec-150.450.45
1Jan-162.002.00
1Feb-163.001.00
1Mar-160.551.50

...

 

Any support would be much appreciated.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Do any of your ID's start in the month of Feb or Mar?   If so, then the preceding month data would arise from the prior ID.  You would need to protect against that, as in:

 

data want;
  set have;
  by id;
  new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;

If no ID's start in Feb or Mar, then you  can drop the BY  statement and simplify the new_value= assignment:

data want;
  set have;
  new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;

Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....?   Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel.  You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE.   The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1).  So the LAG function underlying queue is always being updated and you are master of your destiny.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
ed_sas_member
Meteorite | Level 14

Hi @PetePatel 

You can try this, using the LAG() function:

 

- if date = a character variable:

data want;
	set have;
	_lag = lag(value);
	if substr(Date,1,3) in ("Feb","Mar") then New_value = _lag * 0.5;
	else New_Value = Value;
	drop _lag;
run;

- if date = a SAS date:

data want;
	set have;
	_lag = lag(value);
	if MONTH(Date) in (2,3) then New_value = _lag * 0.5;
	else New_Value = Value;
	drop _lag;
run;

Best,

PetePatel
Quartz | Level 8

Thank you for your reply.

 

Please see ifn statement in latest reply that would help with ID's starting in Feb.

mkeintz
PROC Star

Do any of your ID's start in the month of Feb or Mar?   If so, then the preceding month data would arise from the prior ID.  You would need to protect against that, as in:

 

data want;
  set have;
  by id;
  new_value=ifn(month(date) in (2,3) and first.id^=1,.5*lag(value),value);
run;

If no ID's start in Feb or Mar, then you  can drop the BY  statement and simplify the new_value= assignment:

data want;
  set have;
  new_value=ifn(month(date) in (2,3),.5*lag(value),value);
run;

Why do I use the IFN statement, instead of an IF month(date)= then ..., followed by ELSE ....?   Because the LAG function is really an "update queue" function - it's not a "look back" like in Excel.  You need the queue to always be updated, even if you don't assign the results of the update in NEW_VALUE.   The intrinsic property of IFN is that it evaluates both of the outcome fields (the 2nd and 3rd arguments), reqardless of the results of the conditional field (argument 1).  So the LAG function underlying queue is always being updated and you are master of your destiny.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PetePatel
Quartz | Level 8

Thank you mkeintz.

 

I really like how the code protects against any ID's starting in Feb or Mar.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 481 views
  • 2 likes
  • 3 in conversation