BookmarkSubscribeRSS Feed
kathleen_07
Calcite | Level 5

Hi all,

 

I have a data set that reports values yearly and monthly. The first year given in a category I need the previous values to be missing. Here is a screenshot below with my general code being: PREVIOUS_YTD=LAG12(YTD);

 

kathleen_07_0-1608316818533.png

 

Here's with: IF FIRST.YEAR THEN PREVIOUS_YTD=. ;
ELSE PREVIOUS_YTD=LAG12(YTD);

kathleen_07_1-1608316953961.png

In this example, 2018 is the first year for this category, so the previous YTD values should be missing. I have no idea where 86 or 104 is coming from. Possibly from the other categories? 

4 REPLIES 4
ballardw
Super User

LAG and DIF functions maintain a queue history. When you use an "if <condition> then value=lag(somevariable). The Lag returns the value the LAST time the condition was true.

The typical work around is

Lagvalue = Lag(variable);

if <condtion> then wantedvalue=lagvalue;  <the variable created above>

 

then Drop the variable lagvalue when not needed later.

 

It really is a good idea to include a complete data step as a single line of code does not show use everything you are doing.

 

First is just that : the first record with a value.  So the Feb through Dec records for the same year are no longer First.  So you will need to provide additional code and probably involves First for the mythical category.

 

And your "data" does not show anything resembling "category" so can't address that.

 

There are certainly ways but we need to know what the data really looks like. Pictures do not work. I can't write code to read a picture that would create data to test with. For moderately complicated output then a better example of what the output should look like is also needed.

kathleen_07
Calcite | Level 5

Here is an example of the code:

 

PROC TRANSPOSE DATA=WORK.DATA1
OUT=WORK.WANT(RENAME=(COL1=MONTH_VALUE))
NAME=CATGEORY;
BY DATE MONTH YEAR;
VAR CATEGORY_1 CATEGORY_2 CATEGORY_3 CATEGORY_4 ;
RUN;

 

PROC TRANSPOSE DATA=WORK.DATA2
OUT=WORK.WANT(RENAME=(COL1=YTD))
NAME=CATEGORY;
BY DATE MONTH YEAR;
VAR CATEGORY_1 CATEGORY_2 CATEGORY_3 CATEGORY_4 ;
RUN;

 

DATA WORK.WANT;
MERGE WORK.DATA1 WORK.DATA2;
BY DATE MONTH YEAR;
RUN;

 

/** Calculates Previous Year Values for Each Metric **/
PROC SORT DATA=WORK.WANT;

BY METRIC YEAR;

QUIT;

DATA WORK.WANT;
SET WORK.WANT;
BY METRIC YEAR;

PREVIOUS_YTD=LAG12(YTD_VALUE);
RUN;

ballardw
Super User

First a warning:

DATA WORK.WANT;
SET WORK.WANT;

Every time you use this sort of code you are completely replacing the data set want. So a minor logic error or typo can result in replacing values with unwanted ones. So the next run does not have the correct input.

While learning, or testing new sorts of code it much better to use

DATA WORK.newname;
SET WORK.WANT;

Is "metric" supposed to be your "category"? If so then maybe

 

DATA WORK.new;
   SET WORK.WANT;
   BY  METRIC YEAR;
   retain referenceyear;
   lv = LAG12(YTD_VALUE);
   if first.metric then referenceyear=year; 
   if year ne referenceyear then PREVIOUS_YTD= lv;

   drop referenceyear lv;
RUN;

If category is a different variable then you may need to sort by category metric year and use first.category. But without data hard to tell. Also sorts by Year that need month order are always suspect. Date values are much more consistent in behavior.

mkeintz
PROC Star

Why do you run the first proc transpose, which reads DATA1 and writes out dataset WORK.WANT, only to immediately follow it with another proc transpose reading DATA2 and over-writing dataset WANT?  Then why run the 2nd transpose, when you follow it with a DATA step that merges DATA1 and DATA2 to once again overwrite WANT?  The first two proc's are completely superfluous.

 

Now if you  are starting the ytd calculations from a data set sorted by metric, and date, with exactly 12 monthly records per year, then this program will set previous_ytd to  missing for the first 12 records of each METRIC.  I name the dataset as HAVE, from which you would produce WANT: 

 

data want ;
  set have;
  by metric date;
  previous_ytd=ifn(metric=lag12(metric),lag12(ytd),.);
run;

Now the "BY METRIC DATE" could be deleted, since the program code doesn't take advantage of it.   But it does tell SAS to stop if the data is out of order.   BTW, I assume that variable DATE is a true sas date value.

 

The IFN  statement tests the current metric vs the lag12(metric).  If they match (i.e. you are in the 13th and subsequent records), IFN returns previous_ytd=lag12(ytd).  Otherwise it's set to missing.

 

But the advantage of the IFN statement is that it will ALWAYS execute the LAG12(ytd) function, even if the METRIC=lag12(METRIC) condition is false, and IFN ultimately returns the third argument (a missing value).  In other words, the queue underneath the misleadingly-named LAG function is ALWAYS updated, even when IFN doesn't use it.  This is a good way to address the problem described by @ballardw.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 717 views
  • 0 likes
  • 3 in conversation