Re: Need previous year values to be missing

kathleen_07 · Posted 12-18-2020 01:45 PM

Hi all,

I have a data set that reports values yearly and monthly. The first year given in a category I need the previous values to be missing. Here is a screenshot below with my general code being: PREVIOUS_YTD=LAG12(YTD);

Here's with: IF FIRST.YEAR THEN PREVIOUS_YTD=. ;
ELSE PREVIOUS_YTD=LAG12(YTD);

In this example, 2018 is the first year for this category, so the previous YTD values should be missing. I have no idea where 86 or 104 is coming from. Possibly from the other categories?

ballardw · Posted 12-18-2020 03:00 PM

LAG and DIF functions maintain a queue history. When you use an "if <condition> then value=lag(somevariable). The Lag returns the value the LAST time the condition was true.

The typical work around is

Lagvalue = Lag(variable);

if <condtion> then wantedvalue=lagvalue; <the variable created above>

then Drop the variable lagvalue when not needed later.

It really is a good idea to include a complete data step as a single line of code does not show use everything you are doing.

First is just that : the first record with a value. So the Feb through Dec records for the same year are no longer First. So you will need to provide additional code and probably involves First for the mythical category.

And your "data" does not show anything resembling "category" so can't address that.

There are certainly ways but we need to know what the data really looks like. Pictures do not work. I can't write code to read a picture that would create data to test with. For moderately complicated output then a better example of what the output should look like is also needed.

kathleen_07 · Posted 12-18-2020 03:28 PM

Here is an example of the code:

PROC TRANSPOSE DATA=WORK.DATA1
OUT=WORK.WANT(RENAME=(COL1=MONTH_VALUE))
NAME=CATGEORY;
BY DATE MONTH YEAR;
VAR CATEGORY_1 CATEGORY_2 CATEGORY_3 CATEGORY_4 ;
RUN;

PROC TRANSPOSE DATA=WORK.DATA2
OUT=WORK.WANT(RENAME=(COL1=YTD))
NAME=CATEGORY;
BY DATE MONTH YEAR;
VAR CATEGORY_1 CATEGORY_2 CATEGORY_3 CATEGORY_4 ;
RUN;

DATA WORK.WANT;
MERGE WORK.DATA1 WORK.DATA2;
BY DATE MONTH YEAR;
RUN;

/** Calculates Previous Year Values for Each Metric **/
PROC SORT DATA=WORK.WANT;

BY METRIC YEAR;

QUIT;

DATA WORK.WANT;
SET WORK.WANT;
BY METRIC YEAR;

PREVIOUS_YTD=LAG12(YTD_VALUE);
RUN;

ballardw · Posted 12-18-2020 03:59 PM

First a warning:

DATA WORK.WANT;
SET WORK.WANT;

Every time you use this sort of code you are completely replacing the data set want. So a minor logic error or typo can result in replacing values with unwanted ones. So the next run does not have the correct input.

While learning, or testing new sorts of code it much better to use

DATA WORK.newname;
SET WORK.WANT;

Is "metric" supposed to be your "category"? If so then maybe

DATA WORK.new;
   SET WORK.WANT;
   BY  METRIC YEAR;
   retain referenceyear;
   lv = LAG12(YTD_VALUE);
   if first.metric then referenceyear=year; 
   if year ne referenceyear then PREVIOUS_YTD= lv;

   drop referenceyear lv;
RUN;

If category is a different variable then you may need to sort by category metric year and use first.category. But without data hard to tell. Also sorts by Year that need month order are always suspect. Date values are much more consistent in behavior.

mkeintz · Posted 12-18-2020 08:43 PM

Why do you run the first proc transpose, which reads DATA1 and writes out dataset WORK.WANT, only to immediately follow it with another proc transpose reading DATA2 and over-writing dataset WANT? Then why run the 2nd transpose, when you follow it with a DATA step that merges DATA1 and DATA2 to once again overwrite WANT? The first two proc's are completely superfluous.

Now if you are starting the ytd calculations from a data set sorted by metric, and date, with exactly 12 monthly records per year, then this program will set previous_ytd to missing for the first 12 records of each METRIC. I name the dataset as HAVE, from which you would produce WANT:

data want ;
  set have;
  by metric date;
  previous_ytd=ifn(metric=lag12(metric),lag12(ytd),.);
run;

Now the "BY METRIC DATE" could be deleted, since the program code doesn't take advantage of it. But it does tell SAS to stop if the data is out of order. BTW, I assume that variable DATE is a true sas date value.

The IFN statement tests the current metric vs the lag12(metric). If they match (i.e. you are in the 13th and subsequent records), IFN returns previous_ytd=lag12(ytd). Otherwise it's set to missing.

But the advantage of the IFN statement is that it will ALWAYS execute the LAG12(ytd) function, even if the METRIC=lag12(METRIC) condition is false, and IFN ultimately returns the third argument (a missing value). In other words, the queue underneath the misleadingly-named LAG function is ALWAYS updated, even when IFN doesn't use it. This is a good way to address the problem described by @ballardw.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Need previous year values to be missing