Re: LOCF - Page 2

Wafflecakes · Posted 09-12-2017 05:04 PM

And basically that logic can be applied to an 'infinite' number of variables in the same datastep?

novinosrin · Posted 09-12-2017 05:08 PM

Well if it's too many(variables), then lazy people like me would have to find a way to shorten the typing. You would have noticed I had to type in another variable.Then it goes to arrays, hashes etc. The fun about sas is that harder the concept as it may seem (eg hashes) the easier is to process 🙂 Have fun! I am leaving for my lectures at the university.

Feel to free to open a new thread with a new question. Cheers!

Wafflecakes · Posted 09-12-2017 05:29 PM

HAVE

id month sales

1 1 1200

1 2 .

1 3 1500

1 4 .

2 1 1000

2 2 1000

2 3 .

2 4 .

WANT

id month sales

1 1 1200

1 2 1200

1 3 1500

1 4 1500

2 1 . <<<<<<<<<<<<<<<

2 2 1000

2 3 1000

2 4 1000

Novo - the code that you suggested seems to work for the most part. It looks like theres an error though because sales2/id 2 is supposed to be 1000, but it shows up as missing?

novinosrin · Posted 09-12-2017 05:33 PM

still not a single change in my code yet again:

data have;
input id month sales sales2;
datalines;
1 1 1200 500
1 2 . .
1 3 1500 .
1 4 . .
2 1 1000 1000
2 2 1000 300
2 3 . .
2 4 . .
;

data want;
set have;
by id;
retain _sales __sales;
if first.id then call missing(_sales,__sales);
if sales then _sales=sales;
if sales2 then __sales=sales2;
drop sales sales2;
rename _sales=sales __sales=sales2;
run;

Wafflecakes · Posted 09-12-2017 05:52 PM

Sorry forget about sales 2. Look at the previous post again. If you notice, observation 2 month 1 is supposed to be 1000, but it is set to missing.

Wafflecakes · Posted 09-12-2017 05:55 PM

In some cases I also get something like this

sales _sales

100 100

. 100

0 100 <<<< This is supposed to be 0.

novinosrin · Posted 09-13-2017 01:09 PM

@Wafflecakes Sorry I was in class and then went home last night. I had a look at your concern. But this was your input-

data have;

input id month sales;

datalines;

1 1 1200

1 2 .

1 3 1500

1 4 .

2 1 .

2 2 1000

2 3 .

2 4 .

2 5 .

2 6 .

2 7 3000

3 1 .

3 2 .

3 3 1100

3 4 .

3 5 .

;

The scope of the answer is to look above or retain the above. So the solution would be is to retain the value once found, however do you also need the solution to look ahead? This takes us to next level, perhaps double DOW or Hashes. BTW let us both enjoy Art's awesome solution and use it. I have been cheeky for a long time shamelessly copying his generosity. Hahaha!

art297 · Posted 09-12-2017 06:28 PM

If you want both LOCF and NOCB for a large number of variables then I would use something like:

data HAVE;
  input id month sales sales2;
  cards;
1   1   1200 .
1   2   . 300
1   3   1500 .
1   4   . 200
2   1   . 400
2   2   1000 500
2   3   . .
2   4   . .
2   5   . .
2   6   . 600
2   7   3000 .
3   1   .  200
3   2   . .
3   3   1100 300
3   4   . .
3   5   . .
;
data want (drop=i);
  set have;
  by id;
  array vars(*) sales sales2;
  array hold(2) _temporary_;
  do i=1 to dim(vars);
    counter+1;
    if first.id then hold(i)=vars(i);
    else if not missing(vars(i)) then hold(i)=vars(i);
    else vars(i)=hold(i);
  end;
run;
proc sort data=want;
  by id descending month;
run;
data want (drop=i);
  set want;
  by id;
  array vars(*) sales sales2;
  array hold(2) _temporary_;
  do i=1 to dim(vars);
    counter+1;
    if first.id then hold(i)=vars(i);
    else if not missing(vars(i)) then hold(i)=vars(i);
    else vars(i)=hold(i);
  end;
run;

Art, CEO, AnalystFinder.com

Wafflecakes · Posted 09-12-2017 06:30 PM

What does the 'i=' and 'dim' represent, Art?

art297 · Posted 09-12-2017 06:37 PM

I have to answer them backwards. dim(vars), in this case, simply gets the number of variables in the array.

Then, with do=i to dim(vars), i is simply a counter. In this case it will be 1 in the first iteration, 2 in the next. Thus it is just saying :

do i = 1 to 2

Unfortunately, one can't use dim to create an array with new variables, thus I had to manually set the second array (hold) to 2, rather than being able to use * or dim(vars)

Art

Tom · Posted 09-13-2017 02:00 PM

The easiest way to apply LOCF is to let the UPDATE statement do it for you. The UPDATE statement was really designed to apply transactions to a master dataset, but its features work well for LOCF operations. Basically if the variable is non-missing in the transaction then it replaces the current value of that variable. But missing values in the transaction are ignored.

To get it to work for LOCF application you need to add a couple of things. One you need a MASTER and TRANSACTIONS dataset. For this we will use an empty version of your current data as the MASTER and your current data as the transactions. Also normally UPDATE wants to output one observation per key value. But since it is just a DATA step we can add an explicit OUTPUT statement and so write each transaction out after it is applied.

data want ;
  update have(obs=0) have ;
  by id ;
  output;
run;

Another wrinkle is if there are variable that you do NOT want to treat as "transactions" then you could add another SET statement to read them back in and overwrite the results of apply the transactions. So for example lets assume that your MONTH variable might have missing values and you do NOT want to apply the LOCF logic to MONTH. Then your code might look like this.

data want ;
  update have(obs=0) have ;
  by id ;
  set have(keep=month);
  output;
run;

You could either use KEEP= to list the variable that you want not perform LOCF on or use DROP= to list the variables that you DO want to perform LOCF on.

Now it also looks like you have added a requirement to carry the first non-missing value back to replace earlier missing values?

If you just have one variable then that is an easy modification as you can just make a first_value dataset using a WHERE statement .

data first_value ;
  set have (keep=id sales);
  by id;
  where not missing(sales);
  if first.id;
run;
data want ;
  update first_value have ;
  by id ;
  output;
run;

But that gets hard to do if you have mulitple variables, say SALES and SALES2, that you want apply this to as the first non-missing values for the different variables might not occur on the same obseration. But you can also use UPDATE to solve this. Just sort the data in reverse order within the ID groups. So for your data you could do this to find the first value for SALES and SALES2.

proc sort data=have out=reverse_have ;
  by id descending month ;
run;

data first_values ;
 update have(obs=0) reverse_have ;
 by id ;
run;

Now use that first_values dataset as the MASTER table in your UPDATE.

data want ;
  update first_values have ;
  by id ;
  output;
run;

SAS Innovate 2025: Call for Content

Classroom Training Available!