BookmarkSubscribeRSS Feed
Wafflecakes
Calcite | Level 5

And basically that logic can be applied to an 'infinite' number of variables in the same datastep?

novinosrin
Tourmaline | Level 20

Well if it's too many(variables), then lazy people like me would have to find a way to shorten the typing. You would have noticed I had to type in another variable.Then it goes to arrays, hashes etc. The fun about sas is that harder the concept as it may seem (eg hashes) the easier is to process 🙂 Have fun! I am leaving for my lectures at the university. 

Feel to free to open a new thread with a new question. Cheers!

Wafflecakes
Calcite | Level 5

HAVE

id month sales 

1   1   1200 

1   2   .         

1   3   1500  

1   4   .         

2   1   1000 

2   2   1000 

2   3   .        

2   4   .        

 

WANT

id month sales 

1   1   1200 

1   2   1200 

1   3   1500 

1   4   1500 

2   1   .               <<<<<<<<<<<<<<<

2   2   1000 

2   3   1000 

2   4   1000

 

Novo - the code that you suggested seems to work for the most part.  It looks like theres an error though because sales2/id 2 is supposed to be 1000, but it shows up as missing?

novinosrin
Tourmaline | Level 20

still not a single change in my code yet again:

data have;
input id month sales sales2;
datalines;
1 1 1200 500
1 2 . .
1 3 1500 .
1 4 . .
2 1 1000 1000
2 2 1000 300
2 3 . .
2 4 . .
;

data want;
set have;
by id;
retain _sales __sales;
if first.id then call missing(_sales,__sales);
if sales then _sales=sales;
if sales2 then __sales=sales2;
drop sales sales2;
rename _sales=sales __sales=sales2;
run;

Wafflecakes
Calcite | Level 5
Sorry forget about sales 2. Look at the previous post again. If you notice, observation 2 month 1 is supposed to be 1000, but it is set to missing.
Wafflecakes
Calcite | Level 5

In some cases I also get something like this

 

sales _sales

100 100

.      100

0     100 <<<<  This is supposed to be 0.

novinosrin
Tourmaline | Level 20

@Wafflecakes Sorry I was in class and then went home last night. I had a look at your concern. But this was your input-

data have;

input id month sales;

datalines;

1   1   1200

1   2   .

1   3   1500

1   4   .

2   1   .

2   2   1000

2   3   .

2   4   .

2   5   .

2   6   .

2   7   3000

3   1   .

3   2   .

3   3   1100

3   4   .

3   5   .

;

 

The scope of the answer is to look above or retain the above. So the solution would be is to retain the value once found, however do you also need the solution to look ahead? This takes us to next level, perhaps double DOW or Hashes. BTW let us both enjoy Art's awesome solution and use it. I have been cheeky for a long time shamelessly copying his generosity. Hahaha!

art297
Opal | Level 21

If you want both LOCF and NOCB for a large number of variables then I would use something like:

data HAVE;
  input id month sales sales2;
  cards;
1   1   1200 .
1   2   . 300
1   3   1500 .
1   4   . 200
2   1   . 400
2   2   1000 500
2   3   . .
2   4   . .
2   5   . .
2   6   . 600
2   7   3000 .
3   1   .  200
3   2   . .
3   3   1100 300
3   4   . .
3   5   . .
;
data want (drop=i);
  set have;
  by id;
  array vars(*) sales sales2;
  array hold(2) _temporary_;
  do i=1 to dim(vars);
    counter+1;
    if first.id then hold(i)=vars(i);
    else if not missing(vars(i)) then hold(i)=vars(i);
    else vars(i)=hold(i);
  end;
run;
proc sort data=want;
  by id descending month;
run;
data want (drop=i);
  set want;
  by id;
  array vars(*) sales sales2;
  array hold(2) _temporary_;
  do i=1 to dim(vars);
    counter+1;
    if first.id then hold(i)=vars(i);
    else if not missing(vars(i)) then hold(i)=vars(i);
    else vars(i)=hold(i);
  end;
run;

Art, CEO, AnalystFinder.com

 

Wafflecakes
Calcite | Level 5
What does the 'i=' and 'dim' represent, Art?
art297
Opal | Level 21

I have to answer them backwards. dim(vars), in this case, simply gets the number of variables in the array.

 

Then, with do=i to dim(vars), i is simply a counter. In this case it will be 1 in the first iteration, 2 in the next. Thus it is just saying :

do i = 1 to 2

 

Unfortunately, one can't use dim to create an array with new variables, thus I had to manually set the second array (hold) to 2, rather than being able to use * or dim(vars)

 

Art

 

Tom
Super User Tom
Super User

The easiest way to apply LOCF is to let the UPDATE statement do it for you.  The UPDATE statement was really designed to apply transactions to a master dataset, but its features work well for LOCF operations.  Basically if the variable is non-missing in the transaction then it replaces the current value of that variable. But missing values in the transaction are ignored.

 

To get it to work for LOCF application you need to add a couple of things. One you need a MASTER and TRANSACTIONS dataset. For this we will use an empty version of your current data as the MASTER and your current data as the transactions.  Also normally UPDATE wants to output one observation per key value.  But since it is just a DATA step we can add an explicit OUTPUT statement and so write each transaction out after it is applied.

 

data want ;
  update have(obs=0) have ;
  by id ;
  output;
run;

Another wrinkle is if there are variable that you do NOT want to treat as "transactions" then you could add another SET statement to read them back in and overwrite the results of apply the transactions.  So for example lets assume that your MONTH variable might have missing values and you do NOT want to apply the LOCF logic to MONTH.  Then your code might look like this.

data want ;
  update have(obs=0) have ;
  by id ;
  set have(keep=month);
  output;
run;

You could either use KEEP= to list the variable that you want not perform LOCF on or use DROP= to list the variables that you DO want to perform LOCF on. 

 

Now it also looks like you have added a requirement to carry the first non-missing value back to replace earlier missing values?  

If you just have one variable then that is an easy modification as you can just make a first_value dataset using a WHERE statement .

data first_value ;
  set have (keep=id sales);
  by id;
  where not missing(sales);
  if first.id;
run;
data want ;
  update first_value have ;
  by id ;
  output;
run;

But that gets hard to do if you have mulitple variables, say SALES and SALES2, that you want apply this to as the first non-missing values for the different variables might not occur on the same obseration.  But you can also use UPDATE to solve this.  Just sort the data in reverse order within the ID groups.  So for your data you could do this to find the first value for SALES and SALES2.

proc sort data=have out=reverse_have ;
  by id descending month ;
run;

data first_values ;
 update have(obs=0) reverse_have ;
 by id ;
run;

Now use that first_values dataset as the MASTER table in your UPDATE.

data want ;
  update first_values have ;
  by id ;
  output;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 3254 views
  • 1 like
  • 4 in conversation