DATA Step, Macro, Functions and more

How to complete missing values with a specific condition

Accepted Solution Solved
Reply
Contributor
Posts: 58
Accepted Solution

How to complete missing values with a specific condition

Hi, I need to complete missing values from a column with the previous value if there is any and by group. I'll try to explain it:

Obs. Price Name

1     3.5     A

2     3.6     A

3     .        A

4     3.2     A

5     .         A

6     .         A

7     3.4     A

8     .          B

9     3.6     B

10    .        B

11    3.2     B

12   .         B

13   .         B

After the process, the table should be:

Obs. Price Name

1     3.5     A

2     3.6     A

3     3.6     A

4     3.2     A

5     3.2     A

6     3.2     A

7     3.4     A

8     .         B

9     3.6     B

10   3.6      B

11    3.2     B

12   3.2     B

13    3.2     B

Obs. N° 8 must be still missing because for its group B, there is no previous value to copy.

This is my code:

DATA PRICES;

SET PRICESMISSING;

BY NAME;

DO;

IF PRICE <> . THEN A=PRICE;

IF MISSING(PRICE) THEN PRICE=LAG(A);

END;

RUN;

My idea is store a aux variable to save no missing values from price and when the process find a missing value use the lag value from the aux variable but when I run it, I got the same column without any replace.

Could you help me? Any kind of suggestion is welcome! Thank you!


Accepted Solutions
Solution
‎10-04-2013 12:14 PM
Super Contributor
Posts: 1,636

Re: How to complete missing values with a specific condition

data have;

input obs price name $;

cards;

1     3.5     A

2     3.6     A

3     .        A

4     3.2     A

5     .         A

6     .         A

7     3.4     A

8     .          B

9     3.6     B

10    .        B

11    3.2     B

12   .         B

13   .         B

;

data want(rename=(new=price));

retain new;

set have;

by name notsorted;

new=ifn((first.name=0 and price ne .),price,new);

if first.name then new=price;

drop price obs;

proc print;run;

View solution in original post


All Replies
SAS Super FREQ
Posts: 8,864

Re: How to complete missing values with a specific condition


Hi, the use of LAG should be "unconditional" (in other words, using LAG inside an IF is not recommended, per this Tech Support note 24665 - Use the LAG function to conditionally carry information down a data set ) So I would recommend that you look at the note and use the technique in the program (the technique that creates variable C) ...you are using a technique such as the one that creates variable A (the WRONG result) in the example program.

cynthia

Respected Advisor
Posts: 3,799

Re: How to complete missing values with a specific condition

Posted in reply to Cynthia_sas

LAG is not the proper tool for step interpolation or as it is often commonly referred to as LOCF.

data price;
   input price name $ @@;
   cards;
3.5 A 3.6 A  .  A 3.2 A  .  A  .  A 3.4 A
.  B 3.6 B  .  B 3.2 B .   B .   B
;;;;
  run;
data price;
   update price(obs=0 keep=name) price;
   by name;
   output;
  
run;
proc print;
  
run;
Solution
‎10-04-2013 12:14 PM
Super Contributor
Posts: 1,636

Re: How to complete missing values with a specific condition

data have;

input obs price name $;

cards;

1     3.5     A

2     3.6     A

3     .        A

4     3.2     A

5     .         A

6     .         A

7     3.4     A

8     .          B

9     3.6     B

10    .        B

11    3.2     B

12   .         B

13   .         B

;

data want(rename=(new=price));

retain new;

set have;

by name notsorted;

new=ifn((first.name=0 and price ne .),price,new);

if first.name then new=price;

drop price obs;

proc print;run;

Contributor
Posts: 58

Re: How to complete missing values with a specific condition

Thank you very much!! I almost cry when it worked!

Super Contributor
Posts: 339

Re: How to complete missing values with a specific condition

data want(rename=(new=price));

retain new;

set have;

by name notsorted;

if first.name then new=price;

else new=coalesce(price, new);

drop price;

run;

It's really the same approach as Linlin but I dislike multiple ifs with similar conditionnal arrangements so I hid "and price ne ." into the coalesce function. Slight improvement to code readability.

As Cynthia pointed out, LAG is a pile or queue and not an actual previous row addressing so conditioning on it is nearly guaranteed to fail your desired logic. You could use an array _temporary_ to store previous over data step iterations to achieve a logic similar to the one you've coded in your original post.

Vince

Edit @Data _Null_ that's neat. I didn't know of an update statement for data step.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 267 views
  • 6 likes
  • 5 in conversation