Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Impute Missing Valu

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-31-2017 07:42 AM
(924 views)

My data is in this format:

```
data have;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 .
A A1 A12 15 16
A A2 A12 . 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 . .
B B2 B13 5 3
B B2 B13 . .
;
RUN;
```

And I want the result in this format:

```
data desired;
input GROUP1 GROUP2 GROUP3 VAR1 VAR2;
cards;
A A1 A11 10 12
A A1 A11 13 12
A A1 A12 15 16
A A2 A12 15 17
A A2 A13 12 12
A A2 A13 19 18
B B1 B11 15 21
B B1 B11 22 29
B B1 B12 31 40
B B2 B12 31 31
B B2 B13 5 3
B B2 B13 5 3
;
RUN;
```

I want to impute the missing values by replicating the previous value from the same above group. E.g. for group combinations (B, B2, B13), there's a missing value for variable "Value2". This value will be replicated with 20, which is the above value from the same group.

I tried to follow this option. But this does not have multiple value computation option.

```
data imputedData;
set mydata;
n=_n_;
if missing(Value1) then
do;
do until (not missing(value1));
n=n-1;
set mydata(keep=Value1) point=n; *second SET statement;
end;
end;
run;
```

But the problem is it can calculate only 1 variable at a time. And also it was taking a long time to compute. My dataset is around 400K number of rows.

If someone please guide me how to do it.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is a good approach. It fills in for all variables, and you don't even need to know the variable names.

Looking at your data on lines 3 and 4, it looks like you want to fill in values within GROUP1 regardless of whether GROUP2 or GROUP3 changes. So your BY statement might be shorter:

by group1;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Astounding: I didn't get your answer. My lowest level is Group3. So you are saying I should use by statement in my code? The code I used works fine for small dataset. But the dataset of my size it's takes over 3 hours. Also, it calculates value I variable at a time. So the entire process is extremely time consuming. Is there any other alternative?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Sorry, when I said "good approach" I was talking about KSharp's solution.

The question I was trying to address is this. When should the previous value be carried forward? When GROUP1 changes from "A" to "B", is it still OK to carry a value forward from an "A" observation to a "B" observation? (I suspect the answer is no.)

How about when GROUP1 remains "A", but GROUP2 changes from "A1" to "A2"? Is it permissible to carry forward a value from "A1" to "A2" observations? I suspect the answer is yes, because in your original example you do exactly that. (Refer to lines 3 and 4 of the data.)

So if my suspected answers are correct, and if your data is already sorted as indicated, the program becomes:

data ImputedData;

update mydata (obs=0) mydata;

by group1;

output;

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Astounding: Thanks for the clarification!

**Don't miss out on SAS Innovate - Register now for the FREE Livestream!**

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.