Solved: Replace missing values in one dataset with values from another dataset

ANKH1 · Posted 11-27-2023 02:52 PM

I have this dataset with. Some IDs are missing values for window=0

data ds1;
input ID$ window m1 m2 m3;
datalines;
1	3	157	56	111
1	6	145	35	145
1	9	178	78	200
2	0	132	27	201
2	6	124	95	211
2	12	127	37	532
2	15	178	35	465
3	6	148	74	231
3	12	138	49	532
;
run;

The dataset that should be used to complement data when window=0 is missing is this:

data ds2;
input ID$ window m1 m2 m3;
datalines;
1	0	145	54	201
2	0	143	45	143
3	0	132	47	165
4	0	131	48	253
9	0	124	52	511
;
run;

We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?

ID	window	m1	m2	m3
1	0	145	54	201
1	3	157	56	111
1	6	145	35	145
1	9	178	78	200
2	0	132	27	201
2	6	124	95	211
2	12	127	37	532
2	15	178	35	465
3	0	132	47	165
3	6	148	74	231
3	12	138	49	532
4	0	131	48	253
9	0	124	52	511

Reeza · Posted 11-27-2023 03:27 PM

Lazy way:

data full_data;
set ds1 ds2 indsname=src;
source=src;
run;

proc sort data=full_data;
by id window source;
run;

data want;
set full_data;
by id window source;
if first.window;
run;

Stack the data sets identifying the source datasets
Sort by ID, window and source datasets
Keep ones where in DS1 first

@ANKH1 wrote:

I have this dataset with. Some IDs are missing values for window=0
data ds1;
input ID$ window m1 m2 m3;
datalines;
1	3	157	56	111
1	6	145	35	145
1	9	178	78	200
2	0	132	27	201
2	6	124	95	211
2	12	127	37	532
2	15	178	35	465
3	6	148	74	231
3	12	138	49	532
;
run;
The dataset that should be used to complement data when window=0 is missing is this:
data ds2;
input ID$ window m1 m2 m3;
datalines;
1	0	145	54	201
2	0	143	45	143
3	0	132	47	165
4	0	131	48	253
9	0	124	52	511
;
run;
We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?

ID window m1 m2 m3

1 0 145 54 201

1 3 157 56 111

1 6 145 35 145

1 9 178 78 200

2 0 132 27 201

2 6 124 95 211

2 12 127 37 532

2 15 178 35 465

3 0 132 47 165

3 6 148 74 231

3 12 138 49 532

4 0 131 48 253

9 0 124 52 511

View solution in original post

Reeza · Posted 11-27-2023 03:27 PM

Lazy way:

data full_data;
set ds1 ds2 indsname=src;
source=src;
run;

proc sort data=full_data;
by id window source;
run;

data want;
set full_data;
by id window source;
if first.window;
run;

Stack the data sets identifying the source datasets
Sort by ID, window and source datasets
Keep ones where in DS1 first

@ANKH1 wrote:

I have this dataset with. Some IDs are missing values for window=0
data ds1;
input ID$ window m1 m2 m3;
datalines;
1	3	157	56	111
1	6	145	35	145
1	9	178	78	200
2	0	132	27	201
2	6	124	95	211
2	12	127	37	532
2	15	178	35	465
3	6	148	74	231
3	12	138	49	532
;
run;
The dataset that should be used to complement data when window=0 is missing is this:
data ds2;
input ID$ window m1 m2 m3;
datalines;
1	0	145	54	201
2	0	143	45	143
3	0	132	47	165
4	0	131	48	253
9	0	124	52	511
;
run;
We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?

ID window m1 m2 m3

1 0 145 54 201

1 3 157 56 111

1 6 145 35 145

1 9 178 78 200

2 0 132 27 201

2 6 124 95 211

2 12 127 37 532

2 15 178 35 465

3 0 132 47 165

3 6 148 74 231

3 12 138 49 532

4 0 131 48 253

9 0 124 52 511

ANKH1 · Posted 11-27-2023 05:44 PM

Thank you! The output is exactly what we need. I have a question, how does

if first.window;

accounts for the first window for each value, i.e., first value of window=0 and does not delete the window=6, 12, etc. rows? I would've thought that first.window, filters the very first row of the variable window. Thanks!

A_Kh · Posted 11-27-2023 06:04 PM

first.variable refers to a distinct value (making a group). In your data 0, 6, 12 all makes a distinct subgroups within your ID groups. So, the first.window will keep the first observations from each distinct subgroups. If your statement were first.ID then it would delete all other observations, except window=0 (where your window has this value).

ANKH1 · Posted 11-27-2023 07:29 PM

Oh! Thank you so much for your explanation! I really appreciate it.

Tom · Posted 11-27-2023 06:15 PM

Doesn't look lazy to me. Looks like a lot of extra work.

Just MERGE the two datasets.

Kurt_Bremser · Posted 11-27-2023 03:32 PM

data want;
set ds1 ds2;
by id window;
if first.window;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ANKH1 · Posted 11-27-2023 05:45 PM

Thank you!

Tom · Posted 11-27-2023 06:13 PM

Just MERGE the two datasets. Make sure to list the dataset you want to WIN last in the MERGE statement.

data want;
  merge ds2 ds1;
  by id window;
run;

In your data only id='2' has any observations with window=0 in DS1.

So let's see what values SAS used for that observation:

Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

Re: Replace missing values in one dataset with values from another dataset

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away