I have this dataset with. Some IDs are missing values for window=0
data ds1;
input ID$ window m1 m2 m3;
datalines;
1 3 157 56 111
1 6 145 35 145
1 9 178 78 200
2 0 132 27 201
2 6 124 95 211
2 12 127 37 532
2 15 178 35 465
3 6 148 74 231
3 12 138 49 532
;
run;
The dataset that should be used to complement data when window=0 is missing is this:
data ds2;
input ID$ window m1 m2 m3;
datalines;
1 0 145 54 201
2 0 143 45 143
3 0 132 47 165
4 0 131 48 253
9 0 124 52 511
;
run;
We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?
ID | window | m1 | m2 | m3 |
1 | 0 | 145 | 54 | 201 |
1 | 3 | 157 | 56 | 111 |
1 | 6 | 145 | 35 | 145 |
1 | 9 | 178 | 78 | 200 |
2 | 0 | 132 | 27 | 201 |
2 | 6 | 124 | 95 | 211 |
2 | 12 | 127 | 37 | 532 |
2 | 15 | 178 | 35 | 465 |
3 | 0 | 132 | 47 | 165 |
3 | 6 | 148 | 74 | 231 |
3 | 12 | 138 | 49 | 532 |
4 | 0 | 131 | 48 | 253 |
9 | 0 | 124 | 52 | 511 |
Lazy way:
data full_data;
set ds1 ds2 indsname=src;
source=src;
run;
proc sort data=full_data;
by id window source;
run;
data want;
set full_data;
by id window source;
if first.window;
run;
@ANKH1 wrote:
I have this dataset with. Some IDs are missing values for window=0
data ds1; input ID$ window m1 m2 m3; datalines; 1 3 157 56 111 1 6 145 35 145 1 9 178 78 200 2 0 132 27 201 2 6 124 95 211 2 12 127 37 532 2 15 178 35 465 3 6 148 74 231 3 12 138 49 532 ; run;
The dataset that should be used to complement data when window=0 is missing is this:
data ds2; input ID$ window m1 m2 m3; datalines; 1 0 145 54 201 2 0 143 45 143 3 0 132 47 165 4 0 131 48 253 9 0 124 52 511 ; run;
We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?
ID window m1 m2 m3 1 0 145 54 201 1 3 157 56 111 1 6 145 35 145 1 9 178 78 200 2 0 132 27 201 2 6 124 95 211 2 12 127 37 532 2 15 178 35 465 3 0 132 47 165 3 6 148 74 231 3 12 138 49 532 4 0 131 48 253 9 0 124 52 511
Lazy way:
data full_data;
set ds1 ds2 indsname=src;
source=src;
run;
proc sort data=full_data;
by id window source;
run;
data want;
set full_data;
by id window source;
if first.window;
run;
@ANKH1 wrote:
I have this dataset with. Some IDs are missing values for window=0
data ds1; input ID$ window m1 m2 m3; datalines; 1 3 157 56 111 1 6 145 35 145 1 9 178 78 200 2 0 132 27 201 2 6 124 95 211 2 12 127 37 532 2 15 178 35 465 3 6 148 74 231 3 12 138 49 532 ; run;
The dataset that should be used to complement data when window=0 is missing is this:
data ds2; input ID$ window m1 m2 m3; datalines; 1 0 145 54 201 2 0 143 45 143 3 0 132 47 165 4 0 131 48 253 9 0 124 52 511 ; run;
We need to replace missing window=0 from ds1 with values from ds2. Note that data should be added only when window=0 is missing from ds1. See the desired output below for ID 2, it keeps the data from window=0 from ds1. How can we get the below dataset?
ID window m1 m2 m3 1 0 145 54 201 1 3 157 56 111 1 6 145 35 145 1 9 178 78 200 2 0 132 27 201 2 6 124 95 211 2 12 127 37 532 2 15 178 35 465 3 0 132 47 165 3 6 148 74 231 3 12 138 49 532 4 0 131 48 253 9 0 124 52 511
Thank you! The output is exactly what we need. I have a question, how does
if first.window;
accounts for the first window for each value, i.e., first value of window=0 and does not delete the window=6, 12, etc. rows? I would've thought that first.window, filters the very first row of the variable window. Thanks!
first.variable refers to a distinct value (making a group). In your data 0, 6, 12 all makes a distinct subgroups within your ID groups. So, the first.window will keep the first observations from each distinct subgroups. If your statement were first.ID then it would delete all other observations, except window=0 (where your window has this value).
Oh! Thank you so much for your explanation! I really appreciate it.
Doesn't look lazy to me. Looks like a lot of extra work.
Just MERGE the two datasets.
data want;
set ds1 ds2;
by id window;
if first.window;
run;
Just MERGE the two datasets. Make sure to list the dataset you want to WIN last in the MERGE statement.
data want;
merge ds2 ds1;
by id window;
run;
In your data only id='2' has any observations with window=0 in DS1.
So let's see what values SAS used for that observation:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.