hi,
I need help while merging multiple datasets. I have 6 datasets as following:
| Dataset a | ||||||||
| id | age | in_2012 | ||||||
| 1 | 30 | Y | ||||||
| Dataset b | ||||||||
| id | age | in_2013 | ||||||
| Dataset c | ||||||||
| id | age | in_2014 | ||||||
| 1 | 32 | Y | ||||||
| Dataset d | ||||||||
| id | age | in_2015 | ||||||
| 1 | 33 | Y | ||||||
| Dataset e | ||||||||
| id | age | in_2016 | ||||||
| 1 | 34 | Y | ||||||
| Dataset f | ||||||||
| id | age | in_2017 | ||||||
| 1 | 35 | Y | ||||||
| Dataset g | ||||||||
| id | age | in_2018 | ||||||
| 1 | 36 | Y | ||||||
| Final output | ||||||||
| id | age | in_2012 | in_2013 | in_2014 | in_2015 | in_2016 | in_2017 | in_2018 |
| 1 | 30 | Y | ||||||
| 1 | 32 | Y | ||||||
| 1 | 33 | Y | ||||||
| 1 | 34 | Y | ||||||
| 1 | 35 | Y | ||||||
| 1 | 36 | Y |
Subject Id 1 has record present from 2012 to 2018 hence we have variable in_2012... to in_2018.
When i am merging the above 6 datasets by id and age , i am getting the above output.
Can anyone pl guide me how can i get the output with the variable in_2012 to in_2018 has "Y" in all the rows(if records are available in all datasets) and not in selective rows as presented above.
This is the ouput which i want:
| id | age | in_2012 | in_2013 | in_2014 | in_2015 | in_2016 | in_2017 | in_2018 |
| 1 | 30 | Y | Y | Y | Y | Y | Y | |
| 1 | 32 | Y | Y | Y | Y | Y | Y | |
| 1 | 33 | Y | Y | Y | Y | Y | Y | |
| 1 | 34 | Y | Y | Y | Y | Y | Y | |
| 1 | 35 | Y | Y | Y | Y | Y | Y | |
| 1 | 36 | Y | Y | Y | Y | Y | Y |
Thanks,
Jeetender
data A;
input ID AGE IN_2012 $1.;
datalines;
1 30 Y
;
run;
data B;
input ID AGE IN_2013 $1.;
datalines;
;
run;
data C;
input ID AGE IN_2014 $1.;
datalines;
1 32 Y
;
run;
data D;
input ID AGE IN_2015 $1.;
datalines;
1 33 Y
;
run;
data E;
input ID AGE IN_2016 $1.;
datalines;
1 34 Y
;
run;
data F;
input ID AGE IN_2017 $1.;
datalines;
1 35 Y
;
run;
data G;
input ID AGE IN_2018 $1.;
datalines;
1 36 Y
;
run;
data A_TO_G;
set A B C D E F G;
keep ID AGE;
run;
proc sort data=A_TO_G nodupkey;
by ID AGE;
run;
proc sort data=A;
by ID AGE;
run;
proc sort data=B;
by ID AGE;
run;
proc sort data=C;
by ID AGE;
run;
proc sort data=D;
by ID AGE;
run;
proc sort data=E;
by ID AGE;
run;
proc sort data=F;
by ID AGE;
run;
proc sort data=G;
by ID AGE;
run;
data ALL_TABLE;
merge A_TO_G(in=a)
A(drop=age)
B(drop=age)
C(drop=age)
D(drop=age)
E(drop=age)
F(drop=age)
G(drop=age);
by ID;
if a;
run;
data A;
input ID AGE IN_2012 $1.;
datalines;
1 30 Y
;
run;
data B;
input ID AGE IN_2013 $1.;
datalines;
;
run;
data C;
input ID AGE IN_2014 $1.;
datalines;
1 32 Y
;
run;
data D;
input ID AGE IN_2015 $1.;
datalines;
1 33 Y
;
run;
data E;
input ID AGE IN_2016 $1.;
datalines;
1 34 Y
;
run;
data F;
input ID AGE IN_2017 $1.;
datalines;
1 35 Y
;
run;
data G;
input ID AGE IN_2018 $1.;
datalines;
1 36 Y
;
run;
data A_TO_G;
set A B C D E F G;
keep ID AGE;
run;
proc sort data=A_TO_G nodupkey;
by ID AGE;
run;
proc sort data=A;
by ID AGE;
run;
proc sort data=B;
by ID AGE;
run;
proc sort data=C;
by ID AGE;
run;
proc sort data=D;
by ID AGE;
run;
proc sort data=E;
by ID AGE;
run;
proc sort data=F;
by ID AGE;
run;
proc sort data=G;
by ID AGE;
run;
data ALL_TABLE;
merge A_TO_G(in=a)
A(drop=age)
B(drop=age)
C(drop=age)
D(drop=age)
E(drop=age)
F(drop=age)
G(drop=age);
by ID;
if a;
run;
I'm sort of interested in how you are going to use that output file.
The structure and variable names makes me think that sometime in 2019 or 2020 you will be adding more columns. Which means any code processing this data set will need to account for the added column a may become cumbersome.
This type of structure to a data set is pretty common to someone attempting to replicate a process that had been done in a spreadsheet and frequently is suboptimal when using other software.
Assuming your data sets are all sorted:
data combined;
set a b c d e f g;
by id;
run;
data years;
update combined (obs=0 drop=age) combined (drop=age);
by id;
run;
data want;
merge combined (keep=id age) years;
by id;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.