Hello All,
I have two data sets as below. Data sets have different number of observations and one common variable "TICKER". What is the best SAS code to merge them. Thanks.
DATA A
| TICKER | YEARP | BSIZEP |
| ACGN | 2019 | 8 |
| ACRS | 2019 | 8 |
| ADAP | 2019 | 10 |
DATA B
| TICKER | YEAR | DBSIZE |
| ACGN | 2020 | 7 |
| ACGN | 2021 | 7 |
| ACMR | 2021 | 5 |
| ACRS | 2020 | 9 |
| ADAP | 2020 | 9 |
Please do also show the intended result of the merge.
Thanks for the response. I would like to have the following data sets after merging. And run a paired T-test. (like between BSIZEP and DBSIZE) Data A has a YEARP observation during 2018-2019, and data B has a Year 2020-2021 observation. I really appreciate the help.
DATA A AND DATA B MERGED | |||||
| TICKER | YEARP | BSIZEP | TICKER | YEAR | DBSIZE |
| ACGN | 2019 | 8 | ACGN | 2020 | 7 |
| . | . | . | ACGN | 2021 | 7 |
| . | . | . | ACMR | 2021 | 5 |
| ACRS | 2019 | 8 | ACRS | 2020 | 9 |
| ADAP | 2018 | 10 | ADAP | 2020 | 9 |
Not really sure if that's the ideal data structure for your t-test but below what you've been asking for.
Please note that I've changed the year in the last row for data A so it matches what you show us as desired result.
data A;
infile datalines dsd truncover dlm=' ';
length TICKER $5;
input TICKER $ YEARP BSIZEP;
datalines;
ACGN 2019 8
ACRS 2019 8
ADAP 2018 10
;
run;
proc sort data=a;
by ticker yearp;
run;
data B;
infile datalines dsd truncover dlm=' ';
length TICKER $5;
input TICKER $ YEAR DBSIZE;
datalines;
ACGN 2020 7
ACGN 2021 7
ACMR 2021 5
ACRS 2020 9
ADAP 2020 9
;
run;
proc sort data=b;
by ticker year;
run;
data want;
merge a b;
by ticker;
output;
call missing(of _all_);
run;
proc print data=want;
run;
Thanks, Patrick. It is helpful. Do you know what the best mean difference test is? DATA A has fewer observations than DATA B. DATA B has companies that DATA A has and also companies that DATA A does not have. Any thought?
Thanks again
@karanfil wrote:
Thanks for the response. I would like to have the following data sets after merging. And run a paired T-test. (like between BSIZEP and DBSIZE) Data A has a YEARP observation during 2018-2019, and data B has a Year 2020-2021 observation. I really appreciate the help.
DATA A AND DATA B MERGED
TICKER YEARP BSIZEP TICKER YEAR DBSIZE ACGN 2019 8 ACGN 2020 7 . . . ACGN 2021 7 . . . ACMR 2021 5 ACRS 2019 8 ACRS 2020 9 ADAP 2018 10 ADAP 2020 9
That result does not make much sense.
How can you have two variables with the same name?
Why did you eliminate the value of BSIZEP from one of the observations?
Did you mean you wanted this:
If so then just just merge by TICKER.
data want;
merge a b ;
by ticker;
run;
Note that both datasets will need to be sorted by TICKER before you can use the MERGE statement.
You can use PROC TTEST to run a paired t-test.
proc ttest data=want;
paired BSIZEP*DBSIZE;
run;
But note that will use only four of the observations since one of them does not have the BSIZEP value. And t-test is is not the right statistic for that dataset since the same TICKER value can have multiple observations. And those observations all have the same value of BSIZEP.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.