BookmarkSubscribeRSS Feed
karanfil
Fluorite | Level 6

Hello All,

I have two data sets as below. Data sets have different number of observations and one common variable "TICKER". What is the best SAS code to merge them. Thanks. 

DATA A                                                             

TICKERYEARPBSIZEP
ACGN20198
ACRS20198
ADAP201910

 

DATA B

TICKERYEARDBSIZE
ACGN20207
ACGN20217
ACMR20215
ACRS20209
ADAP20209
6 REPLIES 6
karanfil
Fluorite | Level 6

 

Thanks for the response. I would like to have the following data sets after merging. And run a paired T-test. (like between BSIZEP and DBSIZE) Data A has a YEARP observation during 2018-2019, and data B has a Year 2020-2021 observation.  I really appreciate the help.

 

 

DATA A AND DATA B MERGED

   
TICKERYEARPBSIZEPTICKERYEAR DBSIZE
ACGN20198ACGN20207
...ACGN20217
...ACMR20215
ACRS20198ACRS20209
ADAP201810ADAP20209

 

Patrick
Opal | Level 21

Not really sure if that's the ideal data structure for your t-test but below what you've been asking for.

Please note that I've changed the year in the last row for data A so it matches what you show us as desired result.

data A;
    infile datalines dsd truncover dlm=' ';
    length TICKER $5;
    input TICKER $ YEARP BSIZEP;
datalines;
ACGN 2019 8
ACRS 2019 8
ADAP 2018 10
;
run;

proc sort data=a;
  by ticker yearp;
run;

data B;
    infile datalines dsd truncover dlm=' ';
    length TICKER $5;
    input TICKER $ YEAR DBSIZE;
datalines;
ACGN 2020 7
ACGN 2021 7
ACMR 2021 5
ACRS 2020 9
ADAP 2020 9
;
run;

proc sort data=b;
  by ticker year;
run;

data want;
  merge a b;
  by ticker;
  output;
  call missing(of _all_);
run;

proc print data=want;
run;

Patrick_0-1767055211203.png

 

 

karanfil
Fluorite | Level 6

Thanks, Patrick. It is helpful.  Do you know what the best mean difference test is? DATA A has fewer observations than DATA B.  DATA B has companies that DATA A has and also companies that DATA A does not have. Any thought?

 

Thanks again

Tom
Super User Tom
Super User

@karanfil wrote:

 

Thanks for the response. I would like to have the following data sets after merging. And run a paired T-test. (like between BSIZEP and DBSIZE) Data A has a YEARP observation during 2018-2019, and data B has a Year 2020-2021 observation.  I really appreciate the help.

 

 

DATA A AND DATA B MERGED

     
TICKER YEARP BSIZEP TICKER YEAR  DBSIZE
ACGN 2019 8 ACGN 2020 7
. . . ACGN 2021 7
. . . ACMR 2021 5
ACRS 2019 8 ACRS 2020 9
ADAP 2018 10 ADAP 2020 9

 


That result does not make much sense. 

How can you have two variables with the same name?

Why did you eliminate the value of BSIZEP from one of the observations?

 

Did you mean you wanted this:

Screenshot 2025-12-29 at 8.01.09 PM.png

If so then just just merge by TICKER.

data want;
  merge a b ;
  by ticker;
run;

Note that both datasets will need to be sorted by TICKER before you can use the MERGE statement.

 

You can use PROC TTEST to run a paired t-test.

proc ttest data=want;
   paired BSIZEP*DBSIZE;
run;

But note that will use only four of the observations since one of them does not have the BSIZEP value.  And t-test is is not the right statistic for that dataset since the same TICKER value can have multiple observations. And those observations all have the same value of BSIZEP.

karanfil
Fluorite | Level 6
Thanks Tom for help.

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 158 views
  • 0 likes
  • 4 in conversation