BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rick23
Calcite | Level 5

Hello,

 

I'm having a problem with joining multiple tables to 'fill-in' missing cells. I will provide a rough idea of my procedure below:

 

Table A has 3 variables:

unique_no         date          region

A                   20100126      North

B                   20110718      South

C                   20120819      East

D                   20140211       West

E                   20130919      North

F                   20121109       East

....and so on.

 

I have tables for each region --

 

Table 'West' has 5 variables:

date              region             walk_speed              run_speed             sleep

20140211      West                 12                             11                            6

20131211      West                 15                             19                            7

20100917      West                 9                               26                            8

20110325      West                 10                             14                            6

 

Tables 'East', 'North', 'South' look exactly the same (in terms of variable names) as Table 'West'.

 

My final table should look like below, after joining all the tables (Table A + addition of the three run/walk/sleep variables from Table 'West', 'North', 'East', 'South') :

unique_no         date          region            walk_speed              run_speed             sleep

A                   20100126      North                     12                            20                         6

B                   20110718      South                     13                           15                          6

C                   20120819      East                       8                             16                          8

D                   20140211       West                      9                            17                          10

E                   20130919      North                      7                             26                          9

F                   20121109       East                      10                            35                          5

....and so on

 

 

Here's my attempt at creating this final table.

Step 1) proc sql;

            create table t1 as

            select a.*, b.*

            from Table_A as a

            Table_North as b

            on (a.date=b.date) and (a.region=b.region);

            quit;

 

This seemed to have correctly merged Table A and Table North (cells that are regions other than North have "." in them).

 

Step 2) proc sql;

            create table t2 as

            select a.*, b.*

            from t1 as a

            Table_West as b

            on (a.date=b.date) and (a.region=b.region);

            quit;

 

This second step had no effect; t2 looks exactly the same as t1.... so this is where I stopped.

 

**Please ignore my numerical data values as they are just random.

 

Thank you very much in advance for any help!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

The "in=" dataset option creates a temporary boolean variable (will not be included in the output) that is set to 1 when a record from this dataset is present.

Since we merge by region and date, duplicate dates are no problem, as long as no duplicate combinations region-date are present.

View solution in original post

5 REPLIES 5
Reeza
Super User

Your only join condition is region and date? Are those unique in your dataset, I wouldn't assume they are by default and that's going to cause issues...

Rick23
Calcite | Level 5

I'm basically trying to add the three variables (walk_speed, run_speed, sleep) onto Table A.

 

Table A has multiple combinations of different dates and regions (N=10,000 rows). 

 

I would appreciate any alternative procedure that you have in mind.

Kurt_Bremser
Super User

Combine the 4 regional tables into one, and keep the table name:

data regions;
set
  east
  west
  north
  south
;
run;

Then sort and merge:

proc sort data=table_a;
by date region;
run;

proc sort data=regions;
by date region;
run;

data want;
merge
  table_a (in=a)
  regions
;
by date region;
if a;
run;

 

Rick23
Calcite | Level 5

Hi Kurt,

Thanks so much for your help! However, I want to make sure 2 things...

Table A has random dates ranging from 1/1/2009 to 12/31/2015 (not every date is represented, i.e. July 22, 2013 might not be in there). Thus, for my region tables, I set them up so that each region table has data for EACH date ranging from 1/1/2009 to 12/31/2015 (only the dates that "match" with Table A will be filled in the columns). Will your method still work (merging the 4 region tables into 1, we would have duplicate dates (i.e. 1/1/2009 for west, north, south, east)) ?

Also, what is meant by "in=a" and "if a" ? I haven't used the "in=" function before so please clarify for me.

Thanks again!

Kurt_Bremser
Super User

The "in=" dataset option creates a temporary boolean variable (will not be included in the output) that is set to 1 when a record from this dataset is present.

Since we merge by region and date, duplicate dates are no problem, as long as no duplicate combinations region-date are present.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1593 views
  • 1 like
  • 3 in conversation