data matching!

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

data matching!

I want to merge two files with the same IDs as the following. We have the same age categories for each group. Could you show me how to deal with this matter? Thank you so much!

C06001PRPLACE OF BIRTH BY AGE IN PUERTO RICO
C06001PRUniverse:  Total population in Puerto Rico
C06001PRTotal:
C06001PR  Under 5 years
C06001PR  5 to 17 years
C06001PR  18 to 24 years
C06001PR  25 to 44 years
C06001PR  45 to 54 years
C06001PR  55 to 64 years
C06001PR  65 to 74 years
C06001PR  75 to 84 years
C06001PR  85 years and over
C06001PR Born in Puerto Rico:
C06001PR  Under 5 years
C06001PR  5 to 17 years
C06001PR  18 to 24 years
C06001PR  25 to 44 years
C06001PR  45 to 54 years
C06001PR  55 to 64 years
C06001PR  65 to 74 years
C06001PR  75 to 84 years
C06001PR  85 years and over
C06001PR Born in the United States:
C06001PR  Under 5 years
C06001PR  5 to 17 years
C06001PR  18 to 24 years
C06001PR  25 to 44 years
C06001PR  45 to 54 years
C06001PR  55 to 64 years
C06001PR  65 to 74 years
C06001PR  75 to 84 years
C06001PR  85 years and over
C06001PR Native; born elsewhere:
C06001PR  Under 5 years
C06001PR  5 to 17 years
C06001PR  18 to 24 years
C06001PR  25 to 44 years
C06001PR  45 to 54 years
C06001PR  55 to 64 years
C06001PR  65 to 74 years
C06001PR  75 to 84 years
C06001PR  85 years and over
C06001PR Foreign born:
C06001PR  Under 5 years
C06001PR  5 to 17 years
C06001PR  18 to 24 years
C06001PR  25 to 44 years
C06001PR  45 to 54 years
C06001PR  55 to 64 years
C06001PR  65 to 74 years
C06001PR  75 to 84 years
C06001PR  85 years and over

Accepted Solutions
Solution
‎06-04-2013 12:10 PM
PROC Star
Posts: 1,167

Re: data matching!

I am familiar with this result format, although I don't like it. Here's some advice:

I assume that these two columns are followed by some numbers of interest. The first challenge is to create a SAS dataset that contains your row labels in one variable, and your numbers of interest in other variables.

The next issue is that your row labels aren't unique, so you can't use any matching techniques. I recommend you just match row for row, checking that your row labels are the same. You should be fine then.

One question; is the table identifier C06001PR for both files? If so, the above will work. If not, you'll need to do a little more processing.

Here's some code to match two sas datasets that have been loaded. table1id is the id row for table 1, and table2id is the id for table 2. table1stat1, table1stat2, table2stat1 ... are the statistics of interest. Make sure you check the log afterwards for error messages.

data combined;
set table1;
set table2;

if table1id ~= table2id then
  error "keys don't match";
run;

  Tom

View solution in original post


All Replies
Occasional Contributor
Posts: 16

Re: data matching!

Hi, how are the files related? Are the files of the same subjects with repeated measures or different subjects? If different subjects, I would conduct a proc append since the variable names are the same.

Contributor
Posts: 51

Re: data matching!

Hi CaraJ,

Two files are identical for the first two columns as I showed in the previous thread. Thank you so much!

Super Contributor
Posts: 644

Re: data matching!

If I understand your problem the two files contain the same two columns shown and you want to combine them by matching the rows.

The problem is that the first column seems to be redundant (perhaps values change deeper in the file, so maybe not); and the second column has duplcated age categories which however are to be understood to correspond to different populations.

However, provided both files contain the same number of records and they are in the right order you can simply do a datastep merge without a BY statement:

Data want ;

     merge A B ;

Run ;

Now if any of the columns in B (other than the first 2) have identical names to columns in A then you will need to have a rename option

Data want ;

     merge A

           B  (rename = (col3 = col3B col4 = col4B))

          ;

Run ;

Frankly, I would instead do this merge in an Excel copy and paste because it looks like that's where the data originated.

Richard

Occasional Contributor
Posts: 10

Re: data matching!

please can any body tell me how to post a question in disscussion

Solution
‎06-04-2013 12:10 PM
PROC Star
Posts: 1,167

Re: data matching!

I am familiar with this result format, although I don't like it. Here's some advice:

I assume that these two columns are followed by some numbers of interest. The first challenge is to create a SAS dataset that contains your row labels in one variable, and your numbers of interest in other variables.

The next issue is that your row labels aren't unique, so you can't use any matching techniques. I recommend you just match row for row, checking that your row labels are the same. You should be fine then.

One question; is the table identifier C06001PR for both files? If so, the above will work. If not, you'll need to do a little more processing.

Here's some code to match two sas datasets that have been loaded. table1id is the id row for table 1, and table2id is the id for table 2. table1stat1, table1stat2, table2stat1 ... are the statistics of interest. Make sure you check the log afterwards for error messages.

data combined;
set table1;
set table2;

if table1id ~= table2id then
  error "keys don't match";
run;

  Tom

Contributor
Posts: 51

Re: data matching!

Hi CaraJ and  RichardinOz,

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 279 views
  • 3 likes
  • 5 in conversation