Desktop productivity for business analysts and programmers

Merge Statement resulting in extra variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 94
Accepted Solution

Merge Statement resulting in extra variables

Here is my code:

Data file_results;

Merge file_a file_b;

by variable_x;

run;

Issue:  There are 134 columns (variables) in file_a and file_b.  When I merge these two files together, the resultant dataset has 135 variables.  There are 2 varaibles added at the end....and neither are in file_a and file_b.

Any advice is appreciated!
Thanks,
Jenne


Accepted Solutions
Solution
‎01-17-2013 08:16 AM
PROC Star
Posts: 7,433

Re: Merge Statement resulting in extra variables

: Just remove The $ sign.  e.g.: managed_function_L4=put(a_managed_function_L4,5.);

View solution in original post


All Replies
PROC Star
Posts: 7,433

Re: Merge Statement resulting in extra variables

I think you will have to post a copy of your log in order for anyone to help.

Frequent Contributor
Posts: 94

Re: Merge Statement resulting in extra variables

Here is the log...thanks in advance for any help.

***************************************

1                                                          The SAS System                          21:55 Wednesday, January 16, 2013

1          ;*';*";*/;quit;run;

2          OPTIONS PAGENO=MIN;

3          %LET _CLIENTTASKLABEL='combined_gdw';

4          %LET _CLIENTPROJECTPATH='/ptr/ops/proj/jle/monthly_headcount_walk/monthly_headcount_walk.egp';

5          %LET _CLIENTPROJECTNAME='monthly_headcount_walk.egp';

6          %LET _SASPROGRAMFILE='/ptr/ops/proj/jle/monthly_headcount_walk/combined_gdw.sas';

7         

8          ODS _ALL_ CLOSE;

9          OPTIONS DEV=ACTIVEX;

NOTE: Procedures may not support all options or statements for all devices. For details, see the documentation for each procedure.

10         GOPTIONS XPIXELS=0 YPIXELS=0;

11         FILENAME EGSR TEMP;

12         ODS tagsets.sasreport12(ID=EGSR) FILE=EGSR STYLE=Analysis

12       ! STYLESHEET=(URL="file:///C:/Program%20Files/SAS/EnterpriseGuide/4.3/Styles/Analysis.css") NOGTITLE NOGFOOTNOTE

12       ! GPATH=&sasworklocation ENCODING=UTF8 options(rolap="on");

NOTE: Writing TAGSETS.SASREPORT12(EGSR) Body file: EGSR

13        

14         GOPTIONS ACCESSIBLE;

15         libname jen1 "/ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/current_month";

NOTE: Libref JEN1 was successfully assigned as follows:

      Engine:        V9

      Physical Name: /ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/current_month

16         libname jen2 "/ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/prior_month";

NOTE: Libref JEN2 was successfully assigned as follows:

      Engine:        V9

      Physical Name: /ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/prior_month

17        

18         /* CURRENT MONTH - Convert variables from numeric to character */

19        

20         data non_employee_headcount_a;

21              set jen1.non_employee_headcount_current1 (rename=(global_function=a_global_function));

22              global_function=put(a_global_function,$4.);

WARNING: Variable a_global_function has already been defined as numeric.

23              drop a_global_function;

24              run;

NOTE: There were 42844 observations read from the data set JEN1.NON_EMPLOYEE_HEADCOUNT_CURRENT1.

NOTE: The data set WORK.NON_EMPLOYEE_HEADCOUNT_A has 42844 observations and 134 variables.

NOTE: Compressing data set WORK.NON_EMPLOYEE_HEADCOUNT_A decreased size by 57.33 percent.

      Compressed is 762 pages; un-compressed would require 1786 pages.

NOTE: DATA statement used (Total process time):

      real time           4.86 seconds

      cpu time            4.25 seconds

     

25        

26         data jen1.non_employee_headcount_b;

27              set non_employee_headcount_a (rename=(managed_function_L4=a_managed_function_L4));

28              managed_function_L4=put(a_managed_function_L4,$5.);

WARNING: Variable a_managed_function_L4 has already been defined as numeric.

29              drop a_managed_function_L4;

30              run;

NOTE: There were 42844 observations read from the data set WORK.NON_EMPLOYEE_HEADCOUNT_A.

NOTE: The data set JEN1.NON_EMPLOYEE_HEADCOUNT_B has 42844 observations and 134 variables.

NOTE: Compressing data set JEN1.NON_EMPLOYEE_HEADCOUNT_B decreased size by 57.39 percent.

      Compressed is 761 pages; un-compressed would require 1786 pages.

NOTE: DATA statement used (Total process time):

2                                                          The SAS System                          21:55 Wednesday, January 16, 2013

      real time           3.83 seconds

      cpu time            3.51 seconds

     

31        

32         /* CURRENT MONTH - Convert variables from character to numeric */

33        

34         data jen1.direct_staff_headcount_b;

35              set jen1.direct_staff_headcount_current1 (rename=(business_contact_id=a_business_contact_id));

36              business_contact_id=input(a_business_contact_id,BEST12.);

37              drop a_business_contact_id;

38              run;

NOTE: There were 62934 observations read from the data set JEN1.DIRECT_STAFF_HEADCOUNT_CURRENT1.

NOTE: The data set JEN1.DIRECT_STAFF_HEADCOUNT_B has 62934 observations and 133 variables.

NOTE: Compressing data set JEN1.DIRECT_STAFF_HEADCOUNT_B decreased size by 54.69 percent.

      Compressed is 1097 pages; un-compressed would require 2421 pages.

NOTE: DATA statement used (Total process time):

      real time           5.92 seconds

      cpu time            5.59 seconds

     

39        

40        

41         /* CURRENT MONTH - Merge/Combine direct_staff_headcount and non_employee_headcount */

42        

43         Data jen1.combined_gdw_current_a;

44        

45         MERGE jen1.direct_staff_headcount_b

46         jen1.non_employee_headcount_b;

47         by GEID;

48        

49         run;

NOTE: There were 62934 observations read from the data set JEN1.DIRECT_STAFF_HEADCOUNT_B.

NOTE: There were 42844 observations read from the data set JEN1.NON_EMPLOYEE_HEADCOUNT_B.

NOTE: The data set JEN1.COMBINED_GDW_CURRENT_A has 105778 observations and 135 variables.

NOTE: Compressing data set JEN1.COMBINED_GDW_CURRENT_A decreased size by 55.35 percent.

      Compressed is 1817 pages; un-compressed would require 4069 pages.

NOTE: DATA statement used (Total process time):

      real time           8.95 seconds

      cpu time            8.76 seconds

     

50        

51         GOPTIONS NOACCESSIBLE;

52         %LET _CLIENTTASKLABEL=;

53         %LET _CLIENTPROJECTPATH=;

54         %LET _CLIENTPROJECTNAME=;

55         %LET _SASPROGRAMFILE=;

56        

57         ;*';*";*/;quit;run;

58         ODS _ALL_ CLOSE;

59        

60        

61         QUIT; RUN;

62        

                                                     

PROC Star
Posts: 7,433

Re: Merge Statement resulting in extra variables

: You used a merge, but none of the GEID values matched across the two files.  As such, the resulting file contained one record for each record that existed in the two files you tried to merge.

It would have been more appropriate to combine them with a set statement.

One of the extra fields had to come from WORK.NON_EMPLOYEE_HEADCOUNT_A, as it had 134 variables.  I would think that there is one other variable in JEN1.DIRECT_STAFF_HEADCOUNT_B that doesn't have a match in the other file, thus accounting for the 135th variable that was added to the file.

Also, in your two early datasteps, you use put statements to convert a numeric variable, but use an character format.  You will eliminate the warning if you change those to represent numeric formats (e.g., 3.).

Frequent Contributor
Posts: 94

Re: Merge Statement resulting in extra variables

I had use SET statement first and got the same result.  So I thought I used an incorrect stmt so I tried MERGE.  If I have to combine two files where the rows are unique, then is it best to use SET instead of MERGE?

RE:  "Also, in your two early datasteps, you use put statements to convert a numeric variable, but use an character format.  You will eliminate the warning if you change those to represent numeric formats (e.g., 3.)."   The original variable is NUMERIC and I want to convert it to CHARACTER, that is why I used a PUT stmt and character format.  Did I do this incorrectly?  What should I use if I want to convert the NUMERIC variable to CHARACTER?

Thanks for the help Arther!

Solution
‎01-17-2013 08:16 AM
PROC Star
Posts: 7,433

Re: Merge Statement resulting in extra variables

: Just remove The $ sign.  e.g.: managed_function_L4=put(a_managed_function_L4,5.);

Frequent Contributor
Posts: 94

Re: Merge Statement resulting in extra variables

Thank you for the help Arthur!

PROC Star
Posts: 7,433

Re: Merge Statement resulting in extra variables

: Also, you could use something like the following to find out which variables were only in one of the files:

proc sql;

  create table differences as

    select name, count(*) as count

      from dictionary.columns

        where libname="JEN" and

              (memname="DIRECT_STAFF_HEADCOUNT_B" or

               memname="NON_EMPLOYEE_HEADCOUNT_B")

          group by name

            having count(*) eq 1

  ;

quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 379 views
  • 0 likes
  • 2 in conversation