Here is my code:
Data file_results;
Merge file_a file_b;
by variable_x;
run;
Issue: There are 134 columns (variables) in file_a and file_b. When I merge these two files together, the resultant dataset has 135 variables. There are 2 varaibles added at the end....and neither are in file_a and file_b.
Any advice is appreciated!
Thanks,
Jenne
I think you will have to post a copy of your log in order for anyone to help.
Here is the log...thanks in advance for any help.
***************************************
1 The SAS System 21:55 Wednesday, January 16, 2013
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='combined_gdw';
4 %LET _CLIENTPROJECTPATH='/ptr/ops/proj/jle/monthly_headcount_walk/monthly_headcount_walk.egp';
5 %LET _CLIENTPROJECTNAME='monthly_headcount_walk.egp';
6 %LET _SASPROGRAMFILE='/ptr/ops/proj/jle/monthly_headcount_walk/combined_gdw.sas';
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
NOTE: Procedures may not support all options or statements for all devices. For details, see the documentation for each procedure.
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport12(ID=EGSR) FILE=EGSR STYLE=Analysis
12 ! STYLESHEET=(URL="file:///C:/Program%20Files/SAS/EnterpriseGuide/4.3/Styles/Analysis.css") NOGTITLE NOGFOOTNOTE
12 ! GPATH=&sasworklocation ENCODING=UTF8 options(rolap="on");
NOTE: Writing TAGSETS.SASREPORT12(EGSR) Body file: EGSR
13
14 GOPTIONS ACCESSIBLE;
15 libname jen1 "/ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/current_month";
NOTE: Libref JEN1 was successfully assigned as follows:
Engine: V9
Physical Name: /ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/current_month
16 libname jen2 "/ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/prior_month";
NOTE: Libref JEN2 was successfully assigned as follows:
Engine: V9
Physical Name: /ptr/ops/proj/jle/monthly_headcount_walk/gdw_monthend_files/prior_month
17
18 /* CURRENT MONTH - Convert variables from numeric to character */
19
20 data non_employee_headcount_a;
21 set jen1.non_employee_headcount_current1 (rename=(global_function=a_global_function));
22 global_function=put(a_global_function,$4.);
WARNING: Variable a_global_function has already been defined as numeric.
23 drop a_global_function;
24 run;
NOTE: There were 42844 observations read from the data set JEN1.NON_EMPLOYEE_HEADCOUNT_CURRENT1.
NOTE: The data set WORK.NON_EMPLOYEE_HEADCOUNT_A has 42844 observations and 134 variables.
NOTE: Compressing data set WORK.NON_EMPLOYEE_HEADCOUNT_A decreased size by 57.33 percent.
Compressed is 762 pages; un-compressed would require 1786 pages.
NOTE: DATA statement used (Total process time):
real time 4.86 seconds
cpu time 4.25 seconds
25
26 data jen1.non_employee_headcount_b;
27 set non_employee_headcount_a (rename=(managed_function_L4=a_managed_function_L4));
28 managed_function_L4=put(a_managed_function_L4,$5.);
WARNING: Variable a_managed_function_L4 has already been defined as numeric.
29 drop a_managed_function_L4;
30 run;
NOTE: There were 42844 observations read from the data set WORK.NON_EMPLOYEE_HEADCOUNT_A.
NOTE: The data set JEN1.NON_EMPLOYEE_HEADCOUNT_B has 42844 observations and 134 variables.
NOTE: Compressing data set JEN1.NON_EMPLOYEE_HEADCOUNT_B decreased size by 57.39 percent.
Compressed is 761 pages; un-compressed would require 1786 pages.
NOTE: DATA statement used (Total process time):
2 The SAS System 21:55 Wednesday, January 16, 2013
real time 3.83 seconds
cpu time 3.51 seconds
31
32 /* CURRENT MONTH - Convert variables from character to numeric */
33
34 data jen1.direct_staff_headcount_b;
35 set jen1.direct_staff_headcount_current1 (rename=(business_contact_id=a_business_contact_id));
36 business_contact_id=input(a_business_contact_id,BEST12.);
37 drop a_business_contact_id;
38 run;
NOTE: There were 62934 observations read from the data set JEN1.DIRECT_STAFF_HEADCOUNT_CURRENT1.
NOTE: The data set JEN1.DIRECT_STAFF_HEADCOUNT_B has 62934 observations and 133 variables.
NOTE: Compressing data set JEN1.DIRECT_STAFF_HEADCOUNT_B decreased size by 54.69 percent.
Compressed is 1097 pages; un-compressed would require 2421 pages.
NOTE: DATA statement used (Total process time):
real time 5.92 seconds
cpu time 5.59 seconds
39
40
41 /* CURRENT MONTH - Merge/Combine direct_staff_headcount and non_employee_headcount */
42
43 Data jen1.combined_gdw_current_a;
44
45 MERGE jen1.direct_staff_headcount_b
46 jen1.non_employee_headcount_b;
47 by GEID;
48
49 run;
NOTE: There were 62934 observations read from the data set JEN1.DIRECT_STAFF_HEADCOUNT_B.
NOTE: There were 42844 observations read from the data set JEN1.NON_EMPLOYEE_HEADCOUNT_B.
NOTE: The data set JEN1.COMBINED_GDW_CURRENT_A has 105778 observations and 135 variables.
NOTE: Compressing data set JEN1.COMBINED_GDW_CURRENT_A decreased size by 55.35 percent.
Compressed is 1817 pages; un-compressed would require 4069 pages.
NOTE: DATA statement used (Total process time):
real time 8.95 seconds
cpu time 8.76 seconds
50
51 GOPTIONS NOACCESSIBLE;
52 %LET _CLIENTTASKLABEL=;
53 %LET _CLIENTPROJECTPATH=;
54 %LET _CLIENTPROJECTNAME=;
55 %LET _SASPROGRAMFILE=;
56
57 ;*';*";*/;quit;run;
58 ODS _ALL_ CLOSE;
59
60
61 QUIT; RUN;
62
: You used a merge, but none of the GEID values matched across the two files. As such, the resulting file contained one record for each record that existed in the two files you tried to merge.
It would have been more appropriate to combine them with a set statement.
One of the extra fields had to come from WORK.NON_EMPLOYEE_HEADCOUNT_A, as it had 134 variables. I would think that there is one other variable in JEN1.DIRECT_STAFF_HEADCOUNT_B that doesn't have a match in the other file, thus accounting for the 135th variable that was added to the file.
Also, in your two early datasteps, you use put statements to convert a numeric variable, but use an character format. You will eliminate the warning if you change those to represent numeric formats (e.g., 3.).
I had use SET statement first and got the same result. So I thought I used an incorrect stmt so I tried MERGE. If I have to combine two files where the rows are unique, then is it best to use SET instead of MERGE?
RE: "Also, in your two early datasteps, you use put statements to convert a numeric variable, but use an character format. You will eliminate the warning if you change those to represent numeric formats (e.g., 3.)." The original variable is NUMERIC and I want to convert it to CHARACTER, that is why I used a PUT stmt and character format. Did I do this incorrectly? What should I use if I want to convert the NUMERIC variable to CHARACTER?
Thanks for the help Arther!
Thank you for the help Arthur!
: Also, you could use something like the following to find out which variables were only in one of the files:
proc sql;
create table differences as
select name, count(*) as count
from dictionary.columns
where libname="JEN" and
(memname="DIRECT_STAFF_HEADCOUNT_B" or
memname="NON_EMPLOYEE_HEADCOUNT_B")
group by name
having count(*) eq 1
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.