BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ballardw
Super User

Doesn't work is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the "</>" icon to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.

 

 

With that in mind, Proc Import makes guesses as to variable type and length separately for each file.

So depending on the actual values present in the two files your Acc_number could be character in one file and numeric in the other which would have a note in the log about mismatched data type.

Another possibility is that if both are character the lengths are different which could lead to values you expect to match not matching as 'A0123' is not the same as 'A01234'.

Another possibility if the variables are character is that values are entered differently. Something in one set might be entered as 'a0123' and 'A0123' in the others. So the case of the letter A doesn't match. There are quite a number of possible value mismatches in this sort of scenario, leading zeroes in one file and not in the other, all upper case, mixed case, lower case and spelling such as 'ABC Co.' 'ABC Co' 'ABC Company' and such, none of these match the others.

GN0001
Barite | Level 11
Hello ballardw,
Thanks for the response. It is interesting to know these codes. The first one gave me an error and I am not able to fix it.
The second one created a data set which is different than mine.
I use merge to append the data sets and I think the order of variable are important. The data type is not important because I heard that SAS accepts mixed data types in one variable.
I posted my code and my log any way. The code doesn't arrange the variables on the order that I need for merging.
I might get a job to write SAS programming. It has been for two days that I am trying to merge to data sets and even my proc sort doesn't sort properly. I told them that I don't have any experience and I am working on it. If if is going to be this hard and challenging, I am going to let this job go.
Regards,
Blue Sky
Blue Blue
ballardw
Super User

@GN0001 wrote:
Hello ballardw,
Thanks for the response. It is interesting to know these codes. The first one gave me an error and I am not able to fix it.
The second one created a data set which is different than mine.
I use merge to append the data sets and I think the order of variable are important. The data type is not important because I heard that SAS accepts mixed data types in one variable.
I posted my code and my log any way. The code doesn't arrange the variables on the order that I need for merging.
I might get a job to write SAS programming. It has been for two days that I am trying to merge to data sets and even my proc sort doesn't sort properly. I told them that I don't have any experience and I am working on it. If if is going to be this hard and challenging, I am going to let this job go.
Regards,
Blue Sky

Highlighted text is false, at least once data is in a SAS data set. One example that creates two data sets with the same named variable of different types. Note the error when combining data.

12   data one;
13      x='1';
14   run;

NOTE: The data set WORK.ONE has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


15
16   data two;
17      x=1;
18   run;

NOTE: The data set WORK.TWO has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


19
20   data combined;
21     set one two;
ERROR: Variable x has been defined as both character and numeric.
22   run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.COMBINED may be incomplete.  When this step was stopped there were 0
         observations and 1 variables.

SAS will assign one of two variable types when reading data: character or numeric. There is no "mixed" data type.

 

And an example of different length variables.

24   data three;
25      x='a longer value for x';
26   run;

NOTE: The data set WORK.THREE has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


27
28   data combined2;
29     set one three;
30   run;

WARNING: Multiple lengths were specified for the variable x by input data set(s). This can cause
         truncation of data.
NOTE: There were 1 observations read from the data set WORK.ONE.
NOTE: There were 1 observations read from the data set WORK.THREE.
NOTE: The data set WORK.COMBINED2 has 2 observations and 1 variables.

When you look at combined2 data set you will see the values of x are '1' and 'a'.

So it is critical to look at your log when combining data sets for messages like the "defined as both character and numeric" as that is an error and the "multiple lengths" can mean the your data is truncated.

 

Which is why showing a LOG is important when we ask for one. It is clear in the two examples what could be a couple of the "didn't work" scenarios.

 

mkeintz
PROC Star

@GN0001 :

 

As others have pointed out, simply saying your code "doesn't work", ... doesn't work.

 

Show the log at least, and describe what you got vs what you expected.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
GN0001
Barite | Level 11

I posted the code and the log.

Thanks for the response.

Blue Sky

Blue Blue
andreas_lds
Jade | Level 19

@GN0001 wrote:

I posted the code and the log.

Thanks for the response.

Blue Sky


After so many posts it is still not clear what does not work.

proc import reads the variable in the order they have in file it processes. The procedure has no parameters to change the order of variables. Btw. afaik only one procedure needs variables in a certain order: proc export. For anything else the order of variable is hardly relevant.

PaigeMiller
Diamond | Level 26

@GN0001 wrote:

I posted the code and the log.

Thanks for the response.

Blue Sky


The log you posted does not show errors. You need to post a log that shows the ERRORs you mentioned in your first post. That is what we need to see.

--
Paige Miller
Tom
Super User Tom
Super User

This code as posted cannot work.  In the PROC IMPORT steps you are telling it NOT to use the first row in the spreadsheet as the names of the variables.  So there cannot be a variable named Acc_number in either SAS dataset.  When you tell PROC IMPORT that the source does not include names it will either name the variables A, B, C, .... or VAR1, VAR2, VAR3, ... depending on the type of DBMS you are reading from.

 

PROC IMPORT does not support the INPUT statement.

 

Are you sure your assignment is to read from an XLSX file?  Perhaps you are instead supposed to be reading from text files, like comma separated values (aka CSV) files?  In that case uses a data step and not proc import to read the file. Then you can have complete control over the names, types, length and even the order of the variables.

GN0001
Barite | Level 11

Your answer is the best so far. Someone else also pointed out at this.

My code was fixed.

Now I want to merge the data sets, I can't get the result I am supposed to have:

 


PROC IMPORT OUT=Balance_Bank1
DataFile = "/home/nguityn0/Balance_Bank.xlsx"
dbms=xlsx REPLACE;
Sheet="Sheet1";
getnames=YES;
RUN;

/*Getnames is not about getting or not getting headers. It is about deciding if
you want the variable names from the dataset or just random one*/

/*INPUT should be placed before Merge/ set statement.*/

Proc Sort Data=Balance_Bank1 OUT=Balance_Bank1Sorted;
BY Acc_number;
RUN;


PROC IMPORT OUT=Balance_Bank2
DataFile = "/home/nguityn0/Balance_Bank.xlsx"
dbms=xlsx REPLACE;
Sheet="Sheet2";
getnames=YES;
Run;

Proc Sort Data=Balance_Bank2 OUT=Balance_Bank2Sorted;
BY Acc_number;
RUN;

Data NewCollection;
MERGE Balance_Bank1Sorted Balance_Bank2Sorted;
BY Acc_Number;
RUN;

It is Excel sheet and It is not a text file.
These are the variables for each sheet:
Sheet1:
Alphabetic List of Variables and Attributes
# Variable Type Len Format Label
4 Acc_Number Num 8 BEST. Acc_Number
3 Credit Num 8 BEST. Credit
1 Date Num 8 MMDDYY10. Date
2 Debit Num 8 BEST. Debit

Sheet2:
Variable Type Len Format Label
1 Acc_Number Num 8 BEST. Acc_Number
4 Credit Num 8 BEST. Credit
2 Date Num 8 MMDDYY10. Date
3 Debit Num 8 BEST. Debit


I want to merge these two data sets and I don't get the result that it is supposed to be:
Data NewCollection;
MERGE Balance_Bank1Sorted Balance_Bank2Sorted;
BY Acc_Number;
RUN;
Regards,
Blue Sky
Blue Blue
ballardw
Super User

And what exactly is supposed to be the output?

If you merge data sets by a matching variable then the value of the variable in the right most data set will be the one in the output.

A brief example:

data one;
  input x y;
datalines;
1 1
2 2
3 3
;

data two;
  input x y z;
datalines;
1 15 3
2 25 4
3 35 5
;

/* values of Y will come from set two*/
data combined;
   merge one two;
   by x;
run;

/* values of Y will come from set one*/
data combined2;
   merge two one;
   by x;
run;
/* if you want both values for y you need additional variable*/
/* values of Y will come from set one*/
data combined2;
   merge one 
         two (rename=(y=y2))
   ;
   by x;
run;

So in your example where you have 3 variables in common other than acc_number which one(s) do you want in the output?

You would likely need to rename all 3 if you want all 3.

Or perhaps you need to merge By acc_number date?

If you have multiple values of the acc_number in both data sets and only want to "merge" on acc_number you are likely to "matching" different dates to other dates and the results are unpredictable because a data step merge really expects only one of the sets to have multiple values of the BY variables.

OR

You may not want a merge at all. If you expect to have the second set appear after the first set then you want a SET statement, not a Merge statement.

 

You do not show any values or expected result, so we cannot tell what the result should look like.

A short data example like I provide for data one and two with some of your variables is would show 1) the starting values and 2) what you expect for output.

GN0001
Barite | Level 11

Hello ballardw,

Thanks for the response.

1- Can we say that merge is like a inner join?

2- I want all the variables from the left data set and all the variables from right data set together in one table? one dataset has 5 observation and the other dataset has 9 observation (I tried to explain it to the best that I can).

3- Instead of merge, I placed set, the code ran successfully. Why doesn't merge work?

4- On my previous post, I sent what each dataset has and I want to merge these two data sets regardless which dataset comes first. The key to merge is Acc_Number only.

 

Please advise me.

 

Regards,

Blue Sky

 

 

Blue Blue
Kurt_Bremser
Super User

Since your datasets contain identically named columns (and I suppose the contain the same logical values), you do not want to merge or join, you want to stack the dataset by using both in the same SET statement; if they are already ordered by acc_number, you can preserve that order with a BY statement:

data newcollection;
set
  balance_bank1sorted
  balance_bank2sorted
;
by acc_number;
run;

This is called interleaving.

 

Regarding your question #1:

NO. A SQL join and a data step merge work differently, and produce completely different results when you have a many-to-many relationship; also, the data step MERGE tolerates identically named columns (and usually results in unexpected behavior), while SQL (rightfully) gives you a WARNING for that.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 26 replies
  • 1920 views
  • 13 likes
  • 9 in conversation