I was joining two simple tables that reside on an SQL server and thought I would check if there was a difference if I named the parent table first or second. There was and I can't figure out why. Can anyone help please?
I am also wondering what happens when variables have the same name?
Here is the log:
proc sql noprint;
create table cmd1 as
select b.*, a.*
from cmdclaim.claim a, cmdclaim.claimdetail b
where b.claimid=a.claimid
order by a.claimid;
WARNING: Variable ClaimID already exists on file WORK.CMD1.
WARNING: Variable DiagnosticCode already exists on file WORK.CMD1.
WARNING: Variable ModifiedDate already exists on file WORK.CMD1.
WARNING: Variable CreatedDate already exists on file WORK.CMD1.
NOTE: Table WORK.CMD1 created, with 50788 rows and 48 columns.
16 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 15.14 seconds
cpu time 3.41 seconds
proc sql noprint;
create table cmd1 as
select b.*, a.*
from cmdclaim.claim a, cmdclaim.claimdetail b
where a.claimid=b.claimid
order by a.claimid;
WARNING: Variable ClaimID already exists on file WORK.CMD1.
WARNING: Variable DiagnosticCode already exists on file WORK.CMD1.
WARNING: Variable ModifiedDate already exists on file WORK.CMD1.
WARNING: Variable CreatedDate already exists on file WORK.CMD1.
NOTE: Table WORK.CMD1 created, with 50811 rows and 48 columns.
23 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 14.73 seconds
cpu time 3.54 seconds
I would appreciate any help understanding the difference in the results.
Sally
ok to explain , please try to pull the required variables only from one dataset. if there are common variables between two datasets then try to pull required variables from only one dataset. since you kep b.* and a.* even the common variables are pulled from both the datasets. So try a.* and b.specific variable names from other datasets. So that you will not get those warnings.
To add to what has been said already:
1. Use option "options sastrace=',,,d' sastraceloc=saslog nostsuffix;" to see in the log what SQL syntax can get sent to the DBMS.
2. Use option "fullstimer" to get a bit more timing information in the log.
3. Use "feedback" to see in the log if SAS changes the syntax: Proc SQL feedback; ....
5. That the two queries return a different number of rows is concerning and the only explanation I'm having is that you had a different number of rows in your source tables when you executed them
5. CPU and Real Times are not always the same between two runs even for exactly the same syntax. To really see if there is a difference between two SQL code versions you would have to execute them multiple times.
6. If you want to dig deeper then copy the SQL syntax sent to the DBMS from the log and analyze it further using sql server management studio https://technet.microsoft.com/en-us/library/ms191227(v=sql.105).aspx
This is unexpected. Both results should have the same number of obs. Unless the tables were modified between the two queries. Did you try running the first query again?
The query is most likely run on the server. So you may be looking at peculiarities features of SQL Server.
Repeated names are removed from the select list. That should not affect the number of obs, unless you are using DISTINCT.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.