OK, so that's the source of the error. The fix would be to change the matching condition:
b.Client_ID = input(a.Client_ID,6.)
Finally, note that you are selecting too many variables. You don't have to select variables in order to use them in join conditions. Which Client_ID would you like to keep in the final result, the one from Client_Report1 or the one from Client_Report2? That's the one that should appear in the SELECT statement. Similar considerations apply to other variables. If you want to keep both, you have to add AS to the SELECT statement for one variable in each pair.
here is the change made:
proc sql;
create table work.test as
select a.Client_id, a.Client_name, a.Score, a.Status,
(a.score-b.score) as Score_diff,
case when a.status=b.status then 'match' else 'no match' end as Status_test
from Client_Report2 a
left join Client_Report1 b on b.Client_id=input(a.Client_id,6.);
quit;
i got this error in the log:
ERROR: INPUT function requires a character argument.
ERROR: INPUT function requires a character argument.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
any idea?
A little too much switching around regarding which data set is which. Try:
on a.Client_id = input(b.Client_id,6.);
@Astounding
the code above run with the change you made above. However the result is not what i was expecting. The original data Client_Report1 has 56965 rows and Client_Report2 has 56987 rows.
The result of the work.test shows 2654809 rows which cannot be loaded. i know it is hard for you to check without data in front you.
Did you make sure to have the LEFT JOIN ON ...
Here is the code run:
proc sql;
create table work.test as
select a.Client_id, a.Client_name, a.Score, a.Status,
(a.score-b.score) as Score_diff,
case when a.status=b.status then 'match' else 'no match' end as Status_test
from Client_Report2 a
left join Client_Report1 b on a.Client_id=input(b.Client_id,6.);
quit;
with suggestion from Astounding in post 25.
If your left join is ON, you should only get the 56,987 rows. Even a full outer join with no matching records will yield at the most only 56,987 + 56,965 rows. 2,654,809 is not even a sum of rows from both tables. Where did the extra 2,600,000 rows come from when these records aren't in either dataset?
yeah that is what im trying to find out. the result is too much that it cannot even display (load) so i can check where the extra data coming from.
No, Armand. My point was you probably have the wrong row counts.
There is no extra data coming unless existing records were broken out into many small records or there is a cartesian join (plain JOIN without left, right, full, center etc.) being done. I woiuld say with SQL the latter is more probable. See if your 'LEFT' keyword is actually in effect. This assuming the initial record counts you posted are right.
So yes that s the right row count you have in the post. Now the question is, where the problem is come from because i have used left join multiple time in my codes and got the expected results. So i am still looking at other ways beside left join to get the result.
I think your original question may have been answered and it's time to end this thread and start a new one.
Do reference this thread if it's still applicable, but with the number of posts its hard to determine what's your current issue.
Do you have multiple ID's in each file? If so each will join with each ID resulting in a big table.
Perhaps your join needs another ON condition, i.e. date/month.
That's only one data set, what about the second data set. Can you post the actual proc contents output please? As well as the log that shows the error?
the file u are requesting is huge. To simplify that s what proc content show:
Client_Report1:
TYPE LEN FORMAT INFORMAT LABEL
Client_id: Num 6 $6 $6 imdb id
Client_name: CHAR 68 $68 $68 Client name
Score: Num 8 Score
Status: CHAR 128 $128 $128 Status
Client_Report2
TYPE LEN FORMAT INFORMAT LABEL
Client_id: Num 6 $6 $6 imdb id
Client_name: CHAR 68 $68 $68 Client name
Score: Num 8 Score
Status: CHAR 128 $128 $128 Status
log:
ERROR: Expression using equals (=) has components that are of different data types.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
You have
(a.score-b.scorre)
but the variable should be b.score in the original post. Was this retyped or actual copy?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.