I do proc sql join almost everyday and data merge as well. But this time, ...very frustrating. I tried many different ways but resulting data keeps missing variable Y or D depending how I move datasets before or after join statements in sql. Data merge said my "year" variables are not in same length and I chopped them off to same number of years with no success.
Guys, please help.
by variables are in same characteristics in both datasets: year =char, raceth=num, A5=num, S=num in both datasets.
proc sql;
create table test as
select *
from case_A5_P1 c left join pop p
on c.year=p.year and c.A5=p.A5 and c.S = p.S and c.raceth=p.raceth;
quit;
proc sort data=case_A5_P1;
by year A5 S raceth; run;
proc sort data=cr.pop;
by year A5 S raceth; run;
data merged;
merge pop
case_A5_P1 (in = in2);
by year A5 S raceth;
if in2;
run;
OK. try the following:
proc sql;
create table test as
select *
from case_A5_P1 c left join pop p
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
To give you guidance here without source data will require a lot of guessing.
Could it be that variables D and Y exists in both source data sets with one of the variables missing in each of the sources?
To test this theory: Switch the data set order in your merge statement and check if you're now getting the same result than with the SQL.
From: merge pop case_A5_P1 (in = in2); To: merge case_A5_P1 (in = in2) pop;
If you now get the same result: Don't use SELECT * in the SQL but actually list the variables explicitly with two level names, or in the merge drop the variable which shouldn't become an input - i.e. case_A5_P1 (in = in2 drop=(Y))
Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.
X = 5 and Y = 5.5 will both look the same if Y has 4.0 as format, but they won't equate in a join or a merge.
@PGStats wrote:
Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.
@PGStats That was my first thought as well but that wouldn't explain why the data step merge populates missing/non-missing variables the other way round.
Can you please test both? The inner join and changing the data sets in the data step merge and let us know the outcome?
@Patrick, I have the feeling that the data views do not correspond to the posted versions of the code.
proc contents and freq on two datasets and link variables.
SAS Output
The SAS System |
Data Set Name | CR.CASE_A5_P1 | Observations | 1483 |
---|---|---|---|
Member Type | DATA | Variables | 5 |
Engine | V9 | Indexes | 0 |
Created | 05/31/2018 18:47:38 | Observation Length | 32 |
Last Modified | 05/31/2018 18:47:38 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | YES | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 2039 |
Obs in First Data Page | 1483 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat |
Release Created | 9.0401M3 |
Host Created | X64_8PRO |
Alphabetic List of Variables and Attributes | |||||
---|---|---|---|---|---|
# | Variable | Type | Len | Format | Informat |
2 | A5 | Num | 8 | ||
5 | D | Num | 8 | ||
4 | RACETH | Num | 8 | ||
3 | S | Char | 1 | $1. | $1. |
1 | Year | Char | 4 |
Sort Information | |
---|---|
Sortedby | Year A5 S RACETH |
Validated | YES |
Character Set | ANSI |
Sort Option | NODUPKEY |
The SAS System |
Data Set Name | CR.POP | Observations | 1980 |
---|---|---|---|
Member Type | DATA | Variables | 5 |
Engine | V9 | Indexes | 0 |
Created | 05/31/2018 19:30:37 | Observation Length | 40 |
Last Modified | 05/31/2018 19:30:37 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | YES | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 2 |
First Data Page | 1 |
Max Obs per Page | 1632 |
Obs in First Data Page | 1595 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat |
Release Created | 9.0401M3 |
Host Created | X64_8PRO |
Alphabetic List of Variables and Attributes | |||||
---|---|---|---|---|---|
# | Variable | Type | Len | Format | Informat |
2 | A5 | Num | 8 | ||
4 | RACETH | Num | 8 | ||
3 | S | Char | 8 | ||
5 | Y | Num | 8 | ||
1 | Year | Char | 5 | $5. | $5. |
Sort Information | |
---|---|
Sortedby | Year A5 S RACETH |
Validated | YES |
Character Set | ANSI |
The SAS System |
Year | Frequency |
---|---|
2004 | 180 |
2005 | 180 |
2006 | 180 |
2007 | 180 |
2008 | 180 |
2009 | 180 |
2010 | 180 |
2011 | 180 |
2012 | 180 |
2013 | 180 |
2014 | 180 |
A5 | Frequency |
---|---|
0 | 110 |
5 | 110 |
10 | 110 |
15 | 110 |
20 | 110 |
25 | 110 |
30 | 110 |
35 | 110 |
40 | 110 |
45 | 110 |
50 | 110 |
55 | 110 |
60 | 110 |
65 | 110 |
70 | 110 |
75 | 110 |
80 | 110 |
85 | 110 |
S | Frequency |
---|---|
1 | 990 |
2 | 990 |
RACETH | Frequency |
---|---|
1 | 396 |
2 | 396 |
3 | 396 |
4 | 396 |
99 | 396 |
The SAS System |
Data Set Name | CR.CASE_A5_P1 | Observations | 1483 |
---|---|---|---|
Member Type | DATA | Variables | 5 |
Engine | V9 | Indexes | 0 |
Created | 05/31/2018 18:47:38 | Observation Length | 32 |
Last Modified | 05/31/2018 18:47:38 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | YES | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 2039 |
Obs in First Data Page | 1483 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat |
Release Created | 9.0401M3 |
Host Created | X64_8PRO |
Alphabetic List of Variables and Attributes | |||||
---|---|---|---|---|---|
# | Variable | Type | Len | Format | Informat |
2 | A5 | Num | 8 | ||
5 | D | Num | 8 | ||
4 | RACETH | Num | 8 | ||
3 | S | Char | 1 | $1. | $1. |
1 | Year | Char | 4 |
Sort Information | |
---|---|
Sortedby | Year A5 S RACETH |
Validated | YES |
Character Set | ANSI |
Sort Option | NODUPKEY |
The SAS System |
Data Set Name | CR.POP | Observations | 1980 |
---|---|---|---|
Member Type | DATA | Variables | 5 |
Engine | V9 | Indexes | 0 |
Created | 05/31/2018 19:30:37 | Observation Length | 40 |
Last Modified | 05/31/2018 19:30:37 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | YES | |
Label | |||
Data Representation | WINDOWS_64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 2 |
First Data Page | 1 |
Max Obs per Page | 1632 |
Obs in First Data Page | 1595 |
Number of Data Set Repairs | 0 |
ExtendObsCounter | YES |
Filename | E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat |
Release Created | 9.0401M3 |
Host Created | X64_8PRO |
Alphabetic List of Variables and Attributes | |||||
---|---|---|---|---|---|
# | Variable | Type | Len | Format | Informat |
2 | A5 | Num | 8 | ||
4 | RACETH | Num | 8 | ||
3 | S | Char | 8 | ||
5 | Y | Num | 8 | ||
1 | Year | Char | 5 | $5. | $5. |
Sort Information | |
---|---|
Sortedby | Year A5 S RACETH |
Validated | YES |
Character Set | ANSI |
The SAS System |
Year | Frequency |
---|---|
2004 | 133 |
2005 | 132 |
2006 | 130 |
2007 | 134 |
2008 | 129 |
2009 | 135 |
2010 | 138 |
2011 | 137 |
2012 | 136 |
2013 | 139 |
2014 | 140 |
A5 | Frequency |
---|---|
0 | 1 |
5 | 2 |
10 | 16 |
15 | 31 |
20 | 69 |
25 | 89 |
30 | 97 |
35 | 99 |
40 | 107 |
45 | 108 |
50 | 110 |
55 | 112 |
60 | 111 |
65 | 110 |
70 | 107 |
75 | 108 |
80 | 104 |
85 | 102 |
S | Frequency |
---|---|
1 | 732 |
2 | 744 |
3 | 1 |
4 | 3 |
5 | 3 |
RACETH | Frequency |
---|---|
1 | 336 |
2 | 314 |
3 | 294 |
4 | 314 |
99 | 225 |
The SAS System |
Year | Frequency |
---|---|
2004 | 180 |
2005 | 180 |
2006 | 180 |
2007 | 180 |
2008 | 180 |
2009 | 180 |
2010 | 180 |
2011 | 180 |
2012 | 180 |
2013 | 180 |
2014 | 180 |
A5 | Frequency |
---|---|
0 | 110 |
5 | 110 |
10 | 110 |
15 | 110 |
20 | 110 |
25 | 110 |
30 | 110 |
35 | 110 |
40 | 110 |
45 | 110 |
50 | 110 |
55 | 110 |
60 | 110 |
65 | 110 |
70 | 110 |
75 | 110 |
80 | 110 |
85 | 110 |
S | Frequency |
---|---|
1 | 990 |
2 | 990 |
RACETH | Frequency |
---|---|
1 | 396 |
2 | 396 |
3 | 396 |
4 | 396 |
99 | 396 |
OK. try the following:
proc sql;
create table test as
select *
from case_A5_P1 c left join pop p
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
One of the variables had one or more spaces before the value, i.e. it wasn't left-justified.
Many operations generate left justified strings. But when you are not sure, just do YEAR = left(YEAR), or in the case of a number represented as a string, transform it into a real number with INPUT().
@Cruise wrote:
But how to figure that out? Is there any clue to tell whether variables left or right justified ?
put var $char.;
or for your year variable: lyear=lengthn(year) ....you know it's a string of 4 characters so if you get any length other than 4 then you've got leading blanks.
Below code showcasing some of the issues you might be facing and possibilities how to resolve. Pay attention to the Proc Freq Result which does per default the Freq on formatted values.
data a;
format akey1 10.2;
akey1=1;
akey2='2000 ';
row=1;
output;
akey1=1;
akey2=' 2000';
row=2;
output;
akey1=1.00000001;
akey2='2000 ';
row=3;
output;
stop;
run;
data b;
format bkey1 10.2;
bkey1=1;
bkey2='2000';
output;
stop;
run;
proc sql;
select *
from a left join b
on a.akey1=b.bkey1 and a.akey2=b.bkey2
;
quit;
proc freq data=a;
table akey1 akey2;
run;
proc sql;
select *
from a left join b
on round(a.akey1,0001)=round(b.bkey1,0001) and left(a.akey2)=left(b.bkey2)
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.