BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

I do proc sql join almost everyday and data merge as well. But this time, ...very frustrating. I tried many different ways but resulting data keeps missing variable Y or D depending how I move datasets before or after join statements in sql. Data merge said my "year" variables are not in same length and I chopped them off to same number of years with no success.

Guys, please help.

by variables are in same characteristics in both datasets: year =char, raceth=num, A5=num, S=num in both datasets.

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on c.year=p.year and c.A5=p.A5 and c.S = p.S and c.raceth=p.raceth;
quit;

proc sort data=case_A5_P1;
by year A5 S raceth; run;
proc sort data=cr.pop;
by year A5 S raceth; run; 

data  merged;
   merge pop
              case_A5_P1 (in = in2);
   by year A5 S raceth;
   if in2;
run;

proc sql resultproc sql resultmerge in data step resultmerge in data step result

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

OK. try the following:

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
PG

View solution in original post

17 REPLIES 17
Patrick
Opal | Level 21

@Cruise

To give you guidance here without source data will require a lot of guessing.

 

Could it be that variables D and Y exists in both source data sets with one of the variables missing in each of the sources?

 

To test this theory: Switch the data set order in your merge statement and check if you're now getting the same result than with the SQL.

From:
   merge pop
              case_A5_P1 (in = in2);

To:
   merge case_A5_P1 (in = in2)
              pop;

If you now get the same result: Don't use SELECT * in the SQL but actually list the variables explicitly with two level names, or in the merge drop the variable which shouldn't become an input - i.e. case_A5_P1 (in = in2 drop=(Y))

Cruise
Ammonite | Level 13
@Patrick, no, pop dataset is Census population and A5_P1 file is number of disease counts by 5 year age class and 1 year increment time period. So two datasets have no overlap.
PGStats
Opal | Level 21

Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.

PG
Cruise
Ammonite | Level 13
Obscured by formatting? Can you elaborate on that? No proc format used here
PGStats
Opal | Level 21

X = 5 and Y = 5.5 will both look the same if Y has 4.0 as format, but they won't equate in a join or a merge.

PG
Patrick
Opal | Level 21

@PGStats wrote:

Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.


@PGStats That was my first thought as well but that wouldn't explain why the data step merge populates missing/non-missing variables the other way round.

 

@Cruise

Can you please test both? The inner join and changing the data sets in the data step merge and let us know the outcome?

PGStats
Opal | Level 21

@Patrick, I have the feeling that the data views do not correspond to the posted versions of the code.

PG
Cruise
Ammonite | Level 13

proc contents and freq on two datasets and link variables.

 

SAS Output

The SAS System

The CONTENTS Procedure
Data Set Name CR.CASE_A5_P1 Observations 1483
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 18:47:38 Observation Length 32
Last Modified 05/31/2018 18:47:38 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2039
Obs in First Data Page 1483
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
5 D Num 8    
4 RACETH Num 8    
3 S Char 1 $1. $1.
1 Year Char 4    


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI
Sort Option NODUPKEY

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.POP Observations 1980
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 19:30:37 Observation Length 40
Last Modified 05/31/2018 19:30:37 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 1595
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
4 RACETH Num 8    
3 S Char 8    
5 Y Num 8    
1 Year Char 5 $5. $5.


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI

 


The SAS System

The FREQ Procedure
Year Frequency
2004 180
2005 180
2006 180
2007 180
2008 180
2009 180
2010 180
2011 180
2012 180
2013 180
2014 180


A5 Frequency
0 110
5 110
10 110
15 110
20 110
25 110
30 110
35 110
40 110
45 110
50 110
55 110
60 110
65 110
70 110
75 110
80 110
85 110


S Frequency
1 990
2 990


RACETH Frequency
1 396
2 396
3 396
4 396
99 396

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.CASE_A5_P1 Observations 1483
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 18:47:38 Observation Length 32
Last Modified 05/31/2018 18:47:38 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2039
Obs in First Data Page 1483
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
5 D Num 8    
4 RACETH Num 8    
3 S Char 1 $1. $1.
1 Year Char 4    


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI
Sort Option NODUPKEY

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.POP Observations 1980
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 19:30:37 Observation Length 40
Last Modified 05/31/2018 19:30:37 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 1595
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
4 RACETH Num 8    
3 S Char 8    
5 Y Num 8    
1 Year Char 5 $5. $5.


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI

 


The SAS System

The FREQ Procedure
Year Frequency
2004 133
2005 132
2006 130
2007 134
2008 129
2009 135
2010 138
2011 137
2012 136
2013 139
2014 140


A5 Frequency
0 1
5 2
10 16
15 31
20 69
25 89
30 97
35 99
40 107
45 108
50 110
55 112
60 111
65 110
70 107
75 108
80 104
85 102


S Frequency
1 732
2 744
3 1
4 3
5 3


RACETH Frequency
1 336
2 314
3 294
4 314
99 225

 


The SAS System

The FREQ Procedure
Year Frequency
2004 180
2005 180
2006 180
2007 180
2008 180
2009 180
2010 180
2011 180
2012 180
2013 180
2014 180


A5 Frequency
0 110
5 110
10 110
15 110
20 110
25 110
30 110
35 110
40 110
45 110
50 110
55 110
60 110
65 110
70 110
75 110
80 110
85 110


S Frequency
1 990
2 990


RACETH Frequency
1 396
2 396
3 396
4 396
99 396
   
   
   
   
   
   
PGStats
Opal | Level 21

OK. try the following:

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
PG
Cruise
Ammonite | Level 13
Goodness gracious! Worked out. Now I have both D and Y in output data. How so? Why? What just happened?
PGStats
Opal | Level 21

One of the variables had one or more spaces before the value, i.e. it wasn't left-justified.

PG
Cruise
Ammonite | Level 13
But how to figure that out? Is there any clue to tell whether variables left or right justified ?
PGStats
Opal | Level 21

Many operations generate left justified strings. But when you are not sure, just do YEAR = left(YEAR), or in the case of a number represented as a string, transform it into a real number with INPUT().

PG
Patrick
Opal | Level 21

@Cruise wrote:
But how to figure that out? Is there any clue to tell whether variables left or right justified ?

put var $char.;

 

or for your year variable: lyear=lengthn(year)   ....you know it's a string of 4 characters so if you get any length other than 4 then you've got leading blanks.

 

Below code showcasing some of the issues you might be facing and possibilities how to resolve. Pay attention to the Proc Freq Result which does per default the Freq on formatted values.

data a;
  format akey1 10.2;
  akey1=1;
  akey2='2000 ';
  row=1;
  output; 
  akey1=1;
  akey2=' 2000';
  row=2;
  output; 
  akey1=1.00000001;
  akey2='2000 ';
  row=3;
  output; 
  stop;
run;
data b;
  format bkey1 10.2;
  bkey1=1;
  bkey2='2000';
  output; 
  stop;
run;

proc sql;
  select *
  from a left join b
    on a.akey1=b.bkey1 and a.akey2=b.bkey2
  ;
quit;

proc freq data=a;
  table akey1 akey2;
run;

proc sql;
  select *
  from a left join b
    on round(a.akey1,0001)=round(b.bkey1,0001) and left(a.akey2)=left(b.bkey2)
  ;
quit;

 

Capture.JPG 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 3065 views
  • 1 like
  • 4 in conversation