DATA Step, Macro, Functions and more

proc sql and merge in data step / missing variable in output data

Accepted Solution Solved
Reply
Super Contributor
Posts: 331
Accepted Solution

proc sql and merge in data step / missing variable in output data

I do proc sql join almost everyday and data merge as well. But this time, ...very frustrating. I tried many different ways but resulting data keeps missing variable Y or D depending how I move datasets before or after join statements in sql. Data merge said my "year" variables are not in same length and I chopped them off to same number of years with no success.

Guys, please help.

by variables are in same characteristics in both datasets: year =char, raceth=num, A5=num, S=num in both datasets.

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on c.year=p.year and c.A5=p.A5 and c.S = p.S and c.raceth=p.raceth;
quit;

proc sort data=case_A5_P1;
by year A5 S raceth; run;
proc sort data=cr.pop;
by year A5 S raceth; run; 

data  merged;
   merge pop
              case_A5_P1 (in = in2);
   by year A5 S raceth;
   if in2;
run;

sas help.pngproc sql resultmerged crash.pngmerge in data step result


Accepted Solutions
Solution
3 weeks ago
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

OK. try the following:

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
PG

View solution in original post


All Replies
Respected Advisor
Posts: 4,672

Re: proc sql and merge in data step / missing variable in output data

@Cruise

To give you guidance here without source data will require a lot of guessing.

 

Could it be that variables D and Y exists in both source data sets with one of the variables missing in each of the sources?

 

To test this theory: Switch the data set order in your merge statement and check if you're now getting the same result than with the SQL.

From:
   merge pop
              case_A5_P1 (in = in2);

To:
   merge case_A5_P1 (in = in2)
              pop;

If you now get the same result: Don't use SELECT * in the SQL but actually list the variables explicitly with two level names, or in the merge drop the variable which shouldn't become an input - i.e. case_A5_P1 (in = in2 drop=(Y))

Super Contributor
Posts: 331

Re: proc sql and merge in data step / missing variable in output data

@Patrick, no, pop dataset is Census population and A5_P1 file is number of disease counts by 5 year age class and 1 year increment time period. So two datasets have no overlap.
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.

PG
Super Contributor
Posts: 331

Re: proc sql and merge in data step / missing variable in output data

Obscured by formatting? Can you elaborate on that? No proc format used here
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

X = 5 and Y = 5.5 will both look the same if Y has 4.0 as format, but they won't equate in a join or a merge.

PG
Respected Advisor
Posts: 4,672

Re: proc sql and merge in data step / missing variable in output data

[ Edited ]

@PGStats wrote:

Somehow, one of the join variables doesn't match. I could be obscured by formats that make them look the same. I bet that nothing comes out if you replace left join with inner join.


@PGStats That was my first thought as well but that wouldn't explain why the data step merge populates missing/non-missing variables the other way round.

 

@Cruise

Can you please test both? The inner join and changing the data sets in the data step merge and let us know the outcome?

Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

@Patrick, I have the feeling that the data views do not correspond to the posted versions of the code.

PG
Super Contributor
Posts: 331

Re: proc sql and merge in data step / missing variable in output data

[ Edited ]

proc contents and freq on two datasets and link variables.

 

SAS Output

The SAS System

The CONTENTS Procedure
Data Set Name CR.CASE_A5_P1 Observations 1483
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 18:47:38 Observation Length 32
Last Modified 05/31/2018 18:47:38 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2039
Obs in First Data Page 1483
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
5 D Num 8    
4 RACETH Num 8    
3 S Char 1 $1. $1.
1 Year Char 4    


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI
Sort Option NODUPKEY

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.POP Observations 1980
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 19:30:37 Observation Length 40
Last Modified 05/31/2018 19:30:37 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 1595
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
4 RACETH Num 8    
3 S Char 8    
5 Y Num 8    
1 Year Char 5 $5. $5.


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI

 


The SAS System

The FREQ Procedure
Year Frequency
2004 180
2005 180
2006 180
2007 180
2008 180
2009 180
2010 180
2011 180
2012 180
2013 180
2014 180


A5 Frequency
0 110
5 110
10 110
15 110
20 110
25 110
30 110
35 110
40 110
45 110
50 110
55 110
60 110
65 110
70 110
75 110
80 110
85 110


S Frequency
1 990
2 990


RACETH Frequency
1 396
2 396
3 396
4 396
99 396

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.CASE_A5_P1 Observations 1483
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 18:47:38 Observation Length 32
Last Modified 05/31/2018 18:47:38 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2039
Obs in First Data Page 1483
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\case_a5_p1.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
5 D Num 8    
4 RACETH Num 8    
3 S Char 1 $1. $1.
1 Year Char 4    


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI
Sort Option NODUPKEY

 


The SAS System

The CONTENTS Procedure
Data Set Name CR.POP Observations 1980
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 05/31/2018 19:30:37 Observation Length 40
Last Modified 05/31/2018 19:30:37 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted YES
Label      
Data Representation WINDOWS_64    
Encoding utf-8 Unicode (UTF-8)    


Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 1595
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename E:\Hard_drive_seagate\QE2\CRregdata\pop.sas7bdat
Release Created 9.0401M3
Host Created X64_8PRO


Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 A5 Num 8    
4 RACETH Num 8    
3 S Char 8    
5 Y Num 8    
1 Year Char 5 $5. $5.


Sort Information
Sortedby Year A5 S RACETH
Validated YES
Character Set ANSI

 


The SAS System

The FREQ Procedure
Year Frequency
2004 133
2005 132
2006 130
2007 134
2008 129
2009 135
2010 138
2011 137
2012 136
2013 139
2014 140


A5 Frequency
0 1
5 2
10 16
15 31
20 69
25 89
30 97
35 99
40 107
45 108
50 110
55 112
60 111
65 110
70 107
75 108
80 104
85 102


S Frequency
1 732
2 744
3 1
4 3
5 3


RACETH Frequency
1 336
2 314
3 294
4 314
99 225

 


The SAS System

The FREQ Procedure
Year Frequency
2004 180
2005 180
2006 180
2007 180
2008 180
2009 180
2010 180
2011 180
2012 180
2013 180
2014 180


A5 Frequency
0 110
5 110
10 110
15 110
20 110
25 110
30 110
35 110
40 110
45 110
50 110
55 110
60 110
65 110
70 110
75 110
80 110
85 110


S Frequency
1 990
2 990


RACETH Frequency
1 396
2 396
3 396
4 396
99 396
   
   
   
   
   
   
Solution
3 weeks ago
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

OK. try the following:

 

proc sql;
create table test as
select *  
from case_A5_P1 c left join pop p   
on left(c.year)=left(p.year) and c.A5=p.A5 and left(c.S) = left(p.S) and c.raceth=p.raceth;
quit;
PG
Super Contributor
Posts: 331

Re: proc sql and merge in data step / missing variable in output data

Goodness gracious! Worked out. Now I have both D and Y in output data. How so? Why? What just happened?
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

One of the variables had one or more spaces before the value, i.e. it wasn't left-justified.

PG
Super Contributor
Posts: 331

Re: proc sql and merge in data step / missing variable in output data

But how to figure that out? Is there any clue to tell whether variables left or right justified ?
Esteemed Advisor
Posts: 5,479

Re: proc sql and merge in data step / missing variable in output data

Many operations generate left justified strings. But when you are not sure, just do YEAR = left(YEAR), or in the case of a number represented as a string, transform it into a real number with INPUT().

PG
Respected Advisor
Posts: 4,672

Re: proc sql and merge in data step / missing variable in output data

[ Edited ]

@Cruise wrote:
But how to figure that out? Is there any clue to tell whether variables left or right justified ?

put var $char.;

 

or for your year variable: lyear=lengthn(year)   ....you know it's a string of 4 characters so if you get any length other than 4 then you've got leading blanks.

 

Below code showcasing some of the issues you might be facing and possibilities how to resolve. Pay attention to the Proc Freq Result which does per default the Freq on formatted values.

data a;
  format akey1 10.2;
  akey1=1;
  akey2='2000 ';
  row=1;
  output; 
  akey1=1;
  akey2=' 2000';
  row=2;
  output; 
  akey1=1.00000001;
  akey2='2000 ';
  row=3;
  output; 
  stop;
run;
data b;
  format bkey1 10.2;
  bkey1=1;
  bkey2='2000';
  output; 
  stop;
run;

proc sql;
  select *
  from a left join b
    on a.akey1=b.bkey1 and a.akey2=b.bkey2
  ;
quit;

proc freq data=a;
  table akey1 akey2;
run;

proc sql;
  select *
  from a left join b
    on round(a.akey1,0001)=round(b.bkey1,0001) and left(a.akey2)=left(b.bkey2)
  ;
quit;

 

Capture.JPG 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 164 views
  • 1 like
  • 4 in conversation