BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
geneshackman
Pyrite | Level 9

Hi all. I hope everyone is healthy.

Suppose I have a data set like this:

county Treatment Raceth Percent
CountyA vaccine All 43
CountyB vaccine All 48
CountyC vaccine All 27
CountyD vaccine All 31
CountyE vaccine All 12
CountyF vaccine All 52
CountyG vaccine All 18
CountyA vaccine White 76
CountyB vaccine White 56
CountyD vaccine White 94
CountyE vaccine White 25
CountyF vaccine White 88
CountyG vaccine White 44
CountyA vaccine Black 93
CountyD vaccine Black 54
CountyE vaccine Black 57
CountyG vaccine Black 81
CountyB vaccine Asian 76
CountyC vaccine Asian 6
CountyD vaccine Asian 34
CountyD vaccine Hispanic 12
CountyF vaccine Hispanic 77
CountyG vaccine Hispanic 70

 

And another data set with ID codes for county

county CountyID
CountyA c01
CountyB c02
CountyC c03
CountyD c04
CountyE c05
CountyF c06
CountyG c07

 

How do I get a data set that has, for each race ethnic group, one line for each county ID code, even when the first data set has missing lines for those counties. So for example, the output data set would include these lines

county CountyID Treatment Raceth Percent
CountyA c01 vaccine Asian  
CountyB c02 vaccine Asian 76
CountyC c03 vaccine Asian 6
CountyD c04 vaccine Asian 34
CountyE c05 vaccine Asian  
CountyF c06 vaccine Asian  
CountyG c07 vaccine Asian  
CountyA c01 vaccine Hispanic  
CountyB c02 vaccine Hispanic  
CountyC c03 vaccine Hispanic  
CountyD c04 vaccine Hispanic 12
CountyE c05 vaccine Hispanic  
CountyF c06 vaccine Hispanic 77
CountyG c07 vaccine Hispanic 70

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA	vaccine	All	43
CountyB	vaccine	All	48
CountyC	vaccine	All	27
CountyD	vaccine	All	31
CountyE	vaccine	All	12
CountyF	vaccine	All	52
CountyG	vaccine	All	18
CountyA	vaccine	White	76
CountyB	vaccine	White	56
CountyD	vaccine	White	94
CountyE	vaccine	White	25
CountyF	vaccine	White	88
CountyG	vaccine	White	44
CountyA	vaccine	Black	93
CountyD	vaccine	Black	54
CountyE	vaccine	Black	57
CountyG	vaccine	Black	81
CountyB	vaccine	Asian	76
CountyC	vaccine	Asian	6
CountyD	vaccine	Asian	34
CountyD	vaccine	Hispanic	12
CountyF	vaccine	Hispanic	77
CountyG	vaccine	Hispanic	70
;

proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a 
     natural left join
     have as b 
 order by Raceth,countyID;
quit;

View solution in original post

7 REPLIES 7
AMSAS
SAS Super FREQ

Including Nonmatching Rows with the Left Outer Join 

 

Take a look at the link above, and here's a simpler example with your data:

data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;

data treatment ;
	infile cards;
	input county $ percent ;
cards ;
CountyB	10
CountyD	30
CountyE	40
CountyG	20
;
run ;

proc sql ;
	create table 
		work.want as
	select 
		county.*,
		treatment.percent 
	from
		work.county left join work.treatment
	on
	county.county=treatment.county 
;
quit ;
	
geneshackman
Pyrite | Level 9

Hi Amsas, 

Thanks for your response. The example data set I used was a little complex specifically because I wanted to be able to do this when there were different subgroups in the same data set. Would your method work when there are subgroups in the data set that has the measure values?

Sajid01
Meteorite | Level 14

The following code will give the outcome what you have desired.
I have taken your first table as treatment and the second table as county.

/* In the following steps a table containing County,CountyID and raceth is created*/
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where  upcase(raceth)  in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b 
where a.county=b.county;
quit;

/*Combining the above table C1 with treatment table to give the desired outcome */
proc sql;
create table combined as
select a.county,a.countyID,a.raceth,"Vaccine" as Treatment, B.Percent from c1 A left  join Treatment B
on A.county=B.county and upcase(a.raceth)=upcase(b.raceth)
order by  raceth,county;
quit;

The outcome is as you had desired.

proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b
where a.county=b.county;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;The outcome will be as follows. This is the same as you wanted.

Sajid01_1-1671929656340.png

 

Ksharp
Super User
data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA	vaccine	All	43
CountyB	vaccine	All	48
CountyC	vaccine	All	27
CountyD	vaccine	All	31
CountyE	vaccine	All	12
CountyF	vaccine	All	52
CountyG	vaccine	All	18
CountyA	vaccine	White	76
CountyB	vaccine	White	56
CountyD	vaccine	White	94
CountyE	vaccine	White	25
CountyF	vaccine	White	88
CountyG	vaccine	White	44
CountyA	vaccine	Black	93
CountyD	vaccine	Black	54
CountyE	vaccine	Black	57
CountyG	vaccine	Black	81
CountyB	vaccine	Asian	76
CountyC	vaccine	Asian	6
CountyD	vaccine	Asian	34
CountyD	vaccine	Hispanic	12
CountyF	vaccine	Hispanic	77
CountyG	vaccine	Hispanic	70
;

proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a 
     natural left join
     have as b 
 order by Raceth,countyID;
quit;
geneshackman
Pyrite | Level 9

Hi. I think this gets me the output I want. The log also has this:

 

NOTE: The execution of this query involves performing one or more Cartesian product joins that
can not be optimized.

 

I don't know if this is important, though, as the output still looks like what I want.

 

Thanks

 

Gene

 

Ksharp
Super User
That doesn't matter. That note imply the code of SQL would cost you a lot of time if you have a big table .
geneshackman
Pyrite | Level 9
Thanks! I had a chance to try this and it worked.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1140 views
  • 4 likes
  • 4 in conversation