Hi all. I hope everyone is healthy.
Suppose I have a data set like this:
| county | Treatment | Raceth | Percent | 
| CountyA | vaccine | All | 43 | 
| CountyB | vaccine | All | 48 | 
| CountyC | vaccine | All | 27 | 
| CountyD | vaccine | All | 31 | 
| CountyE | vaccine | All | 12 | 
| CountyF | vaccine | All | 52 | 
| CountyG | vaccine | All | 18 | 
| CountyA | vaccine | White | 76 | 
| CountyB | vaccine | White | 56 | 
| CountyD | vaccine | White | 94 | 
| CountyE | vaccine | White | 25 | 
| CountyF | vaccine | White | 88 | 
| CountyG | vaccine | White | 44 | 
| CountyA | vaccine | Black | 93 | 
| CountyD | vaccine | Black | 54 | 
| CountyE | vaccine | Black | 57 | 
| CountyG | vaccine | Black | 81 | 
| CountyB | vaccine | Asian | 76 | 
| CountyC | vaccine | Asian | 6 | 
| CountyD | vaccine | Asian | 34 | 
| CountyD | vaccine | Hispanic | 12 | 
| CountyF | vaccine | Hispanic | 77 | 
| CountyG | vaccine | Hispanic | 70 | 
And another data set with ID codes for county
| county | CountyID | 
| CountyA | c01 | 
| CountyB | c02 | 
| CountyC | c03 | 
| CountyD | c04 | 
| CountyE | c05 | 
| CountyF | c06 | 
| CountyG | c07 | 
How do I get a data set that has, for each race ethnic group, one line for each county ID code, even when the first data set has missing lines for those counties. So for example, the output data set would include these lines
| county | CountyID | Treatment | Raceth | Percent | 
| CountyA | c01 | vaccine | Asian | |
| CountyB | c02 | vaccine | Asian | 76 | 
| CountyC | c03 | vaccine | Asian | 6 | 
| CountyD | c04 | vaccine | Asian | 34 | 
| CountyE | c05 | vaccine | Asian | |
| CountyF | c06 | vaccine | Asian | |
| CountyG | c07 | vaccine | Asian | |
| CountyA | c01 | vaccine | Hispanic | |
| CountyB | c02 | vaccine | Hispanic | |
| CountyC | c03 | vaccine | Hispanic | |
| CountyD | c04 | vaccine | Hispanic | 12 | 
| CountyE | c05 | vaccine | Hispanic | |
| CountyF | c06 | vaccine | Hispanic | 77 | 
| CountyG | c07 | vaccine | Hispanic | 70 | 
Thanks
data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA	vaccine	All	43
CountyB	vaccine	All	48
CountyC	vaccine	All	27
CountyD	vaccine	All	31
CountyE	vaccine	All	12
CountyF	vaccine	All	52
CountyG	vaccine	All	18
CountyA	vaccine	White	76
CountyB	vaccine	White	56
CountyD	vaccine	White	94
CountyE	vaccine	White	25
CountyF	vaccine	White	88
CountyG	vaccine	White	44
CountyA	vaccine	Black	93
CountyD	vaccine	Black	54
CountyE	vaccine	Black	57
CountyG	vaccine	Black	81
CountyB	vaccine	Asian	76
CountyC	vaccine	Asian	6
CountyD	vaccine	Asian	34
CountyD	vaccine	Hispanic	12
CountyF	vaccine	Hispanic	77
CountyG	vaccine	Hispanic	70
;
proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a 
     natural left join
     have as b 
 order by Raceth,countyID;
quit;Including Nonmatching Rows with the Left Outer Join
Take a look at the link above, and here's a simpler example with your data:
data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;
data treatment ;
	infile cards;
	input county $ percent ;
cards ;
CountyB	10
CountyD	30
CountyE	40
CountyG	20
;
run ;
proc sql ;
	create table 
		work.want as
	select 
		county.*,
		treatment.percent 
	from
		work.county left join work.treatment
	on
	county.county=treatment.county 
;
quit ;
	Hi Amsas,
Thanks for your response. The example data set I used was a little complex specifically because I wanted to be able to do this when there were different subgroups in the same data set. Would your method work when there are subgroups in the data set that has the measure values?
The following code will give the outcome what you have desired.
I have taken your first table as treatment and the second table as county.
/* In the following steps a table containing County,CountyID and raceth is created*/
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where  upcase(raceth)  in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b 
where a.county=b.county;
quit;
/*Combining the above table C1 with treatment table to give the desired outcome */
proc sql;
create table combined as
select a.county,a.countyID,a.raceth,"Vaccine" as Treatment, B.Percent from c1 A left  join Treatment B
on A.county=B.county and upcase(a.raceth)=upcase(b.raceth)
order by  raceth,county;
quit;The outcome is as you had desired.
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b
where a.county=b.county;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;The outcome will be as follows. This is the same as you wanted.
data county ;
	infile cards ;
	input county $ countyID $;
cards ;
CountyA	c01
CountyB	c02
CountyC	c03
CountyD	c04
CountyE	c05
CountyF	c06
CountyG	c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA	vaccine	All	43
CountyB	vaccine	All	48
CountyC	vaccine	All	27
CountyD	vaccine	All	31
CountyE	vaccine	All	12
CountyF	vaccine	All	52
CountyG	vaccine	All	18
CountyA	vaccine	White	76
CountyB	vaccine	White	56
CountyD	vaccine	White	94
CountyE	vaccine	White	25
CountyF	vaccine	White	88
CountyG	vaccine	White	44
CountyA	vaccine	Black	93
CountyD	vaccine	Black	54
CountyE	vaccine	Black	57
CountyG	vaccine	Black	81
CountyB	vaccine	Asian	76
CountyC	vaccine	Asian	6
CountyD	vaccine	Asian	34
CountyD	vaccine	Hispanic	12
CountyF	vaccine	Hispanic	77
CountyG	vaccine	Hispanic	70
;
proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a 
     natural left join
     have as b 
 order by Raceth,countyID;
quit;Hi. I think this gets me the output I want. The log also has this:
NOTE: The execution of this query involves performing one or more Cartesian product joins that
can not be optimized.
I don't know if this is important, though, as the output still looks like what I want.
Thanks
Gene
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
