Hi all. I hope everyone is healthy.
Suppose I have a data set like this:
county | Treatment | Raceth | Percent |
CountyA | vaccine | All | 43 |
CountyB | vaccine | All | 48 |
CountyC | vaccine | All | 27 |
CountyD | vaccine | All | 31 |
CountyE | vaccine | All | 12 |
CountyF | vaccine | All | 52 |
CountyG | vaccine | All | 18 |
CountyA | vaccine | White | 76 |
CountyB | vaccine | White | 56 |
CountyD | vaccine | White | 94 |
CountyE | vaccine | White | 25 |
CountyF | vaccine | White | 88 |
CountyG | vaccine | White | 44 |
CountyA | vaccine | Black | 93 |
CountyD | vaccine | Black | 54 |
CountyE | vaccine | Black | 57 |
CountyG | vaccine | Black | 81 |
CountyB | vaccine | Asian | 76 |
CountyC | vaccine | Asian | 6 |
CountyD | vaccine | Asian | 34 |
CountyD | vaccine | Hispanic | 12 |
CountyF | vaccine | Hispanic | 77 |
CountyG | vaccine | Hispanic | 70 |
And another data set with ID codes for county
county | CountyID |
CountyA | c01 |
CountyB | c02 |
CountyC | c03 |
CountyD | c04 |
CountyE | c05 |
CountyF | c06 |
CountyG | c07 |
How do I get a data set that has, for each race ethnic group, one line for each county ID code, even when the first data set has missing lines for those counties. So for example, the output data set would include these lines
county | CountyID | Treatment | Raceth | Percent |
CountyA | c01 | vaccine | Asian | |
CountyB | c02 | vaccine | Asian | 76 |
CountyC | c03 | vaccine | Asian | 6 |
CountyD | c04 | vaccine | Asian | 34 |
CountyE | c05 | vaccine | Asian | |
CountyF | c06 | vaccine | Asian | |
CountyG | c07 | vaccine | Asian | |
CountyA | c01 | vaccine | Hispanic | |
CountyB | c02 | vaccine | Hispanic | |
CountyC | c03 | vaccine | Hispanic | |
CountyD | c04 | vaccine | Hispanic | 12 |
CountyE | c05 | vaccine | Hispanic | |
CountyF | c06 | vaccine | Hispanic | 77 |
CountyG | c07 | vaccine | Hispanic | 70 |
Thanks
data county ;
infile cards ;
input county $ countyID $;
cards ;
CountyA c01
CountyB c02
CountyC c03
CountyD c04
CountyE c05
CountyF c06
CountyG c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA vaccine All 43
CountyB vaccine All 48
CountyC vaccine All 27
CountyD vaccine All 31
CountyE vaccine All 12
CountyF vaccine All 52
CountyG vaccine All 18
CountyA vaccine White 76
CountyB vaccine White 56
CountyD vaccine White 94
CountyE vaccine White 25
CountyF vaccine White 88
CountyG vaccine White 44
CountyA vaccine Black 93
CountyD vaccine Black 54
CountyE vaccine Black 57
CountyG vaccine Black 81
CountyB vaccine Asian 76
CountyC vaccine Asian 6
CountyD vaccine Asian 34
CountyD vaccine Hispanic 12
CountyF vaccine Hispanic 77
CountyG vaccine Hispanic 70
;
proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a
natural left join
have as b
order by Raceth,countyID;
quit;
Including Nonmatching Rows with the Left Outer Join
Take a look at the link above, and here's a simpler example with your data:
data county ;
infile cards ;
input county $ countyID $;
cards ;
CountyA c01
CountyB c02
CountyC c03
CountyD c04
CountyE c05
CountyF c06
CountyG c07
;
run ;
data treatment ;
infile cards;
input county $ percent ;
cards ;
CountyB 10
CountyD 30
CountyE 40
CountyG 20
;
run ;
proc sql ;
create table
work.want as
select
county.*,
treatment.percent
from
work.county left join work.treatment
on
county.county=treatment.county
;
quit ;
Hi Amsas,
Thanks for your response. The example data set I used was a little complex specifically because I wanted to be able to do this when there were different subgroups in the same data set. Would your method work when there are subgroups in the data set that has the measure values?
The following code will give the outcome what you have desired.
I have taken your first table as treatment and the second table as county.
/* In the following steps a table containing County,CountyID and raceth is created*/
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b
where a.county=b.county;
quit;
/*Combining the above table C1 with treatment table to give the desired outcome */
proc sql;
create table combined as
select a.county,a.countyID,a.raceth,"Vaccine" as Treatment, B.Percent from c1 A left join Treatment B
on A.county=B.county and upcase(a.raceth)=upcase(b.raceth)
order by raceth,county;
quit;
The outcome is as you had desired.
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table c1 as
select a.County, b.CountyID, a.raceth from c0 a, county b
where a.county=b.county;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;
proc sql;
create table race as
select distinct UPCASE(TRIM(RACETH)) as RACETH from treatment
/*where upcase(raceth) in ('ASIAN','HISPANIC') */
order by raceth;
quit;
proc sql;
create table c0 as
select * from county (keep=county), race;
quit;The outcome will be as follows. This is the same as you wanted.
data county ;
infile cards ;
input county $ countyID $;
cards ;
CountyA c01
CountyB c02
CountyC c03
CountyD c04
CountyE c05
CountyF c06
CountyG c07
;
run ;
data have;
infile cards expandtabs truncover;
input county $ Treatment $ Raceth $ Percent;
cards;
CountyA vaccine All 43
CountyB vaccine All 48
CountyC vaccine All 27
CountyD vaccine All 31
CountyE vaccine All 12
CountyF vaccine All 52
CountyG vaccine All 18
CountyA vaccine White 76
CountyB vaccine White 56
CountyD vaccine White 94
CountyE vaccine White 25
CountyF vaccine White 88
CountyG vaccine White 44
CountyA vaccine Black 93
CountyD vaccine Black 54
CountyE vaccine Black 57
CountyG vaccine Black 81
CountyB vaccine Asian 76
CountyC vaccine Asian 6
CountyD vaccine Asian 34
CountyD vaccine Hispanic 12
CountyF vaccine Hispanic 77
CountyG vaccine Hispanic 70
;
proc sql;
create table want as
select a.*,b.percent
from ( select * from
(select distinct county,countyID from county),(select distinct Treatment,Raceth from have)
) as a
natural left join
have as b
order by Raceth,countyID;
quit;
Hi. I think this gets me the output I want. The log also has this:
NOTE: The execution of this query involves performing one or more Cartesian product joins that
can not be optimized.
I don't know if this is important, though, as the output still looks like what I want.
Thanks
Gene
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.