Hi,
Could you please let me know if there is any way to conditionally check the difference between rows?
For eg:
I have the below dataset(Have), where I need to calculate the shot differences between WHITE and BLACK(race) in corresponding states. i.e calculate the difference between WHITE and BLACK in California then Arizona..etc(dataset NEED below):
Dataset -Have
Race Shot_rcvd Area
WHITE 10 CALIFORNIA
WHITE 20 ARIZONA
BLACK 21 CALIFORNIA
BALCK 5 ARIZONA
Dataset Need
Race Shot_rcvd Area Diff_in_shots
WHITE 10 CALIFORNIA -11
WHITE 20 ARIZONA 15
BLACK 21 CALIFORNIA 11
BALCK 5 ARIZONA -15
Please let me know your suggestions.
data have;
input RACE :$12. STATE :$12. vaccinated_cnt DATE :anydtdte.;
format date yymm7.;
datalines;
WHITE CALIFORNIA 2 Sep2016
WHITE ARIZONA 1 Sep2016
WHITE TEXAS 50 Oct2016
BLACK CALIFORNIA 1 Sep2016
BLACK ARIZONA 5 Sep2016
BLACK TEXAS 10 Oct2016
;
proc sql;
create table cnt_diff as
select
a.state,
a.date,
a.race,
b.race as other_race,
a.vaccinated_cnt - b.vaccinated_cnt as vaccinated_diff
from
have as a inner join
have as b on a.state=b.state and a.date=b.date and a.race ne b.race
order by state, date, race, other_race;
select * from cnt_diff;
quit;
Hi,
No, it is not going to be the same..there are more states and the numbers of white and black are different too.For some states there will be two black and one white or one black and 4 white or 2 black and 1 white..etc.
Your example data should include at least one case of each of the types/ combinations of things going on.
If there are dates of events to consider that may be a good idea to include.
If this data is the result of summarizing some data it may be better to post some examples from that data as sometimes raw data makes more sense than summarized for certain forms of manipulation.
Sure, here is a snapshot of the real data:
RACE | STATE | vaccinated_cnt | DATE |
WHITE | CALIFORNIA | 2 | Sep2016 |
WHITE | ARIZONA | 1 | Sep2016 |
WHITE | TEXAS | 50 | Oct2016 |
BLACK | CALIFORNIA | 1 | Sep2016 |
BLACK | ARIZONA | 5 | Sep2016 |
BLACK | TEXAS | 10 | Oct2016 |
Maybe I'm misunderstanding but a direct comparison doesn't make sense. Wouldn't you need rates to compare? Raw numbers don't provide the correct context.
Trying to figure out the difference in the number of shots white and blacks received in each state.
data have;
input RACE :$12. STATE :$12. vaccinated_cnt DATE :anydtdte.;
format date yymm7.;
datalines;
WHITE CALIFORNIA 2 Sep2016
WHITE ARIZONA 1 Sep2016
WHITE TEXAS 50 Oct2016
BLACK CALIFORNIA 1 Sep2016
BLACK ARIZONA 5 Sep2016
BLACK TEXAS 10 Oct2016
;
proc sql;
create table cnt_diff as
select
a.state,
a.date,
a.race,
b.race as other_race,
a.vaccinated_cnt - b.vaccinated_cnt as vaccinated_diff
from
have as a inner join
have as b on a.state=b.state and a.date=b.date and a.race ne b.race
order by state, date, race, other_race;
select * from cnt_diff;
quit;
data have;
input Race $ Shot_rcvd Area : $20.;
cards;
WHITE 10 CALIFORNIA
WHITE 20 ARIZONA
BLACK 21 CALIFORNIA
BLACK 5 ARIZONA
;
run;
data white(index=(area)) black(index=(area));
set have;
if race='WHITE' then output white;
else if race='BLACK' then output black;
run;
data want;
merge white black(keep=shot_rcvd area rename=(shot_rcvd=_shot));
by area;
diff=shot_rcvd-_shot;
output;
diff=-diff;
race='BLACK';
output;
drop _shot;
run;
Thank you all for your suggestions!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.