Hi,
Could you please let me know if there is any way to conditionally check the difference between rows?
For eg:
I have the below dataset(Have), where I need to calculate the shot differences between WHITE and BLACK(race) in corresponding states. i.e calculate the difference between WHITE and BLACK in California then Arizona..etc(dataset NEED below):
Dataset -Have
Race Shot_rcvd Area
WHITE 10 CALIFORNIA
WHITE 20 ARIZONA
BLACK 21 CALIFORNIA
BALCK 5 ARIZONA
Dataset Need
Race Shot_rcvd Area Diff_in_shots
WHITE 10 CALIFORNIA -11
WHITE 20 ARIZONA 15
BLACK 21 CALIFORNIA 11
BALCK 5 ARIZONA -15
Please let me know your suggestions.
data have;
input RACE :$12. STATE :$12. vaccinated_cnt DATE :anydtdte.;
format date yymm7.;
datalines;
WHITE	CALIFORNIA	2	Sep2016
WHITE	ARIZONA	1	Sep2016
WHITE	TEXAS	50	Oct2016
BLACK	CALIFORNIA	1	Sep2016
BLACK	ARIZONA	5	Sep2016
BLACK	TEXAS	10	Oct2016
;
proc sql;
create table cnt_diff as
select
    a.state, 
    a.date, 
    a.race,
    b.race as other_race,
    a.vaccinated_cnt - b.vaccinated_cnt as vaccinated_diff
from 
    have as a inner join
    have as b on a.state=b.state and a.date=b.date and a.race ne b.race
order by state, date, race, other_race;
select * from cnt_diff;
quit;
					
				
			
			
				Hi,
No, it is not going to be the same..there are more states and the numbers of white and black are different too.For some states there will be two black and one white or one black and 4 white or 2 black and 1 white..etc.
Your example data should include at least one case of each of the types/ combinations of things going on.
If there are dates of events to consider that may be a good idea to include.
If this data is the result of summarizing some data it may be better to post some examples from that data as sometimes raw data makes more sense than summarized for certain forms of manipulation.
Sure, here is a snapshot of the real data:
| RACE | STATE | vaccinated_cnt | DATE | 
| WHITE | CALIFORNIA | 2 | Sep2016 | 
| WHITE | ARIZONA | 1 | Sep2016 | 
| WHITE | TEXAS | 50 | Oct2016 | 
| BLACK | CALIFORNIA | 1 | Sep2016 | 
| BLACK | ARIZONA | 5 | Sep2016 | 
| BLACK | TEXAS | 10 | Oct2016 | 
Maybe I'm misunderstanding but a direct comparison doesn't make sense. Wouldn't you need rates to compare? Raw numbers don't provide the correct context.
Trying to figure out the difference in the number of shots white and blacks received in each state.
data have;
input RACE :$12. STATE :$12. vaccinated_cnt DATE :anydtdte.;
format date yymm7.;
datalines;
WHITE	CALIFORNIA	2	Sep2016
WHITE	ARIZONA	1	Sep2016
WHITE	TEXAS	50	Oct2016
BLACK	CALIFORNIA	1	Sep2016
BLACK	ARIZONA	5	Sep2016
BLACK	TEXAS	10	Oct2016
;
proc sql;
create table cnt_diff as
select
    a.state, 
    a.date, 
    a.race,
    b.race as other_race,
    a.vaccinated_cnt - b.vaccinated_cnt as vaccinated_diff
from 
    have as a inner join
    have as b on a.state=b.state and a.date=b.date and a.race ne b.race
order by state, date, race, other_race;
select * from cnt_diff;
quit;
					
				
			
			
				data have;
input  Race $ Shot_rcvd  Area : $20.;
cards;
WHITE 10             CALIFORNIA
WHITE 20             ARIZONA
BLACK 21            CALIFORNIA
BLACK 5              ARIZONA
;
run;
data white(index=(area)) black(index=(area));
 set have;
 if race='WHITE' then output white;
  else if race='BLACK' then output black;
run;
data want;
 merge white black(keep=shot_rcvd area rename=(shot_rcvd=_shot));
 by area;
 diff=shot_rcvd-_shot;
 output;
 diff=-diff;
 race='BLACK';
 output;
 drop _shot;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		Thank you all for your suggestions!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.