Hi all,
I'm working off a previous question that I asked the SAS community before. I have 3 datasets now where I'm attempting to add pieces of information from each of the 3 datasets into 1 larger dataset.
Currently, I have:
/*STORE INFORMATION*/
data have0;
infile cards truncover expandtabs;
input STORE_ID $ zip_cd year;
format zip_cd z5.;
cards;
A0001 99993 2010
A0001 99993 2011
A0001 99994 2012
A0001 99994 2013
A0001 99994 2014
A0001 99995 2015
A0002 33999 2011
A0002 33999 2012
A0002 33999 2013
A0002 33995 2014
A0002 33993 2015
A0003 33999 2010
A0003 33999 2011
A0003 33999 2012
A0003 33999 2013
A0003 33999 2014
A0003 33999 2015
A0004 33999 2014
A0004 33999 2015
A0005 33999 2011
A0005 33999 2012
A0005 33995 2013
;
/*TRANSACTION INFORMATION*/
data have1;
infile cards truncover expandtabs;
input STORE_ID $ customer_id $ transaction_id $ transaction_date :date9.;
format transaction_date date8.;
cards;
A0001 aa 1 22jan2010
A0001 ab 2 11Oct2010
A0001 cc 3 16jun2011
A0001 dd 4 19apr2012
A0001 ee 5 10feb2013
A0001 ef 6 09nov2014
A0001 ff 7 04dec2014
A0001 ff 8 07dec2014
A0001 gg 9 22Oct2015
A0001 ds 10 30nov2015
A0002 ad 1 11jun2011
A0002 aq 2 11apr2014
A0002 qr 3 11feb2015
A0003 er 1 09nov2010
A0003 hh 2 04dec2011
A0003 hh 3 22jan2012
A0003 hd 4 11Oct2012
A0003 sd 5 16jun2013
A0003 cv 6 19apr2014
A0003 nb 7 10feb2015
A0003 wr 8 09nov2015
A0003 yt 9 04dec2015
A0003 ui 10 04dec2015
A0004 ol 1 09nov2014
A0004 rr 2 04dec2014
A0004 gh 3 04dec2015
;
/*CUSTOMER INFORMATION*/
data have2;
infile cards truncover expandtabs;
input customer_id $ customer_race $ ;
cards;
aa white
ab black
cc latino
dd other
ee white
ef white
ff white
gg black
ds black
ad black
aq latino
qr latino
er latino
hh other
hd latino
sd other
cv other
nb other
wr white
yt white
ui white
ol white
rr white
gh black
;
With the assistance of someone in the community, I was trying to create a dataset that combined information from the first 2 datasets to create extra rows for stores that opened/closed during my study period and count the number of transactions that occurred during the period they were opened as shown below:
proc sql;
create view store_open_close as
select
l.store_id,
l.zip_cd,
min(l.year) as open_year,
max(l.year) as close_year,
min(r.transaction_date) as min_transaction_date format=date9.,
max(r.transaction_date) as max_transaction_date format=date9.
from
have0 l
left join
have1 r
on l.store_id=r.store_id and l.year=year(r.transaction_date)
group by l.store_id, l.zip_cd
;
quit;
proc sql;
create table long as
select
l.*,
count(r.transaction_id) as no_transactions, /*issue here: I think it should be count(distinct r.transaction_id)*/
count(distinct r.customer_id) as no_customers_overall,
s.open_year,
s.close_year,
s.min_transaction_date,
s.max_transaction_date
from have0 l
left join
have1 r
on l.store_id=r.store_id and l.year=year(r.transaction_date)
left join
have2 t
on r.customer_id=t.customer_id and l.year=year(r.transaction_date)
left join
store_open_close s
on l.store_id=s.store_id and l.zip_cd=s.zip_cd
group by
l.store_id, l.zip_cd, s.open_year, s.close_year, s.min_transaction_date, s.max_transaction_date, l.year
;
quit;
I encounter two issues though:
1. I think counting the number of transactions is not done correctly--I commented where I think it should be count(distinct r.transaction_id) to get the total transactions while open?
2. I want to now combine dataset 3 during this piece to also group the customer_id by race and count how many there were as new variables (e.g., no_customer_white, no_customer_black, and so on...) within this proc sql.
Any help would be appreciated. Thank you!
I modified the code a little bit and I think I have most of it figured out except the fact that I want to obtain 1 row for each store_id, zip combination that shows STORE_ID, zip_cd, no_transactions (summed), no_customers (summed), no_customers_race (summed) open/close_year, min/max_transaction_date.
proc sql;
create view store_open_close as
select
l.store_id,
l.zip_cd,
min(l.year) as open_year,
max(l.year) as close_year,
min(r.transaction_date) as min_transaction_date format=date9.,
max(r.transaction_date) as max_transaction_date format=date9.
from
have0 l
left join
have1 r
on l.store_id=r.store_id and l.year=year(r.transaction_date)
group by l.store_id, l.zip_cd
;
quit;
proc sql;
create table long as
select
l.*,
count(distinct r.transaction_id) as no_transactions, /*issue here: I think it should be count(distinct r.transaction_id)*/
count(distinct r.customer_id) as no_customers_overall,
/*these customer race are wrong since I don't know how to count distinct id and group race*/
count(case when (t.customer_race='white') then t.customer_race end) as no_customers_white,
count(case when (t.customer_race='black') then t.customer_race end) as no_customers_black,
count(case when (t.customer_race='latino') then t.customer_race end) as no_customers_latino,
count(case when (t.customer_race='other') then t.customer_race end) as no_customers_other,
s.open_year,
s.close_year,
s.min_transaction_date,
s.max_transaction_date
from have0 l
left join
have1 r
on l.store_id=r.store_id and l.year=year(r.transaction_date)
left join
have2 t
on r.customer_id=t.customer_id and l.year=year(r.transaction_date)
left join
store_open_close s
on l.store_id=s.store_id and l.zip_cd=s.zip_cd
group by
l.store_id, l.zip_cd, s.open_year, s.close_year, s.min_transaction_date, s.max_transaction_date, l.year
;
quit;
/*Count totals*/
proc sql;
create table new as
select store_id, zip_cd, open_year, close_year, min_transaction_date, max_transaction_date,
sum(no_transactions) as total_transactions,
sum(no_customers_overall) as total_customers,
sum(no_customers_white) as total_customers_white,
sum(no_customers_black) as total_customers_black,
sum(no_customers_latino) as total_customers_latino,
sum(no_customers_other) as total_customers_other
from long
group by store_id, zip_cd, open_year, close_year, min_transaction_date, max_transaction_date;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.