Hi Team,
I have PAN INDIA customer shopping table which contains online transaction details as well as store details.
I want to create a new variable for category like customer is online_only, Store_only and unclassified (if shopped by both criteria).
Sample dataset :
DATA C;
input CARD_NUM $5. STORE_NAME $7.;
CARDS;
12345 ONLINE
23456 STR_1
98765 ONLINE
98765 STR_NM
98765 STR_5
43214 STR_A
;
Run;
Desired output :
12345 ONLINE_ONLY
23456 STORE_ONLY
98765 UNCLASSIFIED
43214 STORE_ONLY
kindly note : For online transaction, i have only one indicator as 'ONLINE' but for store visit we have n number of store names.
Looking for your kind support.
Regards,
Uma Shanker Saini
Hi @umashankersaini,
I think you don't need a CASE WHEN statement here. Just use the MAX and CHOOSEC functions:
proc sql;
create table want as
select card_num, choosec(max(store_name='ONLINE')+2*max(store_name ne 'ONLINE'),
'ONLINE_ONLY', 'STORE_ONLY', 'UNCLASSIFIED')
as categ length=12
from c
group by card_num;
quit;
And, yes, as
@Reeza wrote:
You need to explain the rules of how STORE_NAME will map to your new categories.
My code is based on the assumption that anything other than 'ONLINE' (what about spelling variants like 'Online'? Missing values?) indicates "Store". Obviously, you would need to modify the conditions about STORE_NAME if this assumption does not hold in your real data.
If you need to preserve the order of CARD_NUM values from dataset C, I'd rather recommend a DATA step solution, e.g.
data want(drop=_: store_name);
do until(last.card_num);
set c;
by card_num notsorted;
if store_name='ONLINE' then _o=1;
else _s=1;
end;
length categ $12;
categ=choosec(sum(_o,_s,_s), 'ONLINE_ONLY', 'STORE_ONLY', 'UNCLASSIFIED');
run;
Same assumption as above plus the data must be grouped by CARD_NUM.
You need to explain the rules of how STORE_NAME will map to your new categories.
@umashankersaini wrote:
Hi Team,
I have PAN INDIA customer shopping table which contains online transaction details as well as store details.
I want to create a new variable for category like customer is online_only, Store_only and unclassified (if shopped by both criteria).
Sample dataset :
DATA C;
input CARD_NUM $5. STORE_NAME $7.;
CARDS;
12345 ONLINE
23456 STR_1
98765 ONLINE
98765 STR_NM
98765 STR_5
43214 STR_A
;
Run;
Desired output :
12345 ONLINE_ONLY
23456 STORE_ONLY
98765 UNCLASSIFIED
43214 STORE_ONLY
kindly note : For online transaction, i have only one indicator as 'ONLINE' but for store visit we have n number of store names.
Looking for your kind support.
Regards,
Uma Shanker Saini
1. If customer is buying product online - then online only
2. If customer is buying by going to physical store - then store only
3. if customer is buying by both ways (online and store visit ) - Then unclassified
for online purchase - store_name would be same like 'ONLINE' across the globe
for physical store - Store_name would be any thing without pattern.
Hi @umashankersaini,
I think you don't need a CASE WHEN statement here. Just use the MAX and CHOOSEC functions:
proc sql;
create table want as
select card_num, choosec(max(store_name='ONLINE')+2*max(store_name ne 'ONLINE'),
'ONLINE_ONLY', 'STORE_ONLY', 'UNCLASSIFIED')
as categ length=12
from c
group by card_num;
quit;
And, yes, as
@Reeza wrote:
You need to explain the rules of how STORE_NAME will map to your new categories.
My code is based on the assumption that anything other than 'ONLINE' (what about spelling variants like 'Online'? Missing values?) indicates "Store". Obviously, you would need to modify the conditions about STORE_NAME if this assumption does not hold in your real data.
If you need to preserve the order of CARD_NUM values from dataset C, I'd rather recommend a DATA step solution, e.g.
data want(drop=_: store_name);
do until(last.card_num);
set c;
by card_num notsorted;
if store_name='ONLINE' then _o=1;
else _s=1;
end;
length categ $12;
categ=choosec(sum(_o,_s,_s), 'ONLINE_ONLY', 'STORE_ONLY', 'UNCLASSIFIED');
run;
Same assumption as above plus the data must be grouped by CARD_NUM.
This works for the provided example data.
proc format library=work; value $store_online 'ONLINE' = 'ONLINE' other = 'STORE' ; DATA C; input CARD_NUM $5. STORE_NAME $7.; CARDS; 12345 ONLINE 23456 STR_1 98765 ONLINE 98765 STR_NM 98765 STR_5 43214 STR_A ; Run; proc freq data=c noprint; tables card_num*store_name/out=cardcount; format store_name $store_online.; run; proc sql; create table want as select distinct card_num, case when count(*)>1 then 'UNCLASSIFIED' When count(*)=1 then put(store_name,$store_online.) end as status from cardcount group by card_num ; quit;
The Format plus Proc freq reduces the possible values of store to 2.Formatted values will be used to create groups in the proc. If the summary output from proc freq for a card has more than one row (count(*)>1) then it should come from combination of store and online.
I didn't make the format use the longer text you show. That is left as an exercise for the interested reader.
Hi,
Please find the code below:
proc sql;
Create table want as select distinct Card_Num ,
case when (count(Store_Name) = 1 and lower(Store_Name) like 'str%') then 'Store_Only'
when (count(Store_Name) = 1 and lower(Store_name) ='online') then 'Online'
when (count(Store_Name) >= 2) then 'Unclassified' end as Category
from C
group by Card_Num
Order by Card_Num
;
quit;
Output would be :
CARD_NUM Category
12345 | Online |
23456 | Store_Only |
43214 | Store_Only |
98765 | Unclassified |
Regards,
Anushree
Proc SQl;
create table want as
select distinct CARD_NUM,
case
when count( distinct scan(STORE_NAME,1,'_')) > 1
then "UNCLASSIFIED"
else cats(scan(STORE_NAME,1,'_'),'_',"ONLY")
end as class
from c
group by CARD_NUM;
Quit;
Hi Team,
Thanks for your quick reply.
There is no pattern for store_name, if offline (store visit). we have pattern for online purchase only.
I tried with :
proc sql;
Create table want (compress=yes) as
Select distinct CARD_NUM
, case when (STORE_NAME) like '%ONLINE%' then 'ONLINE ONLY'
when STORE_NAME not like '%ONLINE%' then 'STORE ONLY'
when STORE_NAME like '%ONLINE%' and STORE_NAME not like '%ONLINE%' then 'UNCLASSIFIED'
End as CHANNEL
From A
Group by 1;
quit;
This code is not working fine for unclassified category
@umashankersaini: Thanks for clarifying the criteria. Have you tried to implement these in my suggested solution? If not, please see below:
proc sql;
create table want as
select card_num, choosec(max(store_name like '%ONLINE%')+2*max(store_name not like '%ONLINE%'),
'ONLINE_ONLY', 'STORE_ONLY', 'UNCLASSIFIED')
as channel length=12
from c
group by card_num;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.