Hello everyone,
I have a scenario where I need to find whether the same "reward_program_id1" OR "reward_program_id2" is used in multiple product_id. In the below data product_id is unique however, reward_program_id1 & reward_program_id2 digits can be used interchangeable to purchase product.
For example, product_id 89067 is purchased using reward_program_id2 = 567694(row #1) and the same digits are used as reward_program_id1 to purchase product_id 90873 (row #4). Likewise, product_id 34340 is purchased using reward_program_id1 = 429008(row #3) and the same digits are used as reward_program_id2 to purchase product_id 10983(row #8).
In the output I am trying to add a column "Multi_program_id" Y/N if the same reward ID is used in more than one product. I hope this makes sense, any help is appreciated!
Thanks in advance.
Data have;
input product_id product_delivered $ reward_program_id1 reward_program_id2 ;
infile datalines delimiter=',';
datalines;
89067,Y,223456,567694
58394,N,343412,
34340,Y,429008,412948
90873,N,567694,293832
91028,N,389292,902221
38034,N,459303,343033
76281,N,828292,282929
10983,Y,909033,429008
18738,N,929292,
17930,Y,343412,982932
29383,N,898903,
11283,Y,976409,829737
98203,N,839234,928302
12930,N,918833,303033
78023,Y,123233,412948
;
run;
Output:
Product_id | product_delivered | reward_program_id1 | reward_program_id2 | Multi_program_id |
89067 | Y | 223456 | 567694 | Y |
58394 | N | 343412 | . | Y |
34340 | Y | 429008 | 412948 | Y |
90873 | N | 567694 | 293832 | Y |
91028 | N | 389292 | 902221 | N |
38034 | N | 459303 | 343033 | N |
76281 | N | 828292 | 282929 | N |
10983 | Y | 909033 | 429008 | Y |
18738 | N | 929292 | . | N |
17930 | Y | 343412 | 982932 | Y |
29383 | N | 898903 | . | N |
11283 | Y | 976409 | 829737 | N |
98203 | N | 839234 | 928302 | N |
12930 | N | 918833 | 303033 | N |
78023 | Y | 123233 | 412948 | Y |
Hi @vicky07
This an interesting problem, because the caunt of same reward-id on a product is taken from one or other both of the columns reward_program_id1 and reward_program_id2. So it ended up a bit more complicated than I thought from start. But I think it works.
Data have;
input product_id product_delivered $ reward_program_id1 reward_program_id2 ;
infile datalines delimiter=',';
datalines;
89067,Y,223456,567694
58394,N,343412,
34340,Y,429008,412948
90873,N,567694,293832
91028,N,389292,902221
38034,N,459303,343033
76281,N,828292,282929
10983,Y,909033,429008
18738,N,929292,
17930,Y,343412,982932
29383,N,898903,
11283,Y,976409,829737
98203,N,839234,928302
12930,N,918833,303033
78023,Y,123233,412948
;
run;
* Collapse reward_program_id1 and reward_program_id2 til one column without misings;
* order on reward_program_id to use in following data step;
proc sql;
create table rewardproduct as
select distinct
product_id,
reward_program_id
from (
select
product_id,
reward_program_id1 as reward_program_id
from have
where reward_program_id1 ne .
union
select
product_id,
reward_program_id2 as reward_program_id
from have
where reward_program_id2 ne .
)
order by
reward_program_id,
product_id;
quit;
* Count products per program_id and set code for multiple products pr. reward_program_id;
data multi;
set rewardproduct;
by reward_program_id;
if (first.reward_program_id and last.reward_program_id) then multi = 0;
else multi = 1;
run;
* Set highest value of products pr. reward_program_id as multivalue on product;
proc sql;
create table multi2 as
select distinct
product_id,
ifc(max(multi)=1,'Y','N') as Multi_Program_id
from multi
group by product_id;
quit;
* Join to get final result;
proc sql;
create table want as
select distinct
a.product_id,
a.product_delivered,
a.reward_program_id1,
a.reward_program_id2,
b.Multi_Program_id
from have as a
left join multi2 as b
on a.product_id = b.product_id;
quit;
Hi @vicky07
This an interesting problem, because the caunt of same reward-id on a product is taken from one or other both of the columns reward_program_id1 and reward_program_id2. So it ended up a bit more complicated than I thought from start. But I think it works.
Data have;
input product_id product_delivered $ reward_program_id1 reward_program_id2 ;
infile datalines delimiter=',';
datalines;
89067,Y,223456,567694
58394,N,343412,
34340,Y,429008,412948
90873,N,567694,293832
91028,N,389292,902221
38034,N,459303,343033
76281,N,828292,282929
10983,Y,909033,429008
18738,N,929292,
17930,Y,343412,982932
29383,N,898903,
11283,Y,976409,829737
98203,N,839234,928302
12930,N,918833,303033
78023,Y,123233,412948
;
run;
* Collapse reward_program_id1 and reward_program_id2 til one column without misings;
* order on reward_program_id to use in following data step;
proc sql;
create table rewardproduct as
select distinct
product_id,
reward_program_id
from (
select
product_id,
reward_program_id1 as reward_program_id
from have
where reward_program_id1 ne .
union
select
product_id,
reward_program_id2 as reward_program_id
from have
where reward_program_id2 ne .
)
order by
reward_program_id,
product_id;
quit;
* Count products per program_id and set code for multiple products pr. reward_program_id;
data multi;
set rewardproduct;
by reward_program_id;
if (first.reward_program_id and last.reward_program_id) then multi = 0;
else multi = 1;
run;
* Set highest value of products pr. reward_program_id as multivalue on product;
proc sql;
create table multi2 as
select distinct
product_id,
ifc(max(multi)=1,'Y','N') as Multi_Program_id
from multi
group by product_id;
quit;
* Join to get final result;
proc sql;
create table want as
select distinct
a.product_id,
a.product_delivered,
a.reward_program_id1,
a.reward_program_id2,
b.Multi_Program_id
from have as a
left join multi2 as b
on a.product_id = b.product_id;
quit;
I was going to post a similar approach. Is there another way to make the multi flag other than first and last variables? I didn't post my solution because I was not finished but I wanted to get a count of reward by product_id and then if it was ge 2 then it was in more than one product. Would something close to that work?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.