Solved: To add new column based on multiple conditions from same and other dat...

Ronin · Posted 11-20-2018 12:48 AM

Hi I am very new to SAS and coding. I have a situation I have two datasets as below

1) Dataset 1:(with 86 variables and 7095 observation)

Var 1	var 2	WEEK	var 4	STORE	var 5
a	f	1	0	1	x
b	k	6	0	53	x
c	z	66	0	2	x
d	x	50	0	53	x
e	y	3	0	8	x

Here WEEK column above have values from week 1 to week 100

2) Dataset 2:(with 2 variables and 621 unique observation)

STORE	TIER
1	Tier A
2	Tier C
3	Tier B
8	Tier B
9	Tier C
53	Tier A
18	Tier C
98	Tier B

Here Tier column has values as Tier A,B,C for all 621 unique store numbers

I want to add a new column in dataset 1 as store type with below logic:

	Tier A	Tier B	Tier C
Pre-week (week 1 to Week 50)	11	12	13
Post-week (week 51 to Week 100)	21	22	23

So the new column will have numerically encoded values as 11,12,13,21,22,23 based on range of week from WEEK column in dataset 1 and mapped Tier from dataset 2.

Kindly help me with some sort of function code for this.

gamotte · Posted 11-20-2018 03:38 AM

Hello,

Please show what the want dataset should look like. Does the following program

answer your question ?

data have;
    input var1 $ var2 $ WEEK var4 STORE var5 $;
    cards;
a f 1 0 1 x
b k 6 0 53 x
c z 66 0 2 x
d x 50 0 53 x
e y 3 0 8 x
;
run;

data tiers;
    infile cards dlm=',' dsd;
    input STORE TIER $;
    cards;
1,Tier A
2,Tier C
3,Tier B
8,Tier B
9,Tier C
53,Tier A
18,Tier C
98,Tier B
;
run;

proc sort data=have; by STORE; run;
proc sort data=tiers; by STORE; run;

data want;
    merge have (in=inhave) tiers;
    by STORE;

    if WEEK le 50 then newvar=10;
    else newvar=20;

    if upcase(TIER)="TIER A" then newvar=newvar+1;
    else if  upcase(TIER)="TIER B" then newvar=newvar+2;
    else newvar=newvar+3;

    if inhave;
run;

proc sort data=want; by var1; run;

View solution in original post

gamotte · Posted 11-20-2018 03:38 AM

Hello,

Please show what the want dataset should look like. Does the following program

answer your question ?

data have;
    input var1 $ var2 $ WEEK var4 STORE var5 $;
    cards;
a f 1 0 1 x
b k 6 0 53 x
c z 66 0 2 x
d x 50 0 53 x
e y 3 0 8 x
;
run;

data tiers;
    infile cards dlm=',' dsd;
    input STORE TIER $;
    cards;
1,Tier A
2,Tier C
3,Tier B
8,Tier B
9,Tier C
53,Tier A
18,Tier C
98,Tier B
;
run;

proc sort data=have; by STORE; run;
proc sort data=tiers; by STORE; run;

data want;
    merge have (in=inhave) tiers;
    by STORE;

    if WEEK le 50 then newvar=10;
    else newvar=20;

    if upcase(TIER)="TIER A" then newvar=newvar+1;
    else if  upcase(TIER)="TIER B" then newvar=newvar+2;
    else newvar=newvar+3;

    if inhave;
run;

proc sort data=want; by var1; run;

Ronin · Posted 11-20-2018 04:49 AM

Hi,

Thanks for answering.

My apology for not being clear
The datasets I gave above are just part of actual data with dummy values.

as I mentioned in the problem statement the dataset 1 is having 86 variables and 7095 observation. I just showed some of them as

Var 1	var 2	WEEK	var 4	STORE	var 6	var 7…….var 86
a	f	1	0	1	x	……………………
b	k	6	0	53	x	……………………
c	z	66	0	2	x	……………………
d	x	50	0	53	x	……………………
e	y	3	0	8	x	……………………
………….	………….	………….	………….	………….	………….	……………………
………….	………….	………….	………….	………….	………….	……………………

this will continue for all 7095 observations....

Also the dataset 2 have 2 variables and 621 unique observations, for this also i have displayed just a part of the data. So using cards;

were you have shown whole data will become a tedious job.

STORE	TIER
1	Tier A
2	Tier C
3	Tier B
8	Tier B
9	Tier C
53	Tier A
18	Tier C
98	Tier B

continued till 621 rows with 621 unique Store values....

To your question the final want dataset is an update on dataset 1 with a new_variable added to it. It will look like:

Var 1	Var 2	WEEK	Var 4	STORE	Var 6	Var 7	var 8…………….var 85	var 86	new_variable
a	f	1	0	1	x	y	……………………………..	7 j	11
b	k	6	0	53	x	y	……………………………..	j u 6	11
c	z	66	0	2	x	y	……………………………..	next 7	23
d	x	50	0	53	x	y	……………………………..	j e s t k #2	11
e	y	3	0	8	x	y	……………………………..	l k j l_87	12
f	j	51	0	9	x	y	……………………………..	j k j_769	23
g	m	49	0	8	x	y	……………………………..	78_65_k j u	12

Here the new_variable is based on the logic as:

	Tier A	Tier B	Tier C
Pre-week (week 1 to Week 50)	11	12	13
Post-week (week 51 to Week 100)	21	22	23

which means for a given store no.

IF (dataset 1.WEEK <=50) AND (dataset 2.Tier=="Tier A") THEN new_variable=11

ELSE IF (dataset 1.WEEK <=50) AND (dataset 2.Tier=="Tier B") THEN new_variable=12

ELSE IF (dataset 1.WEEK <=50) AND (dataset 2.Tier=="Tier C") THEN new_variable=13

ELSE IF (dataset 1.WEEK >50) AND (dataset 2.Tier=="Tier A") THEN new_variable=21

ELSE IF (dataset 1.WEEK >50) AND (dataset 2.Tier=="Tier B") THEN new_variable=22

ELSE IF (dataset 1.WEEK >50) AND (dataset 2.Tier=="Tier C") THEN new_variable=23

I hope this might give you some more clarity about the the problem and logic I am trying to apply. Kindly help me.

gamotte · Posted 11-20-2018 05:07 AM

I think the program i proposed in my first answer does what you want.

Did you try it ?

Ronin · Posted 11-21-2018 12:33 AM

Hi;

Thanks a lot. The code worked with just small changes. I appreciate your help..Just one thing the values for new_variable added are moving to the next row. and when I create the CSV file the values for new variable is coming below the 2nd variable. You can see the snapshot in the file attached. The new_variable is Named here as GRP_NUM. Can you help me with it

To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

Re: To add new column based on multiple conditions from same and other datasets

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!