Solved: Create a variable dependent on value from another row

ANKH1 · Posted 02-27-2023 06:04 PM

We have this dataset. Each ID is an item sold at a store. OLD_NEW is whether the version of the item is old or new. Months is time it took for the item to sell. Type1 is equal to A if it got sold before 12 months.

ID	STORE	OLD_NEW	MONTHS	type1
100	1	OLD	34	.
101	1	NEW	11	A
102	2	OLD	45	.
103	2	NEW	15	.
104	3	NEW	6	A
105	3	OLD	34	.

We would like to create a variable type2 that will only be filled out if the other item in the same store had type1=A. Like the following:

ID	STORE	OLD_NEW	MONTHS	type1	type2
100	1	OLD	34	.	B
101	1	NEW	11	A	.
102	2	OLD	45	.	.
103	2	NEW	15	.	.
104	3	NEW	6	A	.
105	3	OLD	34	.	B

Tom · Posted 02-27-2023 08:18 PM

Why did you show the observations in random order? It would be much easier if they were in order, then you can just remember if there have been any with months < 12.

I find binary flags much easier to deal with.

data have;
  input ID STORE OLD_NEW $ MONTHS type1 $ type2 $;
cards;
100 1 OLD 34 . B
101 1 NEW 11 A .
102 2 OLD 45 . .
103 2 NEW 15 . .
104 3 NEW 6 A .
105 3 OLD 34 . B
;

proc sort data=have ;
  by store months ;
run;

data want;
  set have;
  by store ;
  if first.store then n_type1=0;
  new_type1=(months < 12);
  new_type2=(n_type1>0);
  if new_type1 then n_type1+1;
run;

                                                                          new_     new_
OBS     ID    STORE    OLD_NEW    MONTHS    type1    type2    n_type1    type1    type2

 1     101      1        NEW        11        A                  1         1        0
 2     100      1        OLD        34                 B         1         0        1
 3     103      2        NEW        15                           0         0        0
 4     102      2        OLD        45                           0         0        0
 5     104      3        NEW         6        A                  1         1        0
 6     105      3        OLD        34                 B         1         0        1

View solution in original post

Tom · Posted 02-27-2023 08:18 PM

Why did you show the observations in random order? It would be much easier if they were in order, then you can just remember if there have been any with months < 12.

I find binary flags much easier to deal with.

data have;
  input ID STORE OLD_NEW $ MONTHS type1 $ type2 $;
cards;
100 1 OLD 34 . B
101 1 NEW 11 A .
102 2 OLD 45 . .
103 2 NEW 15 . .
104 3 NEW 6 A .
105 3 OLD 34 . B
;

proc sort data=have ;
  by store months ;
run;

data want;
  set have;
  by store ;
  if first.store then n_type1=0;
  new_type1=(months < 12);
  new_type2=(n_type1>0);
  if new_type1 then n_type1+1;
run;

                                                                          new_     new_
OBS     ID    STORE    OLD_NEW    MONTHS    type1    type2    n_type1    type1    type2

 1     101      1        NEW        11        A                  1         1        0
 2     100      1        OLD        34                 B         1         0        1
 3     103      2        NEW        15                           0         0        0
 4     102      2        OLD        45                           0         0        0
 5     104      3        NEW         6        A                  1         1        0
 6     105      3        OLD        34                 B         1         0        1

ANKH1 · Posted 02-28-2023 01:01 PM

Thank you! It works perfectly.

ANKH1 · Posted 02-28-2023 04:13 PM

Could you please explain why n_type1 turns into 1 for the second observation by adding the last line of code if it is missing up until that line?

Tom · Posted 02-28-2023 05:14 PM

It is never going to be missing with that data step.

The SUM STATEMENT means that the variable will be initialized to zero. Plus for the first observation of each BY group it is set to zero.

ANKH1 · Posted 02-28-2023 05:37 PM

Thank you! Sorry, I am new with SAS and I want to understand the code instead of only using. What is the function of the n_type1? And why for the observation 2 is 1 if you have this line " if new_type1 then n_type1+1;" Doesn't it mean that it is 1 if new_type1 is real? Why line is 0 for n_type1?

Tom · Posted 02-28-2023 06:46 PM

Read the documentation on the SUM STATEMENT that I linked before. Also read about the RETAIN statement, using a SUM statement implies that the variable that is being summed into is retained. Also read about difference in how missing values are handled by normal addition operator (A+B) and the SUM() function, sum(a,b). The SUM STATEMENT handles missing values of its two arguments in the same way as the SUM() function does.

Also understand about how Boolean expressions work in SAS. When you use a number, like NEW_TYPE1, in a boolean expression then SAS will treat a value that is 0 or missing as FALSE and any other value as TRUE. When you use the result of a Boolean expression, such as (months<12), as a number then TRUE expression result in a 1 and FALSE expressions result in a 0.

So

if new_type1 then n_type1+1;

means.

When NEW_TYPE1 has a value that is neither zero or missing then add 1 to the current value on N_TYPE1 (and if N_TYPE1 was somehow set to missing the result is 1 instead of missing). Or more clearly:

Increment n_type1 by 1 when new_type1 is true.

ANKH1 · Posted 02-28-2023 09:24 PM

Thank you, this is very helpful.

Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Re: Create a variable dependent on value from another row

Registration is open

SAS Training: Just a Click Away