We have this dataset. Each ID is an item sold at a store. OLD_NEW is whether the version of the item is old or new. Months is time it took for the item to sell. Type1 is equal to A if it got sold before 12 months.
ID | STORE | OLD_NEW | MONTHS | type1 |
100 | 1 | OLD | 34 | . |
101 | 1 | NEW | 11 | A |
102 | 2 | OLD | 45 | . |
103 | 2 | NEW | 15 | . |
104 | 3 | NEW | 6 | A |
105 | 3 | OLD | 34 | . |
We would like to create a variable type2 that will only be filled out if the other item in the same store had type1=A. Like the following:
ID | STORE | OLD_NEW | MONTHS | type1 | type2 |
100 | 1 | OLD | 34 | . | B |
101 | 1 | NEW | 11 | A | . |
102 | 2 | OLD | 45 | . | . |
103 | 2 | NEW | 15 | . | . |
104 | 3 | NEW | 6 | A | . |
105 | 3 | OLD | 34 | . | B |
Why did you show the observations in random order? It would be much easier if they were in order, then you can just remember if there have been any with months < 12.
I find binary flags much easier to deal with.
data have;
input ID STORE OLD_NEW $ MONTHS type1 $ type2 $;
cards;
100 1 OLD 34 . B
101 1 NEW 11 A .
102 2 OLD 45 . .
103 2 NEW 15 . .
104 3 NEW 6 A .
105 3 OLD 34 . B
;
proc sort data=have ;
by store months ;
run;
data want;
set have;
by store ;
if first.store then n_type1=0;
new_type1=(months < 12);
new_type2=(n_type1>0);
if new_type1 then n_type1+1;
run;
new_ new_ OBS ID STORE OLD_NEW MONTHS type1 type2 n_type1 type1 type2 1 101 1 NEW 11 A 1 1 0 2 100 1 OLD 34 B 1 0 1 3 103 2 NEW 15 0 0 0 4 102 2 OLD 45 0 0 0 5 104 3 NEW 6 A 1 1 0 6 105 3 OLD 34 B 1 0 1
Why did you show the observations in random order? It would be much easier if they were in order, then you can just remember if there have been any with months < 12.
I find binary flags much easier to deal with.
data have;
input ID STORE OLD_NEW $ MONTHS type1 $ type2 $;
cards;
100 1 OLD 34 . B
101 1 NEW 11 A .
102 2 OLD 45 . .
103 2 NEW 15 . .
104 3 NEW 6 A .
105 3 OLD 34 . B
;
proc sort data=have ;
by store months ;
run;
data want;
set have;
by store ;
if first.store then n_type1=0;
new_type1=(months < 12);
new_type2=(n_type1>0);
if new_type1 then n_type1+1;
run;
new_ new_ OBS ID STORE OLD_NEW MONTHS type1 type2 n_type1 type1 type2 1 101 1 NEW 11 A 1 1 0 2 100 1 OLD 34 B 1 0 1 3 103 2 NEW 15 0 0 0 4 102 2 OLD 45 0 0 0 5 104 3 NEW 6 A 1 1 0 6 105 3 OLD 34 B 1 0 1
Thank you! It works perfectly.
It is never going to be missing with that data step.
The SUM STATEMENT means that the variable will be initialized to zero. Plus for the first observation of each BY group it is set to zero.
Read the documentation on the SUM STATEMENT that I linked before. Also read about the RETAIN statement, using a SUM statement implies that the variable that is being summed into is retained. Also read about difference in how missing values are handled by normal addition operator (A+B) and the SUM() function, sum(a,b). The SUM STATEMENT handles missing values of its two arguments in the same way as the SUM() function does.
Also understand about how Boolean expressions work in SAS. When you use a number, like NEW_TYPE1, in a boolean expression then SAS will treat a value that is 0 or missing as FALSE and any other value as TRUE. When you use the result of a Boolean expression, such as (months<12), as a number then TRUE expression result in a 1 and FALSE expressions result in a 0.
So
if new_type1 then n_type1+1;
means.
When NEW_TYPE1 has a value that is neither zero or missing then add 1 to the current value on N_TYPE1 (and if N_TYPE1 was somehow set to missing the result is 1 instead of missing). Or more clearly:
Increment n_type1 by 1 when new_type1 is true.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.