Fluorite | Level 6

## Retain statement with two by group variables?

Hi all,

I just can't seem to figure this out so hoping that someone here can help.

I have this dataset, where Var1 is an indicator for a particular event:

ID  Month  Var1

A   Jan       1

A   Jan       1

A   Feb       .

B   Jan       1

B   Jan       1

B   Feb      1

B   Mar      1

I want to create a new dataset that contains a new count variable that will count the number in Var1 for each ID/month group so that it looks like this:

ID  Month  Var1      Count

A   Jan       1            1

A   Jan       1            2

A   Feb       .            .

B   Jan       1            1

B   Jan       1            2

B   Feb      1            1

B   Mar      1             1

How do I do this with a retain statement? This is what I tried:

``````data want;
set have;
where var1=1;
by id month;
if first.month then Count=0;
Count+1;

run;``````

(Side note: I know this probably won't give me the row where Var1=. but that's okay.)

10 REPLIES 10
Tourmaline | Level 20

## Re: Retain statement with two by group variables?

``````data have;
input ID \$  Month \$  Var1;
cards;
A   Jan       1
A   Jan       1
A   Feb       .
B   Jan       1
B   Jan       1
B   Feb      1
B   Mar      1
;

data want;
set have;
by id month notsorted;
if first.month and var1 then count=1;
else if var1 then count+1;
else count=.;
run;``````

/*Or just*/

``````data want;
set have;
by id month notsorted;
if first.month then count=.;
if var1 then count+1;
run;
``````

/*Or*/

``````data want;
set have;
by id month var1 notsorted;
if first.var1 then count=.;
if var1 then count+1;
run;``````

Meteorite | Level 14

## Re: Retain statement with two by group variables?

You need to specify "notsorted" in the by statement as the value are considered as character values (not a format of a numeric variable for the month 1-12). So SAS consider they are not sorted properly (Feb should occur before Jan etc. alphabetically)

``````data want;
set have;
by id month notsorted;
if first.month and var1 then count=1;
else if var1 then count+1;
else count=.;
run;``````
Fluorite | Level 6

## Re: Retain statement with two by group variables?

Oh, okay. So I'm sorry, I should have specified but month is actually in YYMM7. format (so it displays as 2018M10, for instance). I did do a proc sort by ID and month. But I will give this notsorted option a try as well.

Super User

## Re: Retain statement with two by group variables?

So what was wrong with the results you got from running your code?

Fluorite | Level 6

## Re: Retain statement with two by group variables?

I just ran it again. I have another variable in my dataset for exact date and it looks like my code only counts observations occurring on the same date (as though the bygroup is date). Which doesn't make sense to me because I don't list the date variable anywhere in my code for proc sort or the data step. So in the end, I could have some observations occurring in the same month but the count gets reset whenever the actual date changes, not the month.

Super User

## Re: Retain statement with two by group variables?

When you attach a format to a variable it changes how SAS displays the value, but it does not change the actual value.  So probably to get the behavior you are seeing the two variables have the same values, but one is displaying just the month and the other displays the day of the month also.

Some procedures, like PROC FREQ and PROC MEANS, will use the formatted value when aggregating by a variable.  But the BY processing of the DATA step looks at the actual values to see when they change, not the formatted (display) values.

You could fix that by converting the variable that you are using to show the month to use the same day of the month every time.

So something like this will make a copy of a date variable named DATE into another date valued variable named MONTH, but force the values to the first day of the month.

``````data want;
set have ;
month =intnx('month',date,0,'b');
format month yymm7. date date9.;
run;``````

Rhodochrosite | Level 12

## Re: Retain statement with two by group variables?

In this case it is easier to use Lag functions instead of by variables with notsorted.

``````data have;
input ID \$  Month \$  Var1;
cards;
A   Jan       1
A   Jan       1
A   Feb       .
B   Jan       1
B   Jan       1
B   Feb      1
B   Mar      1
;

data want; set have;
if lag(ID)=ID and lag(Month)=Month then count + 1;
else count=Var1;
run;``````

Result:

Fluorite | Level 6

## Re: Retain statement with two by group variables?

thank you! I'm not familiar with lag functions at all so will check this out.
Rhodochrosite | Level 12

## Re: Retain statement with two by group variables?

So the Month variable is a SAS date. It is formatted to appear as a month value, but it contains a numeric value counting days from jan 1, 1960. And the value is different, if the days are different, even if they are in the same month. Here is a modified code to handle that:

``````data have;
input ID\$ Month date9. Var1;
format Month YYMM7.;
cards;
A 12jan2019 1
A 20jan2019 1
A 16feb2019 .
B 01jan2019 1
B 17jan2019 1
B 24feb2019 1
B 07mar2019 1
;

data want; set have;
if lag(ID)=ID and month(lag(Month))=month(Month) then count + 1;
else count=Var1;
run;``````

Result - same as previous mail.

Rhodochrosite | Level 12

## Re: Retain statement with two by group variables?

I realize it cannot handle a situation, where two consecutive records for the same ID has the same month, but in different years. They will be seen as the same month. This should be failsafe:

``````
data want; set have;
if lag(ID)=ID and put(lag(Month),YYMM7.)=put(Month,YYMM7.) then count + 1;
else count=Var1;
run;``````

Discussion stats
• 10 replies
• 6195 views
• 1 like
• 5 in conversation