Hi all,
I just can't seem to figure this out so hoping that someone here can help.
I have this dataset, where Var1 is an indicator for a particular event:
ID Month Var1
A Jan 1
A Jan 1
A Feb .
B Jan 1
B Jan 1
B Feb 1
B Mar 1
I want to create a new dataset that contains a new count variable that will count the number in Var1 for each ID/month group so that it looks like this:
ID Month Var1 Count
A Jan 1 1
A Jan 1 2
A Feb . .
B Jan 1 1
B Jan 1 2
B Feb 1 1
B Mar 1 1
How do I do this with a retain statement? This is what I tried:
data want;
set have;
where var1=1;
by id month;
if first.month then Count=0;
Count+1;
run;
(Side note: I know this probably won't give me the row where Var1=. but that's okay.)
data have;
input ID $ Month $ Var1;
cards;
A Jan 1
A Jan 1
A Feb .
B Jan 1
B Jan 1
B Feb 1
B Mar 1
;
data want;
set have;
by id month notsorted;
if first.month and var1 then count=1;
else if var1 then count+1;
else count=.;
run;
/*Or just*/
data want;
set have;
by id month notsorted;
if first.month then count=.;
if var1 then count+1;
run;
/*Or*/
data want;
set have;
by id month var1 notsorted;
if first.var1 then count=.;
if var1 then count+1;
run;
You need to specify "notsorted" in the by statement as the value are considered as character values (not a format of a numeric variable for the month 1-12). So SAS consider they are not sorted properly (Feb should occur before Jan etc. alphabetically)
data want;
set have;
by id month notsorted;
if first.month and var1 then count=1;
else if var1 then count+1;
else count=.;
run;
Oh, okay. So I'm sorry, I should have specified but month is actually in YYMM7. format (so it displays as 2018M10, for instance). I did do a proc sort by ID and month. But I will give this notsorted option a try as well.
So what was wrong with the results you got from running your code?
I just ran it again. I have another variable in my dataset for exact date and it looks like my code only counts observations occurring on the same date (as though the bygroup is date). Which doesn't make sense to me because I don't list the date variable anywhere in my code for proc sort or the data step. So in the end, I could have some observations occurring in the same month but the count gets reset whenever the actual date changes, not the month.
When you attach a format to a variable it changes how SAS displays the value, but it does not change the actual value. So probably to get the behavior you are seeing the two variables have the same values, but one is displaying just the month and the other displays the day of the month also.
Some procedures, like PROC FREQ and PROC MEANS, will use the formatted value when aggregating by a variable. But the BY processing of the DATA step looks at the actual values to see when they change, not the formatted (display) values.
You could fix that by converting the variable that you are using to show the month to use the same day of the month every time.
So something like this will make a copy of a date variable named DATE into another date valued variable named MONTH, but force the values to the first day of the month.
data want;
set have ;
month =intnx('month',date,0,'b');
format month yymm7. date date9.;
run;
In this case it is easier to use Lag functions instead of by variables with notsorted.
data have;
input ID $ Month $ Var1;
cards;
A Jan 1
A Jan 1
A Feb .
B Jan 1
B Jan 1
B Feb 1
B Mar 1
;
data want; set have;
if lag(ID)=ID and lag(Month)=Month then count + 1;
else count=Var1;
run;
Result:
So the Month variable is a SAS date. It is formatted to appear as a month value, but it contains a numeric value counting days from jan 1, 1960. And the value is different, if the days are different, even if they are in the same month. Here is a modified code to handle that:
data have;
input ID$ Month date9. Var1;
format Month YYMM7.;
cards;
A 12jan2019 1
A 20jan2019 1
A 16feb2019 .
B 01jan2019 1
B 17jan2019 1
B 24feb2019 1
B 07mar2019 1
;
data want; set have;
if lag(ID)=ID and month(lag(Month))=month(Month) then count + 1;
else count=Var1;
run;
Result - same as previous mail.
I realize it cannot handle a situation, where two consecutive records for the same ID has the same month, but in different years. They will be seen as the same month. This should be failsafe:
data want; set have;
if lag(ID)=ID and put(lag(Month),YYMM7.)=put(Month,YYMM7.) then count + 1;
else count=Var1;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: