This approved answer has a problem on the big dataset. For unknown reason it gives the same obs_id at each row instaed of creating a new varaible. But it works on the samle sample data. Weired. I finaly managed to do what I wanted to do by creating another variable which is called "vidviewperson_cum " : this variable counts the number of views that the person has had so far from the specific video. for example in last column fourth row this number is 3,which means at obs_id=4 for person_id =1002, this video 502 has been viewd three times. Ideally, I want to create the last column below from teh first three columns: person_id video_id obs_id vidviewperson_cum distinctvideos_watched 1002 501 1 1 1 1002 502 2 1 2 1002 502 3 2 2 1002 502 4 3 2 1006 205 1 1 1 1006 205 2 2 1 1006 209 3 1 2 here is my final code and it works on 1.7 m observations: data have;
input person_id video_id obs_id $;
cards;
1002 501 1
1002 502 2
1002 502 3
1002 502 4
1006 205 1
1006 205 2
1006 209 3
;
run;
proc sort data=have;by person_id video_id obs_id ;run;
data want1;
set have;
vidviewperson_cum + 1;
by person_id video_id;
if first.video_id then vidviewperson_cum = 1;
run;
proc sort data=want1;by person_id obs_id ;run;
data want2;set want1;by person_id obs_id ;
retain videos_watched 0;
if vidviewperson_cum=1 then videos_watched=videos_watched+1;
if first.person_id then distinctvideos_watched=1;run;
... View more