<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Running Count Distinct by Group in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260647#M7117</link>
    <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have (keep=person_id video_id) nodupkey;
by person_id video_id;
run;

proc summary data=have;
by person_id;
output out=want (keep=person_id _freq_);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 01 Apr 2016 09:08:16 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2016-04-01T09:08:16Z</dc:date>
    <item>
      <title>Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260644#M7115</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on a panel data. Multiple individuals view&amp;nbsp;different videos.&amp;nbsp;Does any one know how I can create running distinct count of videos viewed within each group (peron_id) ? At each pont of time, I would like to see how many distinct videos the person has watched. Obs_id is created based on time order of viewing events. My desired table looks like below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;person_id &amp;nbsp; video_id &amp;nbsp;running_count &amp;nbsp; &amp;nbsp;obs_id&lt;/P&gt;&lt;P&gt;1002 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 501 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1&lt;/P&gt;&lt;P&gt;1002 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 502 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2&lt;/P&gt;&lt;P&gt;1002 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 502 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;3&lt;/P&gt;&lt;P&gt;1002 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 502 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;4&lt;/P&gt;&lt;P&gt;1006 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 205 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1&lt;/P&gt;&lt;P&gt;1006 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 205 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1&lt;/P&gt;&lt;P&gt;1006 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 209 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So far, I have tried this:&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table final3 as&lt;BR /&gt;SELECT person_id,&lt;BR /&gt;video_id,&lt;BR /&gt;count(distinct video_id) OVER&lt;BR /&gt;(ORDER BY person_id)&lt;BR /&gt;AS running_total&lt;BR /&gt;FROM final2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;but error message pops up for the use of "OVER" here.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas how I can proceed?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 08:51:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260644#M7115</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T08:51:17Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260647#M7117</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have (keep=person_id video_id) nodupkey;
by person_id video_id;
run;

proc summary data=have;
by person_id;
output out=want (keep=person_id _freq_);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:08:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260647#M7117</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-04-01T09:08:16Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260648#M7118</link>
      <description>&lt;P&gt;That's not valid SAS syntax.&lt;/P&gt;
&lt;P&gt;And running counts is not really a SQL thing. SQL is column&amp;nbsp;based. If the data is sorted, try a data step instead.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:11:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260648#M7118</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-04-01T09:11:37Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260649#M7119</link>
      <description>Tnx but this does not work. I tried this code and it gives me the number of observations per person.</description>
      <pubDate>Fri, 01 Apr 2016 09:14:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260649#M7119</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T09:14:34Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260650#M7120</link>
      <description>I do not get your point. Can you be more clear on how?</description>
      <pubDate>Fri, 01 Apr 2016 09:16:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260650#M7120</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T09:16:23Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260651#M7121</link>
      <description>&lt;P&gt;No:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input person_id video_id;
cards;
1002           501          1                        1
1002           502          2                        2
1002           502          2                        3
1002           502          2                        4
1006           205          1                        1
1006           205          1
1006           209          2
;
run;

proc sort data=have (keep=person_id video_id) nodupkey;
by person_id video_id;
run;

proc summary data=have;
by person_id;
output out=want (keep=person_id _freq_);
run;

proc print;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Gives this output, which only counts distinct video ID's per person:&lt;/P&gt;
&lt;PRE&gt;                                                             person_
                                                      Obs       id      _FREQ_

                                                       1       1002        2  
                                                       2       1006        2  
&lt;/PRE&gt;
&lt;P&gt;The count of observations per person would be 4 for 1002 and 3 for 1006&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:18:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260651#M7121</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-04-01T09:18:23Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260652#M7122</link>
      <description>&lt;P&gt;Please post test data in the form of a datastep.&lt;/P&gt;
&lt;P&gt;Try something like (not tested):&lt;/P&gt;
&lt;PRE&gt;proc sort data=have;
  by person_id video_id;
run;
data want;
  set have;
  by person_id video_id;
  retain running_count;
  if first.video_id then running_count=1;
  else running_count=running_count+1;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:20:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260652#M7122</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-04-01T09:20:25Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260654#M7123</link>
      <description>&lt;P&gt;This neglects the "at each point of time" and counts distincts videos watched at the latest time for each person. But I was trying to get&amp;nbsp;distinct counts at each point of time. Maybe I was not clear. In my description I wrote point of time is replcaed by Obs_id. For example,&amp;nbsp;at time=1 (obs_id=1) for the first person, distinct videos watched is one.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:24:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260654#M7123</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T09:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260657#M7126</link>
      <description>&lt;P&gt;Ok.&amp;nbsp;Unfortunately, I cannot sort by video_id. That's not what I want to create. I should sort first by person_id then by time (obs_id). I already wrote sth similar to your code but that's another variable of interest.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:35:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260657#M7126</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T09:35:18Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260660#M7127</link>
      <description>&lt;P&gt;Ok, thats not clear from your data - labelling a variable obs_id is not a good idea - lable it something pertaining to the data it contains. &amp;nbsp;The same principal is there regarless of what variables are used:&lt;/P&gt;
&lt;P&gt;1) Sort the data so the data appears firt in the highest group, then the next group etc. for each group, and then data point count order.&lt;/P&gt;
&lt;P&gt;2) Step through data retaining a count at each record, resetting when you get to the first occurence of a new group.&lt;/P&gt;
&lt;PRE&gt;proc sort data=have;
  by person_id obs_id video_id;
run;
data want;
  set have;
  by person_id obs_id video_id;
  retain running_count;
  if first.video_id then running_count=1;
  else running_count=running_count+1;
run;&lt;/PRE&gt;
&lt;P&gt;Or the same thing can be achieved in SQL with group by person_id, obs_id, video_id. &amp;nbsp;It is really the grouping that matters.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 09:45:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260660#M7127</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-04-01T09:45:03Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260747#M7139</link>
      <description>&lt;P&gt;Why can't you sort by video_id?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sort it to calculate the count and then resort to what you want.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 14:57:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260747#M7139</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-04-01T14:57:08Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260784#M7141</link>
      <description>&lt;P&gt;Oh this gives me a lot of ones for the running count because at each row when it approaches a new video_id it resets the running count to zero. I will post a sample data through data step.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 17:10:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260784#M7141</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T17:10:56Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260787#M7142</link>
      <description>because that does not give me the variable I would like to create. I would like to know counts of distinct video views AT EACH POINT OF TIME for each person. Data should be sorted by: person_id viewing_date video_id&lt;BR /&gt;Then I would like to count distinct views up to each viewing date (=time=obs_id)</description>
      <pubDate>Fri, 01 Apr 2016 17:17:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260787#M7142</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T17:17:22Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260792#M7143</link>
      <description>&lt;P&gt;How big is your dataset, how many potential videos are watched, do you have a max? At each point you're going to have search backwards for the video...I would create a string that stored the video id's retain it across the id. If the id is found then don't increment, otherwise increment and add to list. &amp;nbsp;Brute force but would work....I'll post some code in a bit.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 17:24:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260792#M7143</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-04-01T17:24:45Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260793#M7144</link>
      <description>I have around 1.7 million rows in the panel. each person watches 20 videos say, on average. tnx!</description>
      <pubDate>Fri, 01 Apr 2016 17:27:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260793#M7144</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T17:27:50Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260820#M7145</link>
      <description>&lt;P&gt;Here's my solution. Your sample data isn't a good one because it doesn't identify the issues with your data. I changed the last ID so it would. I have no idea about efficiency but it should work. 1.7 million rows isn't very much anywas. I would drop the videos_watched variable at the end, once you've verified this works.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input person_id video_id $;
cards;
1002           501 
1002           502 
1002           502 
1002           502 
1006           209 
1006           205 
1006           209 
;
run;

data want;
length videos_watched $500.;
set have;
by person_Id video_id notsorted;

retain videos_watched count;

if first.person_id then do;
call missing(videos_watched);
count=0;
end;

if first.video_id then do;
	if index(videos_watched, trim(video_id))=0 then do;
		count+1;
		videos_watched=catx(', ', videos_watched, video_id);
	end;
end;

run;

proc print; run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 01 Apr 2016 19:43:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260820#M7145</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-04-01T19:43:41Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260827#M7147</link>
      <description>It does not solve my problem but I use this to create another variable</description>
      <pubDate>Fri, 01 Apr 2016 20:14:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260827#M7147</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T20:14:28Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260841#M7150</link>
      <description>&lt;P&gt;This has a problem on the big dataset. For unknown reason it gives the same obs_id at each row instaed of creating a new varaible. But it works on the samle sample data. Weired. I finaly managed to do what I wanted to do by creating another variable which is called "vidviewperson_cum " : this variable counts the number of views that the person has had so far from the specific video. for example in last column fourth row this number is 3,which means at obs_id=4 for person_id =1002, this video 502 has been viewd three times. Ideally, I want to create the last column below from teh first three columns:&lt;/P&gt;&lt;P&gt;person_id&amp;nbsp;&amp;nbsp; video_id&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vidviewperson_cum&amp;nbsp;&amp;nbsp;&amp;nbsp; distinctvideos_watched&lt;BR /&gt;&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 501&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;BR /&gt;&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;BR /&gt;&lt;BR /&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 205&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;&lt;BR /&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 205&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 209&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;here is my final code and it works on 1.7 m observations:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input person_id video_id obs_id vidviewperson_cum $;
cards;
1002           501           1                

1002           502           2                 

1002           502           3                 

1002           502           4                 

1006           205           1                

1006           205           2               
 
1006           209           3                

;
run;


proc sort data=have;by person_id  video_id  obs_id ;run;
data want1;
  set have;
  vidviewperson_cum + 1;
  by person_id video_id;
  if first.video_id then  vidviewperson_cum = 1;
run;

proc sort data=want1;by person_id  obs_id  ;run;
data want2;set want1;by person_id obs_id ;
retain videos_watched 0;
if vidviewperson_cum=1 then videos_watched=videos_watched+1;
if first.person_id then distinctvideos_watched=1;run;

 &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 01 Apr 2016 21:41:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260841#M7150</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T21:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260844#M7151</link>
      <description>&lt;P&gt;This approved answer has a problem on the big dataset. For unknown reason it gives the same obs_id at each row instaed of creating a new varaible. But it works on the samle sample data. Weired. I finaly managed to do what I wanted to do by creating another variable which is called "vidviewperson_cum " : this variable counts the number of views that the person has had so far from the specific video. for example in last column fourth row this number is 3,which means at obs_id=4 for person_id =1002, this video 502 has been viewd three times. Ideally, I want to create the last column below from teh first three columns:&lt;/P&gt;&lt;P&gt;person_id &amp;nbsp; &amp;nbsp; &amp;nbsp; video_id &amp;nbsp; obs_id&amp;nbsp;&amp;nbsp; vidviewperson_cum &amp;nbsp; &amp;nbsp; distinctvideos_watched&lt;/P&gt;&lt;P&gt;1002 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; 501&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;BR /&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;1002&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 502&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&lt;/P&gt;&lt;P&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; 205&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; 205&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; 2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; 2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&lt;BR /&gt;1006&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 209 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;here is my final code and it works on 1.7 m observations:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input person_id video_id obs_id $;
cards;
1002  501     1                
1002  502     2                 
1002  502     3                 
1002  502     4                 
1006  205     1                
1006  205     2               
1006  209     3                

;
run;
 
proc sort data=have;by person_id  video_id  obs_id ;run;
data want1;
  set have;
  vidviewperson_cum + 1;
  by person_id video_id;
  if first.video_id then  vidviewperson_cum = 1;
run;

proc sort data=want1;by person_id  obs_id  ;run;
data want2;set want1;by person_id obs_id ;
retain videos_watched 0;
if vidviewperson_cum=1 then videos_watched=videos_watched+1;
if first.person_id then distinctvideos_watched=1;run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 21:52:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260844#M7151</guid>
      <dc:creator>fereshteh</dc:creator>
      <dc:date>2016-04-01T21:52:08Z</dc:date>
    </item>
    <item>
      <title>Re: Running Count Distinct by Group</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260845#M7152</link>
      <description>&lt;P&gt;This is why I mentioned that your test data is important. Interestingly enough, your code does not work on the test data I created, where a person watches a video A, then video B, then rewatches video A.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But if it works for you, that's all that matters.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just a note that you had mentioned that you couldn't sort by video_id, but you ended up doing that in your solution anyways....&lt;/P&gt;</description>
      <pubDate>Fri, 01 Apr 2016 22:13:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Running-Count-Distinct-by-Group/m-p/260845#M7152</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-04-01T22:13:20Z</dc:date>
    </item>
  </channel>
</rss>

