Hello
I have a table like this;
| Date | Student Name | 
| 8/25/2020 | Jane | 
| 8/25/2020 | Andy | 
| 8/25/2020 | Mary | 
| 8/26/2020 | Sue | 
| 8/26/2020 | Nate | 
| 8/26/2020 | Andy | 
| 8/26/2020 | Joe | 
| 8/27/2020 | Jane | 
| 8/27/2020 | Kathy | 
I want the table to look like this. As you can see Andy is in Aug 25 and Aug 26 so I want him counted only in the first date. Basically i want an unduplicated count by date and then cumulatively add the count by date.
| Date | unduplicated count | cumulative count | 
| 8/25/2020 | 3 | 3 | 
| 8/26/2020 | 3 | 6 | 
| 8/27/2020 | 1 | 7 | 
I am not sure how to get what i want with either proc sql or sas data step.
Thanks
ananda
Hi again @anandas I have simplified my previous solution by eliminating the intermediate step in the below-
data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;
data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;
proc print noobs;run;| id | date | undup_count | cumul_count | 
|---|---|---|---|
| 2007623 | JUL1995 | 3 | 3 | 
| 2186282 | AUG1995 | 0 | 3 | 
| 2117809 | OCT1995 | 1 | 4 | 
| 2064119 | JUL1996 | 1 | 1 | 
| 2011790 | AUG1996 | 2 | 3 | 
| 1259184 | SEP1996 | 4 | 7 | 
| 1220474 | OCT1996 | 4 | 11 | 
Hi @anandas If you are comfortable with Hashes, Here is an easy solution
data have;
input Date :mmddyy10.	Student_Name $;
format date mmddyy10.;
cards;
8/25/2020	Jane
8/25/2020	Andy
8/25/2020	Mary
8/26/2020	Sue
8/26/2020	Nate
8/26/2020	Andy
8/26/2020	Joe
8/27/2020	Jane
8/27/2020	Kathy
;
data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("student_name") ;
  h.definedone () ;
 end;
 do until(last.date);
  set have;
  by date;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   cumul_count+1;
   h.add();
  end;
 end;
 drop student_name;
run;
Well sounds like all you need to do is replace the student_name with ID .
Hmm why let me assume what's in your data. Can you plz create and provide a "mock" sample that is a good representative of your original data. You are likely to get the most appropriate and closest solution if you could do that plz
Thank you.
Below is the sample data table.
| id | date | 
| 2186282 | 18JUL1995:13:21:29 | 
| 2007173 | 19JUL1995:10:24:27 | 
| 2007623 | 24JUL1995:11:20:27 | 
| 2007173 | 10AUG1995:10:24:27 | 
| 2007623 | 16AUG1995:11:59:52 | 
| 2186282 | 25AUG1995:13:22:03 | 
| 2117809 | 11OCT1995:13:31:30 | 
| 2064119 | 26JUL1996:16:46:32 | 
| 2064119 | 26JUL1996:16:46:40 | 
| 2064119 | 19AUG1996:16:47:46 | 
| 1220090 | 26AUG1996:00:00:00 | 
| 2011790 | 27AUG1996:00:00:00 | 
| 2011790 | 30AUG1996:00:00:00 | 
| 2011790 | 03SEP1996:00:00:00 | 
| 2008211 | 03SEP1996:10:36:43 | 
| 2013623 | 04SEP1996:09:12:21 | 
| 2013623 | 04SEP1996:09:12:24 | 
| 2013623 | 04SEP1996:09:12:28 | 
| 2013623 | 04SEP1996:09:12:32 | 
| 2012087 | 05SEP1996:13:08:07 | 
| 1259184 | 26SEP1996:00:00:00 | 
| 2012342 | 01OCT1996:00:00:00 | 
| 2064119 | 01OCT1996:16:50:51 | 
| 1216288 | 02OCT1996:00:00:00 | 
| 2008961 | 08OCT1996:11:25:31 | 
| 2008961 | 08OCT1996:11:25:35 | 
| 1220474 | 10OCT1996:00:00:00 | 
The above table should have the values as shown the following table;
| date | Unduplicated count | cumulative count | 
| Jul-95 | 3 | 3 | 
| Aug-95 | 2 | 5 | 
| Sep-95 | 0 | 5 | 
| Oct-95 | 1 | 6 | 
| Jul-96 | 1 | 1 | 
| Aug-96 | 2 | 3 | 
| Sep-96 | 4 | 7 | 
| Oct-96 | 4 | 11 | 
I will restate my problem.
First - I want to count each student only once in the month (first month) in a year. So if a student shows up in July and August and October of "One Year". I want to count that student only in July and not in August and October. Then, the next year, i want to do the same thing.
Second - I want to have a column with a running total (cumulative total) for the months in a year. I want the cumulative totals by year. So beginning of each year, the count starts from zero.
Thank you very much!
ananda
Thanks. So,
Are you stating-
1. Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?
Also, The 3 students who showed up July1995 have also showed up in Aug1995. Wouldn't Aug1995 be zero?
| 2186282 | 18JUL1995:13:21:29 | 
| 2007173 | 19JUL1995:10:24:27 | 
| 2007623 | 24JUL1995:11:20:27 | 
| 2007173 | 10AUG1995:10:24:27 | 
| 2007623 | 16AUG1995:11:59:52 | 
| 2186282 | 25AUG1995:13:22:03 | 
Hi @anandas I believe your AUG95 results are incorrect perhaps because of a typo. I believe the following should do-
data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;
/*_dt for creating groupformat*/
data temp;
 set have;
 _dt=date;
run;
data want;
 if _n_=1 then do;
  if 0 then set temp;/*not needed but only for variable arrangement by compiler*/
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 do until(last._dt);
  undup_count=0;
  do until(last.date);
   set temp;
   by  _dt date groupformat;
   format _dt dtyear4. date dtmonyy7. ;
   if h.check() ne 0 then do;
    undup_count=sum(undup_count,1);
    h.add();
   end;
  end;
  cumul_count=sum(cumul_count,undup_count);
  output;
 end;
 h.clear();
 drop _dt;
run;
proc print noobs;run;
| id | date | undup_count | cumul_count | 
|---|---|---|---|
| 2007623 | JUL1995 | 3 | 3 | 
| 2186282 | AUG1995 | 0 | 3 | 
| 2117809 | OCT1995 | 1 | 4 | 
| 2064119 | JUL1996 | 1 | 1 | 
| 2011790 | AUG1996 | 2 | 3 | 
| 1259184 | SEP1996 | 4 | 7 | 
| 1220474 | OCT1996 | 4 | 11 | 
Hi @anandas I have understood the question and have posted the solution. Please try and let me know. Thank you!
Hi again @anandas I have simplified my previous solution by eliminating the intermediate step in the below-
data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;
data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;
proc print noobs;run;| id | date | undup_count | cumul_count | 
|---|---|---|---|
| 2007623 | JUL1995 | 3 | 3 | 
| 2186282 | AUG1995 | 0 | 3 | 
| 2117809 | OCT1995 | 1 | 4 | 
| 2064119 | JUL1996 | 1 | 1 | 
| 2011790 | AUG1996 | 2 | 3 | 
| 1259184 | SEP1996 | 4 | 7 | 
| 1220474 | OCT1996 | 4 | 11 | 
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
