BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anandas
Obsidian | Level 7

Hello

 

I have a table like this;

DateStudent Name
8/25/2020Jane
8/25/2020Andy
8/25/2020Mary
8/26/2020Sue
8/26/2020Nate
8/26/2020Andy
8/26/2020Joe
8/27/2020Jane
8/27/2020Kathy

 

I want the table to look like this. As you can see Andy is in Aug 25 and Aug 26 so I want him counted only in the first date. Basically i want an unduplicated count by date and then cumulatively add the count by date. 

Dateunduplicated countcumulative count
8/25/202033
8/26/202036
8/27/202017

 

I am not sure how to get what i want with either proc sql or sas data step.

 

Thanks

ananda

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi again @anandas  I have simplified my previous solution by eliminating the intermediate step in the below-

 


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;

proc print noobs;run;
id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

Hi @anandas  If you are comfortable with Hashes, Here is an easy solution

 


data have;
input Date :mmddyy10.	Student_Name $;
format date mmddyy10.;
cards;
8/25/2020	Jane
8/25/2020	Andy
8/25/2020	Mary
8/26/2020	Sue
8/26/2020	Nate
8/26/2020	Andy
8/26/2020	Joe
8/27/2020	Jane
8/27/2020	Kathy
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("student_name") ;
  h.definedone () ;
 end;
 do until(last.date);
  set have;
  by date;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   cumul_count+1;
   h.add();
  end;
 end;
 drop student_name;
run;
anandas
Obsidian | Level 7
Thank you for the quick response. Can you also give me the code if instead of student name, i have a numeric variable such as an id? Thanks.
novinosrin
Tourmaline | Level 20

Well sounds like all you need to do is replace the student_name with ID .

 

Hmm why let me assume what's in your data. Can you plz create and provide a "mock" sample that is a good representative of your original data. You are likely to get the most appropriate and closest solution if you could do that plz

anandas
Obsidian | Level 7

Thank you.

 

Below is the sample data table.

iddate
218628218JUL1995:13:21:29
200717319JUL1995:10:24:27
200762324JUL1995:11:20:27
200717310AUG1995:10:24:27
200762316AUG1995:11:59:52
218628225AUG1995:13:22:03
211780911OCT1995:13:31:30
206411926JUL1996:16:46:32
206411926JUL1996:16:46:40
206411919AUG1996:16:47:46
122009026AUG1996:00:00:00
201179027AUG1996:00:00:00
201179030AUG1996:00:00:00
201179003SEP1996:00:00:00
200821103SEP1996:10:36:43
201362304SEP1996:09:12:21
201362304SEP1996:09:12:24
201362304SEP1996:09:12:28
201362304SEP1996:09:12:32
201208705SEP1996:13:08:07
125918426SEP1996:00:00:00
201234201OCT1996:00:00:00
206411901OCT1996:16:50:51
121628802OCT1996:00:00:00
200896108OCT1996:11:25:31
200896108OCT1996:11:25:35
122047410OCT1996:00:00:00

 

The above table should have the values as shown the following table;

dateUnduplicated countcumulative count
Jul-9533
Aug-9525
Sep-9505
Oct-9516
Jul-9611
Aug-9623
Sep-9647
Oct-96411

 

I will restate my problem. 

First - I want to count each student only once in the month (first month) in a year. So if a student shows up in July and August and October of "One Year". I want to count that student only in July and not in August and October. Then, the next year, i want to do the same thing.

Second - I want to have a column with a running total (cumulative total) for the months in a year. I want the cumulative totals  by year. So beginning of each year, the count starts from zero. 

 

Thank you very much!

ananda

novinosrin
Tourmaline | Level 20

Thanks. So,

Are you stating-

1. Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?

 

Also, The 3 students who showed up July1995 have also showed up in Aug1995. Wouldn't Aug1995 be zero?

2186282 18JUL1995:13:21:29
2007173 19JUL1995:10:24:27
2007623 24JUL1995:11:20:27
2007173 10AUG1995:10:24:27
2007623 16AUG1995:11:59:52
2186282 25AUG1995:13:22:03

 

anandas
Obsidian | Level 7
Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?
If a student registers in July, August and October, I want to count that student only once in July and NOT in August and October. If there are no students at all registered in September, then i want September to show as zero students.

Regarding your second question, YES - you are correct. My mistake - I manually was doing this so made a mistake. In this example, there should be zero students in August since they already are showing up in July.

Thank you.

novinosrin
Tourmaline | Level 20

Hi @anandas  I believe your AUG95 results are incorrect perhaps because of a typo. I believe the following should do-


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

/*_dt for creating groupformat*/
data temp;
 set have;
 _dt=date;
run;
data want;
 if _n_=1 then do;
  if 0 then set temp;/*not needed but only for variable arrangement by compiler*/
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 do until(last._dt);
  undup_count=0;
  do until(last.date);
   set temp;
   by  _dt date groupformat;
   format _dt dtyear4. date dtmonyy7. ;
   if h.check() ne 0 then do;
    undup_count=sum(undup_count,1);
    h.add();
   end;
  end;
  cumul_count=sum(cumul_count,undup_count);
  output;
 end;
 h.clear();
 drop _dt;
run;

proc print noobs;run;

id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

novinosrin
Tourmaline | Level 20

Hi @anandas  I have understood the question and have posted the solution. Please try and let me know. Thank you!

novinosrin
Tourmaline | Level 20

Hi again @anandas  I have simplified my previous solution by eliminating the intermediate step in the below-

 


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;

proc print noobs;run;
id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

anandas
Obsidian | Level 7
Thank you very much! This really helps!
anand

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1705 views
  • 2 likes
  • 2 in conversation