BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anandas
Obsidian | Level 7

Hello

 

I have a table like this;

DateStudent Name
8/25/2020Jane
8/25/2020Andy
8/25/2020Mary
8/26/2020Sue
8/26/2020Nate
8/26/2020Andy
8/26/2020Joe
8/27/2020Jane
8/27/2020Kathy

 

I want the table to look like this. As you can see Andy is in Aug 25 and Aug 26 so I want him counted only in the first date. Basically i want an unduplicated count by date and then cumulatively add the count by date. 

Dateunduplicated countcumulative count
8/25/202033
8/26/202036
8/27/202017

 

I am not sure how to get what i want with either proc sql or sas data step.

 

Thanks

ananda

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi again @anandas  I have simplified my previous solution by eliminating the intermediate step in the below-

 


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;

proc print noobs;run;
id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

Hi @anandas  If you are comfortable with Hashes, Here is an easy solution

 


data have;
input Date :mmddyy10.	Student_Name $;
format date mmddyy10.;
cards;
8/25/2020	Jane
8/25/2020	Andy
8/25/2020	Mary
8/26/2020	Sue
8/26/2020	Nate
8/26/2020	Andy
8/26/2020	Joe
8/27/2020	Jane
8/27/2020	Kathy
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("student_name") ;
  h.definedone () ;
 end;
 do until(last.date);
  set have;
  by date;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   cumul_count+1;
   h.add();
  end;
 end;
 drop student_name;
run;
anandas
Obsidian | Level 7
Thank you for the quick response. Can you also give me the code if instead of student name, i have a numeric variable such as an id? Thanks.
novinosrin
Tourmaline | Level 20

Well sounds like all you need to do is replace the student_name with ID .

 

Hmm why let me assume what's in your data. Can you plz create and provide a "mock" sample that is a good representative of your original data. You are likely to get the most appropriate and closest solution if you could do that plz

anandas
Obsidian | Level 7

Thank you.

 

Below is the sample data table.

iddate
218628218JUL1995:13:21:29
200717319JUL1995:10:24:27
200762324JUL1995:11:20:27
200717310AUG1995:10:24:27
200762316AUG1995:11:59:52
218628225AUG1995:13:22:03
211780911OCT1995:13:31:30
206411926JUL1996:16:46:32
206411926JUL1996:16:46:40
206411919AUG1996:16:47:46
122009026AUG1996:00:00:00
201179027AUG1996:00:00:00
201179030AUG1996:00:00:00
201179003SEP1996:00:00:00
200821103SEP1996:10:36:43
201362304SEP1996:09:12:21
201362304SEP1996:09:12:24
201362304SEP1996:09:12:28
201362304SEP1996:09:12:32
201208705SEP1996:13:08:07
125918426SEP1996:00:00:00
201234201OCT1996:00:00:00
206411901OCT1996:16:50:51
121628802OCT1996:00:00:00
200896108OCT1996:11:25:31
200896108OCT1996:11:25:35
122047410OCT1996:00:00:00

 

The above table should have the values as shown the following table;

dateUnduplicated countcumulative count
Jul-9533
Aug-9525
Sep-9505
Oct-9516
Jul-9611
Aug-9623
Sep-9647
Oct-96411

 

I will restate my problem. 

First - I want to count each student only once in the month (first month) in a year. So if a student shows up in July and August and October of "One Year". I want to count that student only in July and not in August and October. Then, the next year, i want to do the same thing.

Second - I want to have a column with a running total (cumulative total) for the months in a year. I want the cumulative totals  by year. So beginning of each year, the count starts from zero. 

 

Thank you very much!

ananda

novinosrin
Tourmaline | Level 20

Thanks. So,

Are you stating-

1. Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?

 

Also, The 3 students who showed up July1995 have also showed up in Aug1995. Wouldn't Aug1995 be zero?

2186282 18JUL1995:13:21:29
2007173 19JUL1995:10:24:27
2007623 24JUL1995:11:20:27
2007173 10AUG1995:10:24:27
2007623 16AUG1995:11:59:52
2186282 25AUG1995:13:22:03

 

anandas
Obsidian | Level 7
Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?
If a student registers in July, August and October, I want to count that student only once in July and NOT in August and October. If there are no students at all registered in September, then i want September to show as zero students.

Regarding your second question, YES - you are correct. My mistake - I manually was doing this so made a mistake. In this example, there should be zero students in August since they already are showing up in July.

Thank you.

novinosrin
Tourmaline | Level 20

Hi @anandas  I believe your AUG95 results are incorrect perhaps because of a typo. I believe the following should do-


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

/*_dt for creating groupformat*/
data temp;
 set have;
 _dt=date;
run;
data want;
 if _n_=1 then do;
  if 0 then set temp;/*not needed but only for variable arrangement by compiler*/
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 do until(last._dt);
  undup_count=0;
  do until(last.date);
   set temp;
   by  _dt date groupformat;
   format _dt dtyear4. date dtmonyy7. ;
   if h.check() ne 0 then do;
    undup_count=sum(undup_count,1);
    h.add();
   end;
  end;
  cumul_count=sum(cumul_count,undup_count);
  output;
 end;
 h.clear();
 drop _dt;
run;

proc print noobs;run;

id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

novinosrin
Tourmaline | Level 20

Hi @anandas  I have understood the question and have posted the solution. Please try and let me know. Thank you!

novinosrin
Tourmaline | Level 20

Hi again @anandas  I have simplified my previous solution by eliminating the intermediate step in the below-

 


data have;
input id 	date :datetime20.;
format date datetime20.;
cards;
2186282	18JUL1995:13:21:29
2007173	19JUL1995:10:24:27
2007623	24JUL1995:11:20:27
2007173	10AUG1995:10:24:27
2007623	16AUG1995:11:59:52
2186282	25AUG1995:13:22:03
2117809	11OCT1995:13:31:30
2064119	26JUL1996:16:46:32
2064119	26JUL1996:16:46:40
2064119	19AUG1996:16:47:46
1220090	26AUG1996:00:00:00
2011790	27AUG1996:00:00:00
2011790	30AUG1996:00:00:00
2011790	03SEP1996:00:00:00
2008211	03SEP1996:10:36:43
2013623	04SEP1996:09:12:21
2013623	04SEP1996:09:12:24
2013623	04SEP1996:09:12:28
2013623	04SEP1996:09:12:32
2012087	05SEP1996:13:08:07
1259184	26SEP1996:00:00:00
2012342	01OCT1996:00:00:00
2064119	01OCT1996:16:50:51
1216288	02OCT1996:00:00:00
2008961	08OCT1996:11:25:31
2008961	08OCT1996:11:25:35
1220474	10OCT1996:00:00:00
;

data want;
 if _n_=1 then do;
  if 0 then set have;
  dcl hash H () ;
  h.definekey  ("id") ;
  h.definedone () ;
 end;
 undup_count=0;
 do until(last.date);
  set have;
  by  date groupformat;
  format date dtmonyy7. ;
  if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
   cumul_count=0;
   h.clear();
  end;
  if h.check() ne 0 then do;
   undup_count=sum(undup_count,1);
   h.add();
  end;
 end;
 cumul_count+undup_count;
run;

proc print noobs;run;
id date undup_count cumul_count
2007623 JUL1995 3 3
2186282 AUG1995 0 3
2117809 OCT1995 1 4
2064119 JUL1996 1 1
2011790 AUG1996 2 3
1259184 SEP1996 4 7
1220474 OCT1996 4 11

 

anandas
Obsidian | Level 7
Thank you very much! This really helps!
anand
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3352 views
  • 2 likes
  • 2 in conversation