Hello
I have a table like this;
Date | Student Name |
8/25/2020 | Jane |
8/25/2020 | Andy |
8/25/2020 | Mary |
8/26/2020 | Sue |
8/26/2020 | Nate |
8/26/2020 | Andy |
8/26/2020 | Joe |
8/27/2020 | Jane |
8/27/2020 | Kathy |
I want the table to look like this. As you can see Andy is in Aug 25 and Aug 26 so I want him counted only in the first date. Basically i want an unduplicated count by date and then cumulatively add the count by date.
Date | unduplicated count | cumulative count |
8/25/2020 | 3 | 3 |
8/26/2020 | 3 | 6 |
8/27/2020 | 1 | 7 |
I am not sure how to get what i want with either proc sql or sas data step.
Thanks
ananda
Hi again @anandas I have simplified my previous solution by eliminating the intermediate step in the below-
data have;
input id date :datetime20.;
format date datetime20.;
cards;
2186282 18JUL1995:13:21:29
2007173 19JUL1995:10:24:27
2007623 24JUL1995:11:20:27
2007173 10AUG1995:10:24:27
2007623 16AUG1995:11:59:52
2186282 25AUG1995:13:22:03
2117809 11OCT1995:13:31:30
2064119 26JUL1996:16:46:32
2064119 26JUL1996:16:46:40
2064119 19AUG1996:16:47:46
1220090 26AUG1996:00:00:00
2011790 27AUG1996:00:00:00
2011790 30AUG1996:00:00:00
2011790 03SEP1996:00:00:00
2008211 03SEP1996:10:36:43
2013623 04SEP1996:09:12:21
2013623 04SEP1996:09:12:24
2013623 04SEP1996:09:12:28
2013623 04SEP1996:09:12:32
2012087 05SEP1996:13:08:07
1259184 26SEP1996:00:00:00
2012342 01OCT1996:00:00:00
2064119 01OCT1996:16:50:51
1216288 02OCT1996:00:00:00
2008961 08OCT1996:11:25:31
2008961 08OCT1996:11:25:35
1220474 10OCT1996:00:00:00
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("id") ;
h.definedone () ;
end;
undup_count=0;
do until(last.date);
set have;
by date groupformat;
format date dtmonyy7. ;
if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
cumul_count=0;
h.clear();
end;
if h.check() ne 0 then do;
undup_count=sum(undup_count,1);
h.add();
end;
end;
cumul_count+undup_count;
run;
proc print noobs;run;
id | date | undup_count | cumul_count |
---|---|---|---|
2007623 | JUL1995 | 3 | 3 |
2186282 | AUG1995 | 0 | 3 |
2117809 | OCT1995 | 1 | 4 |
2064119 | JUL1996 | 1 | 1 |
2011790 | AUG1996 | 2 | 3 |
1259184 | SEP1996 | 4 | 7 |
1220474 | OCT1996 | 4 | 11 |
Hi @anandas If you are comfortable with Hashes, Here is an easy solution
data have;
input Date :mmddyy10. Student_Name $;
format date mmddyy10.;
cards;
8/25/2020 Jane
8/25/2020 Andy
8/25/2020 Mary
8/26/2020 Sue
8/26/2020 Nate
8/26/2020 Andy
8/26/2020 Joe
8/27/2020 Jane
8/27/2020 Kathy
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("student_name") ;
h.definedone () ;
end;
do until(last.date);
set have;
by date;
if h.check() ne 0 then do;
undup_count=sum(undup_count,1);
cumul_count+1;
h.add();
end;
end;
drop student_name;
run;
Well sounds like all you need to do is replace the student_name with ID .
Hmm why let me assume what's in your data. Can you plz create and provide a "mock" sample that is a good representative of your original data. You are likely to get the most appropriate and closest solution if you could do that plz
Thank you.
Below is the sample data table.
id | date |
2186282 | 18JUL1995:13:21:29 |
2007173 | 19JUL1995:10:24:27 |
2007623 | 24JUL1995:11:20:27 |
2007173 | 10AUG1995:10:24:27 |
2007623 | 16AUG1995:11:59:52 |
2186282 | 25AUG1995:13:22:03 |
2117809 | 11OCT1995:13:31:30 |
2064119 | 26JUL1996:16:46:32 |
2064119 | 26JUL1996:16:46:40 |
2064119 | 19AUG1996:16:47:46 |
1220090 | 26AUG1996:00:00:00 |
2011790 | 27AUG1996:00:00:00 |
2011790 | 30AUG1996:00:00:00 |
2011790 | 03SEP1996:00:00:00 |
2008211 | 03SEP1996:10:36:43 |
2013623 | 04SEP1996:09:12:21 |
2013623 | 04SEP1996:09:12:24 |
2013623 | 04SEP1996:09:12:28 |
2013623 | 04SEP1996:09:12:32 |
2012087 | 05SEP1996:13:08:07 |
1259184 | 26SEP1996:00:00:00 |
2012342 | 01OCT1996:00:00:00 |
2064119 | 01OCT1996:16:50:51 |
1216288 | 02OCT1996:00:00:00 |
2008961 | 08OCT1996:11:25:31 |
2008961 | 08OCT1996:11:25:35 |
1220474 | 10OCT1996:00:00:00 |
The above table should have the values as shown the following table;
date | Unduplicated count | cumulative count |
Jul-95 | 3 | 3 |
Aug-95 | 2 | 5 |
Sep-95 | 0 | 5 |
Oct-95 | 1 | 6 |
Jul-96 | 1 | 1 |
Aug-96 | 2 | 3 |
Sep-96 | 4 | 7 |
Oct-96 | 4 | 11 |
I will restate my problem.
First - I want to count each student only once in the month (first month) in a year. So if a student shows up in July and August and October of "One Year". I want to count that student only in July and not in August and October. Then, the next year, i want to do the same thing.
Second - I want to have a column with a running total (cumulative total) for the months in a year. I want the cumulative totals by year. So beginning of each year, the count starts from zero.
Thank you very much!
ananda
Thanks. So,
Are you stating-
1. Check only July, Aug and Oct in a calendar year? Not sure what mean here by missing September?
Also, The 3 students who showed up July1995 have also showed up in Aug1995. Wouldn't Aug1995 be zero?
2186282 | 18JUL1995:13:21:29 |
2007173 | 19JUL1995:10:24:27 |
2007623 | 24JUL1995:11:20:27 |
2007173 | 10AUG1995:10:24:27 |
2007623 | 16AUG1995:11:59:52 |
2186282 | 25AUG1995:13:22:03 |
Hi @anandas I believe your AUG95 results are incorrect perhaps because of a typo. I believe the following should do-
data have;
input id date :datetime20.;
format date datetime20.;
cards;
2186282 18JUL1995:13:21:29
2007173 19JUL1995:10:24:27
2007623 24JUL1995:11:20:27
2007173 10AUG1995:10:24:27
2007623 16AUG1995:11:59:52
2186282 25AUG1995:13:22:03
2117809 11OCT1995:13:31:30
2064119 26JUL1996:16:46:32
2064119 26JUL1996:16:46:40
2064119 19AUG1996:16:47:46
1220090 26AUG1996:00:00:00
2011790 27AUG1996:00:00:00
2011790 30AUG1996:00:00:00
2011790 03SEP1996:00:00:00
2008211 03SEP1996:10:36:43
2013623 04SEP1996:09:12:21
2013623 04SEP1996:09:12:24
2013623 04SEP1996:09:12:28
2013623 04SEP1996:09:12:32
2012087 05SEP1996:13:08:07
1259184 26SEP1996:00:00:00
2012342 01OCT1996:00:00:00
2064119 01OCT1996:16:50:51
1216288 02OCT1996:00:00:00
2008961 08OCT1996:11:25:31
2008961 08OCT1996:11:25:35
1220474 10OCT1996:00:00:00
;
/*_dt for creating groupformat*/
data temp;
set have;
_dt=date;
run;
data want;
if _n_=1 then do;
if 0 then set temp;/*not needed but only for variable arrangement by compiler*/
dcl hash H () ;
h.definekey ("id") ;
h.definedone () ;
end;
do until(last._dt);
undup_count=0;
do until(last.date);
set temp;
by _dt date groupformat;
format _dt dtyear4. date dtmonyy7. ;
if h.check() ne 0 then do;
undup_count=sum(undup_count,1);
h.add();
end;
end;
cumul_count=sum(cumul_count,undup_count);
output;
end;
h.clear();
drop _dt;
run;
proc print noobs;run;
id | date | undup_count | cumul_count |
---|---|---|---|
2007623 | JUL1995 | 3 | 3 |
2186282 | AUG1995 | 0 | 3 |
2117809 | OCT1995 | 1 | 4 |
2064119 | JUL1996 | 1 | 1 |
2011790 | AUG1996 | 2 | 3 |
1259184 | SEP1996 | 4 | 7 |
1220474 | OCT1996 | 4 | 11 |
Hi @anandas I have understood the question and have posted the solution. Please try and let me know. Thank you!
Hi again @anandas I have simplified my previous solution by eliminating the intermediate step in the below-
data have;
input id date :datetime20.;
format date datetime20.;
cards;
2186282 18JUL1995:13:21:29
2007173 19JUL1995:10:24:27
2007623 24JUL1995:11:20:27
2007173 10AUG1995:10:24:27
2007623 16AUG1995:11:59:52
2186282 25AUG1995:13:22:03
2117809 11OCT1995:13:31:30
2064119 26JUL1996:16:46:32
2064119 26JUL1996:16:46:40
2064119 19AUG1996:16:47:46
1220090 26AUG1996:00:00:00
2011790 27AUG1996:00:00:00
2011790 30AUG1996:00:00:00
2011790 03SEP1996:00:00:00
2008211 03SEP1996:10:36:43
2013623 04SEP1996:09:12:21
2013623 04SEP1996:09:12:24
2013623 04SEP1996:09:12:28
2013623 04SEP1996:09:12:32
2012087 05SEP1996:13:08:07
1259184 26SEP1996:00:00:00
2012342 01OCT1996:00:00:00
2064119 01OCT1996:16:50:51
1216288 02OCT1996:00:00:00
2008961 08OCT1996:11:25:31
2008961 08OCT1996:11:25:35
1220474 10OCT1996:00:00:00
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("id") ;
h.definedone () ;
end;
undup_count=0;
do until(last.date);
set have;
by date groupformat;
format date dtmonyy7. ;
if put(date,dtyear4.-l) ne put(lag(date),dtyear4. -l) then do;
cumul_count=0;
h.clear();
end;
if h.check() ne 0 then do;
undup_count=sum(undup_count,1);
h.add();
end;
end;
cumul_count+undup_count;
run;
proc print noobs;run;
id | date | undup_count | cumul_count |
---|---|---|---|
2007623 | JUL1995 | 3 | 3 |
2186282 | AUG1995 | 0 | 3 |
2117809 | OCT1995 | 1 | 4 |
2064119 | JUL1996 | 1 | 1 |
2011790 | AUG1996 | 2 | 3 |
1259184 | SEP1996 | 4 | 7 |
1220474 | OCT1996 | 4 | 11 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.