- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello folks, I need to compare multiple list of IDs by dates and record distinct count of IDs in proc sql union function. Below are my codes:
proc sql; select count(distinct ID) as cnt from( select ID from test(where=(DATE="2019/10")) UNION select ID from test(where=(DATE="2019/11")) UNION select ID from test(where=(DATE="2019/12")) UNION select ID from test(where=(DATE="2020/01")) .../*with more dates*/ ); quit;
Basically, I only need to count distinct IDs every time I "union" another date if that makes sense. Currently, my code only spits out the total distinct count for all the dates I am combining, for instance, after running the code above, SAS will have a single number in the output. But I want to have the distinct count each time I am adding in a date, for example, starting with the distinct count of IDs when date = 2019/09, I will have another count when date=2019/10 was added in, another count when date=2019/11 added in... which goes on until my last date value=2020/05. Is there a solution for my question? Thanks so much!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I fixed the issue, there should have a parenthesis after clear. Now the following codes seem to work perfectly:
proc sort data=want1; by VAR1 DATE; run;
data TEST;
if _N_=1 then do;
dcl hash H(dataset:'want1(obs=0)');
H.definekey('ID');
H.definedone();
end;
set want1;
by VAR1 DATE;
if first.VAR1 then H.clear();
if first.DATE then sum=0;
if H.check() then do;
sum + 1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE sum;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think it is quite clear how you expect the result to look.
Can you show us an example of how the result looks? Best is a small example of the starting data and then the result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi thank you and this's a great question, and I don't have a very clear sense as what SAS's output would look like. I am expecting to have SAS record the distinct count of IDs in a time-series fashion. One key information I forgot to mention is that the IDs by dates are accumulative, meaning at the starting point when date=2019/10, the IDs will all be included when the date moves onto the next month 2019/11. For example, if there are 20 distinct IDs when date=2019/10, SAS can create a variable that captures this number when date=2019/10; and when date=2019/11, there are 15 distinct IDs, but 13 of the 15 IDs are not distinct from the IDs when date=2019/10, only 2 IDs are newly added, so I want SAS to capture 17 as the total distinct count of IDs when date=2019/11. Hoping i am explaining this clearly...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
select DATE, count(distinct ID) as CNT from test where DATE in("2019/10","2019/11","2019/12") group by DATE
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your reply. Your method will give me the distinct count of IDs by date but this is not what I want exactly. What I want is the distinct count of IDs accumulatively if that makes sense. So whatever the IDs have been counted in a previous dates will be counted ONLY once and only the newly added IDs by dates will be added to the count. This requires a comparison of IDs between two adjacent dates I believe. Let me know whether I explain this clearly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I found an old discussion thread @ChrisNZ you was involved which answers my question perfectly!!!
if you don't mind, may I ask how to set up a looping macro based on following your code? Say if I have to run your following code based on another variable (VAR1) in the HAVE dataset. Basically I need to run your following code multiple times when VAR1=1,2,3,4,5...
Is there a way to do that?
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This, maybe?
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by VAR1 DATE;
if first.VAR1 then H.clear;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE SUM;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the reply! I ran the code but SAS spit out error messages, any ideas?
669 if first.VAR1 then H.clear;
--------
559 22
76
ERROR 559-185: Invalid object attribute reference H.clear.
ERROR 22-322: Syntax error, expecting one of the following: (, +, =.
ERROR 76-322: Syntax error, statement will be ignored.
670 if first.DATE then sum=0;
671 if H.check() then do;
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I fixed the issue, there should have a parenthesis after clear. Now the following codes seem to work perfectly:
proc sort data=want1; by VAR1 DATE; run;
data TEST;
if _N_=1 then do;
dcl hash H(dataset:'want1(obs=0)');
H.definekey('ID');
H.definedone();
end;
set want1;
by VAR1 DATE;
if first.VAR1 then H.clear();
if first.DATE then sum=0;
if H.check() then do;
sum + 1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE sum;
run;