Hello folks, I need to compare multiple list of IDs by dates and record distinct count of IDs in proc sql union function. Below are my codes:
proc sql; select count(distinct ID) as cnt from( select ID from test(where=(DATE="2019/10")) UNION select ID from test(where=(DATE="2019/11")) UNION select ID from test(where=(DATE="2019/12")) UNION select ID from test(where=(DATE="2020/01")) .../*with more dates*/ ); quit;
Basically, I only need to count distinct IDs every time I "union" another date if that makes sense. Currently, my code only spits out the total distinct count for all the dates I am combining, for instance, after running the code above, SAS will have a single number in the output. But I want to have the distinct count each time I am adding in a date, for example, starting with the distinct count of IDs when date = 2019/09, I will have another count when date=2019/10 was added in, another count when date=2019/11 added in... which goes on until my last date value=2020/05. Is there a solution for my question? Thanks so much!
I fixed the issue, there should have a parenthesis after clear. Now the following codes seem to work perfectly:
proc sort data=want1; by VAR1 DATE; run;
data TEST;
if _N_=1 then do;
dcl hash H(dataset:'want1(obs=0)');
H.definekey('ID');
H.definedone();
end;
set want1;
by VAR1 DATE;
if first.VAR1 then H.clear();
if first.DATE then sum=0;
if H.check() then do;
sum + 1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE sum;
run;
I don't think it is quite clear how you expect the result to look.
Can you show us an example of how the result looks? Best is a small example of the starting data and then the result.
Hi thank you and this's a great question, and I don't have a very clear sense as what SAS's output would look like. I am expecting to have SAS record the distinct count of IDs in a time-series fashion. One key information I forgot to mention is that the IDs by dates are accumulative, meaning at the starting point when date=2019/10, the IDs will all be included when the date moves onto the next month 2019/11. For example, if there are 20 distinct IDs when date=2019/10, SAS can create a variable that captures this number when date=2019/10; and when date=2019/11, there are 15 distinct IDs, but 13 of the 15 IDs are not distinct from the IDs when date=2019/10, only 2 IDs are newly added, so I want SAS to capture 17 as the total distinct count of IDs when date=2019/11. Hoping i am explaining this clearly...
Like this?
select DATE, count(distinct ID) as CNT from test where DATE in("2019/10","2019/11","2019/12") group by DATE
Thanks for your reply. Your method will give me the distinct count of IDs by date but this is not what I want exactly. What I want is the distinct count of IDs accumulatively if that makes sense. So whatever the IDs have been counted in a previous dates will be counted ONLY once and only the newly added IDs by dates will be added to the count. This requires a comparison of IDs between two adjacent dates I believe. Let me know whether I explain this clearly.
I found an old discussion thread @ChrisNZ you was involved which answers my question perfectly!!!
if you don't mind, may I ask how to set up a looping macro based on following your code? Say if I have to run your following code based on another variable (VAR1) in the HAVE dataset. Basically I need to run your following code multiple times when VAR1=1,2,3,4,5...
Is there a way to do that?
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by DATE;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep DATE SUM;
run;
This, maybe?
data WANT;
if _N_=1 then do;
dcl hash H(dataset:'HAVE(obs=0)');
H.definekey('ID');
H.definedone();
end;
set HAVE;
by VAR1 DATE;
if first.VAR1 then H.clear;
if first.DATE then SUM=0;
if H.check() then do;
SUM+1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE SUM;
run;
Thank you for the reply! I ran the code but SAS spit out error messages, any ideas?
669 if first.VAR1 then H.clear;
--------
559 22
76
ERROR 559-185: Invalid object attribute reference H.clear.
ERROR 22-322: Syntax error, expecting one of the following: (, +, =.
ERROR 76-322: Syntax error, statement will be ignored.
670 if first.DATE then sum=0;
671 if H.check() then do;
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
I fixed the issue, there should have a parenthesis after clear. Now the following codes seem to work perfectly:
proc sort data=want1; by VAR1 DATE; run;
data TEST;
if _N_=1 then do;
dcl hash H(dataset:'want1(obs=0)');
H.definekey('ID');
H.definedone();
end;
set want1;
by VAR1 DATE;
if first.VAR1 then H.clear();
if first.DATE then sum=0;
if H.check() then do;
sum + 1;
H.add();
end;
if last.DATE then output;
keep VAR1 DATE sum;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: