DATA Step, Macro, Functions and more

Count the number of distinct values

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Count the number of distinct values

[ Edited ]

hello all,

 

I would like to count the number of distinct pat_bubln_id of earliest_filing_id by following codes,

 

proc sql;

create table step1.citatins_of_earliest_filing_id as
SELECT 
t6.psn_name,
t1.earliest_filing_id,
count(distinct t3.pat_publn_id) AS cites_in_2006
from Sasdata.Applicationsnew as t1
join Step1.Earlist_grant_date_earlist_id2 as t2 on t1.appln_id=t2.appln_id
join Sasdata.Publicationsnew as t2b on t2.appln_id=t2b.appln_id
join Sasdata.Citations as t3 on t2b.pat_publn_id=t3.cited_pat_publn_id
join Sasdata.Publicationsnew as t4 on t3.pat_publn_id=t4.pat_publn_id
join Sasdata.Personapplication as t5 on t1.appln_id = t5.appln_id
join Sasdata.Companies as t6 on t5.person_id = t6.person_id
where 
year(t2.granted_date) <> 9999
and t2.granted_date >= '2000-01-01'
and t2.granted_date <= '2000-12-31'
and t2b.publn_first_grant=1
and t4.publn_date <= DATE_ADD(t2.Earliest_grant_date_final,INTERVAL 3 YEAR)
group by co.psn_name, ap.earliest_filing_id
order by co.psn_name, ap.earliest_filing_id
;

quit;

 

but the result shows,

 

292  proc sql;
293
294  create table step1.citatins_of_earliest_filing_id as
295  SELECT
296  t6.psn_name,
297  t1.earliest_filing_id,
298  count(distinct t3.pat_publn_id) AS cites_in_2006
299  from Sasdata.Applicationsnew as t1
300  join Step1.Earlist_grant_date_earlist_id2 as t2 on t1.appln_id=t2.appln_id
301  join Sasdata.Publicationsnew as t2b on t2.appln_id=t2b.appln_id
302  join Sasdata.Citations as t3 on t2b.pat_publn_id=t3.cited_pat_publn_id
303  join Sasdata.Publicationsnew as t4 on t3.pat_publn_id=t4.pat_publn_id
304  join Sasdata.Personapplication as t5 on t1.appln_id = t5.appln_id
305  join Sasdata.Companies as t6 on t5.person_id = t6.person_id
306  where
307  year(t2.granted_date) <> 9999
308  and t2.granted_date >= '2000-01-01'
309  and t2.granted_date <= '2000-12-31'
310  and t2b.publn_first_grant=1
311  and t4.publn_date <= DATE_ADD(t2.Earliest_grant_date_final,INTERVAL 3 YEAR)
                                                                         -
                                                                         22
                                                                         200
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, ), *, **, +, ',', -, '.',
              /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, GE, GET, GT, GTT, IN, IS,
              LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.

ERROR 200-322: The symbol is not recognized and will be ignored.

312  group by co.psn_name, ap.earliest_filing_id
313  order by co.psn_name, ap.earliest_filing_id
314  ;
NOTE: The "<>" operator is interpreted as "not equals".
315
316  quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.03 seconds


 

 

could you please give me some suggestions to improve the codes?

 

thanks in advance.


Accepted Solutions
Solution
Thursday
Super User
Super User
Posts: 9,427

Re: I

You need to do some debugging yourself.  The log is telling you what the issue is:

ERROR: INPUT function requires a character argument.

Therefore your variable is numeric, so treat it as numeric (remember I cannot see your data, I can only suggest):

year(t2.Earliest_grant_date_final) <> 9999
and t2.Earliest_grant_date_final >= '01Jan2000'd
and t2.Earliest_grant_date_final <= '31Dec2000'd
and t2b.publn_first_grant=1
and t4.publn_date <= intnx('year',t2.earliest_grant_date_final,3)

View solution in original post


All Replies
Super User
Super User
Posts: 9,427

Re: I

This:

DATE_ADD(t2.Earliest_grant_date_final,INTERVAL 3 YEAR)

Is not an ANSI SQL piece of code.  I suspect you have got this code from a database which has this "date_add" function.  You will need to replace it with either ANSI SQL compliant code, or use SAS functions, or you could pass through to the original database and execute it there.  As for SAS syntax:

intnx('year',t2.earliest_grant_date_final,3)

Should be nearly there. 

Contributor
Posts: 45

Re: I

thanks @RW9 .  

 

but I have a new problem for this code now. By using following codes,

 

SELECT 
t6.psn_name,
t1.earliest_filing_id,
count(distinct t3.pat_publn_id) AS cites_in_2006
from Sasdata.Applicationsnew as t1
join Step1.Earlist_grant_date_earlist_id2 as t2 on t1.appln_id=t2.appln_id
join Sasdata.Publicationsnew as t2b on t2.appln_id=t2b.appln_id
join Sasdata.Citations as t3 on t2b.pat_publn_id=t3.cited_pat_publn_id
join Sasdata.Publicationsnew as t4 on t3.pat_publn_id=t4.pat_publn_id
join Sasdata.Personapplication as t5 on t1.appln_id = t5.appln_id
join Sasdata.Companies as t6 on t5.person_id = t6.person_id
where 
year(t2.Earliest_grant_date_final) <> 9999
and t2.Earliest_grant_date_final >= '2006-01-01'
and t2.Earliest_grant_date_final <= '2006-12-31'
and t2b.publn_first_grant=1
and t4.publn_date <= intnx('year',t2.earliest_grant_date_final,3)
group by t6.psn_name, t1.earliest_filing_id
order by t6.psn_name, t1.earliest_filing_id
;

quit;

the result shows,

 

367  proc sql;
368
369  create table step1.citatins_of_earliest_filing_id as
370  SELECT
371  t6.psn_name,
372  t1.earliest_filing_id,
373  count(distinct t3.pat_publn_id) AS cites_in_2006
374  from Sasdata.Applicationsnew as t1
375  join Step1.Earlist_grant_date_earlist_id2 as t2 on t1.appln_id=t2.appln_id
376  join Sasdata.Publicationsnew as t2b on t2.appln_id=t2b.appln_id
377  join Sasdata.Citations as t3 on t2b.pat_publn_id=t3.cited_pat_publn_id
378  join Sasdata.Publicationsnew as t4 on t3.pat_publn_id=t4.pat_publn_id
379  join Sasdata.Personapplication as t5 on t1.appln_id = t5.appln_id
380  join Sasdata.Companies as t6 on t5.person_id = t6.person_id
381  where
382  year(t2.Earliest_grant_date_final) <> 9999
383  and t2.Earliest_grant_date_final >= '2000-01-01'
384  and t2.Earliest_grant_date_final <= '2000-12-31'
385  and t2b.publn_first_grant=1
386  and t4.publn_date <= intnx('year',t2.earliest_grant_date_final,3)
387  group by t6.psn_name, t1.earliest_filing_id
388  order by t6.psn_name, t1.earliest_filing_id
389  ;
NOTE: The "<>" operator is interpreted as "not equals".
NOTE: Data file SASDATA.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: Data file SASDATA.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
ERROR: Expression using IN has components that are of different data types.
NOTE: The IN referred to may have been transformed from an OR to an IN at some point during PROC
      SQL WHERE clause optimization.
390
391  quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.55 seconds
      cpu time            0.09 seconds


 

But there is no IN in the codes, could you please show me how to improve it ?

thanks a lot.

Super User
Super User
Posts: 9,427

Re: I

You are mixing up types somewhere:

year(t2.Earliest_grant_date_final) <> 9999 383 and t2.Earliest_grant_date_final >= '2000-01-01' 384 and t2.Earliest_grant_date_final <= '2000-12-31' 385 and t2b.publn_first_grant=1 386 and t4.publn_date <= intnx('year',t2.earliest_grant_date_final,3)

Is earliest_grant_date_final a numeric date value - as indicated by the first line, if so line 2 and 3 are invalid as you can't compare a text string with a number.  If its a character varaible then line 1 and 5 are incorrect as they assume numbers. 

I will assume earliest_grant_date_final is a character:

year(input(t2.Earliest_grant_date_final,yymmdd10.)) <> 9999
and input(t2.Earliest_grant_date_final,yymmdd10.) >= '01Jan2000'd
and input(t2.Earliest_grant_date_final,yymmdd10.) <= '31Dec2000'd
and t2b.publn_first_grant=1
and t4.publn_date <= intnx('year',input(t2.earliest_grant_date_final,yymmdd10.),3)

Something like that.  I would really read up on how dates are stored/used in SAS as they are different from a database.

Contributor
Posts: 45

Re: I

thanks but the result like this,

443  proc sql;
444
445  create table step1.citatins_of_earliest_filing_id as
446  SELECT
447  t6.psn_name,
448  t1.earliest_filing_id,
449  count(distinct t3.pat_publn_id) AS cites_2006
450  from Sasdata.Applicationsnew as t1
451  join Step1.Earlist_grant_date_earlist_id2 as t2 on t1.appln_id=t2.appln_id
452  join Sasdata.Publicationsnew as t2b on t2.appln_id=t2b.appln_id
453  join Sasdata.Citations as t3 on t2b.pat_publn_id=t3.cited_pat_publn_id
454  join Sasdata.Publicationsnew as t4 on t3.pat_publn_id=t4.pat_publn_id
455  join Sasdata.Personapplication as t5 on t1.appln_id = t5.appln_id
456  join Sasdata.Companies as t6 on t5.person_id = t6.person_id
457  where
458  year(input(t2.Earliest_grant_date_final,yymmdd10.)) <> 9999
459  and input(t2.Earliest_grant_date_final,yymmdd10.) >= '01Jan2000'd
460  and input(t2.Earliest_grant_date_final,yymmdd10.) <= '31Dec2000'd
461  and t2b.publn_first_grant=1
462  and t4.publn_date <= intnx('year',input(t2.earliest_grant_date_final,yymmdd10.),3)
463  and t5.applt_seq_nr > 0
464  group by t6.psn_name, t1.earliest_filing_id
465  order by t6.psn_name, t1.earliest_filing_id
466  ;
NOTE: The "<>" operator is interpreted as "not equals".
NOTE: Data file SASDATA.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
NOTE: Data file SASDATA.PUBLICATIONSNEW.DATA is in a format that is native to another host, or the
      file encoding does not match the session encoding. Cross Environment Data Access will be
      used, which might require additional CPU resources and might reduce performance.
ERROR: INPUT function requires a character argument.
ERROR: Function YEAR requires a numeric expression as argument 1.
ERROR: Expression using not equals (^=) has components that are of different data types.
ERROR: INPUT function requires a character argument.
ERROR: Expression using greater than or equal (>=) has components that are of different data types.
ERROR: INPUT function requires a character argument.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: INPUT function requires a character argument.
ERROR: Function INTNX requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: INPUT function requires a character argument.
ERROR: INPUT function requires a character argument.
ERROR: INPUT function requires a character argument.
ERROR: INPUT function requires a character argument.
467
468  quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.63 seconds
      cpu time            0.06 seconds


Solution
Thursday
Super User
Super User
Posts: 9,427

Re: I

You need to do some debugging yourself.  The log is telling you what the issue is:

ERROR: INPUT function requires a character argument.

Therefore your variable is numeric, so treat it as numeric (remember I cannot see your data, I can only suggest):

year(t2.Earliest_grant_date_final) <> 9999
and t2.Earliest_grant_date_final >= '01Jan2000'd
and t2.Earliest_grant_date_final <= '31Dec2000'd
and t2b.publn_first_grant=1
and t4.publn_date <= intnx('year',t2.earliest_grant_date_final,3)
Super User
Posts: 9,914

Re: I

Maxim 3: Know your data.

Do a proc contents on table Step1.Earlist_grant_date_earlist_id2 and look for the type and display format of earliest_grant_date_final.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 106 views
  • 1 like
  • 3 in conversation