BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
France
Quartz | Level 8

Dear all,

 

I would like to list all distinct id (docdb_family_id) from each company (psn_name) in 1990, by using following codes,

 

PROC SQL;
  CREATE TABLE base_of_citation AS
    SELECT
    co.psn_name,
    DISTINCT ap.docdb_family_id 
    FROM sample.applications AS ap
    JOIN sample.personapplication AS pe ON ap.appln_id = pe.appln_id
    JOIN sample.companies AS co ON pe.person_id = co.person_id
    WHERE applt_seq_nr > 0
    AND earliest_filing_year=1990
    AND granted = 1
    GROUP BY psn_name
    ORDER BY psn_name
    ;
QUIT;

 

but the result shows like following,

432  PROC SQL;
433    CREATE TABLE base_of_citation AS
434      SELECT
435      co.psn_name,
436      DISTINCT ap.docdb_family_id
                  --
                  22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, /, <,
              <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, FROM, GE, GET, GT, GTT, LE,
              LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.

437      FROM sample.applications AS ap
438      JOIN sample.personapplication AS pe ON ap.appln_id = pe.appln_id
439      JOIN sample.companies AS co ON pe.person_id = co.person_id
440      WHERE applt_seq_nr > 0
441      AND earliest_filing_year=1990
442      AND granted = 1
443      GROUP BY psn_name
444      ORDER BY psn_name
445      ;
446  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

Could you please give me some suggestions about it ? thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@France wrote:

I would like to list companies (psn_name) and all distinct id (docdb_family_id) which belongs to company (psn_name) in 1990.

 


So tell SQL that is what you want.  You want the set of distinct values of psn_name and docdb_family_id.

select distinct
  psn_name
 ,docdb_family_id
from have 
where <whatever logic means 1990>
;

 

Not sure how to implement "in 1990".  Is that a test of a variable that is on every row?  or some type of aggregate test?

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

DISTINCT has to be the word immediately after SELECT, you can't put it anywhere else.

 

That ought to fix the problem.

--
Paige Miller
France
Quartz | Level 8

thanks for your suggestion @Miller,

but what should I do if I want to keep psn_name in the TABLE ?

PaigeMiller
Diamond | Level 26

I never said to remove psn_name, it's still there, but now it is after DISTINCT instead of BEFORE

--
Paige Miller
France
Quartz | Level 8

do you mean I can write like this ? will psn_name be influenced by DISTINCT ?  or will I get different results? many thanks for your advise.

 

PROC SQL;
  CREATE TABLE base_of_citation AS
    SELECT
    DISTINCT ap.docdb_family_id, co.psn_name
    FROM sample.applications AS ap
    JOIN sample.personapplication AS pe ON ap.appln_id = pe.appln_id
    JOIN sample.companies AS co ON pe.person_id = co.person_id
    WHERE applt_seq_nr > 0
    AND earliest_filing_year=1990
    AND granted = 1
    GROUP BY psn_name
    ORDER BY psn_name
    ;
Reeza
Super User

Distinct will apply to all columns not just the column you identified. It doesn't make sense otherwise though...how would you return a distinct when you have multiple of other variables?


@France wrote:

do you mean I can write like this ? will psn_name be influenced by DISTINCT ?  or will I get different results? many thanks for your advise.

 

PROC SQL;
  CREATE TABLE base_of_citation AS
    SELECT
    DISTINCT ap.docdb_family_id, co.psn_name
    FROM sample.applications AS ap
    JOIN sample.personapplication AS pe ON ap.appln_id = pe.appln_id
    JOIN sample.companies AS co ON pe.person_id = co.person_id
    WHERE applt_seq_nr > 0
    AND earliest_filing_year=1990
    AND granted = 1
    GROUP BY psn_name
    ORDER BY psn_name
    ;

 

France
Quartz | Level 8

thanks for your reply @Reeza,

 

but what should I do for it ?

Reeza
Super User

I don't know because I don't know what your data looks like or what you want as output. 

 


@France wrote:

thanks for your reply @Reeza,

 

but what should I do for it ?


 

France
Quartz | Level 8

I would like to list companies (psn_name) and all distinct id (docdb_family_id) which belongs to company (psn_name) in 1990.

for example,

based on following dataset

psn_name, docdb_family_id
a 1
a 1
a 2
b 3
b 4
b 5

 

 

and get following result,

psn_name, docdb_family_id
a 1
a 2
b 3
b 4
b 5

 

 

besides, if you want to, you could get a sample of this database from the link https://forums.epo.org/patents-application-and-patents-granted-per-year-firm-level-7565#p19890 

Tom
Super User Tom
Super User

@France wrote:

I would like to list companies (psn_name) and all distinct id (docdb_family_id) which belongs to company (psn_name) in 1990.

 


So tell SQL that is what you want.  You want the set of distinct values of psn_name and docdb_family_id.

select distinct
  psn_name
 ,docdb_family_id
from have 
where <whatever logic means 1990>
;

 

Not sure how to implement "in 1990".  Is that a test of a variable that is on every row?  or some type of aggregate test?

PaigeMiller
Diamond | Level 26

@France wrote:

I would like to list companies (psn_name) and all distinct id (docdb_family_id) which belongs to company (psn_name) in 1990.

for example,

based on following dataset

psn_name, docdb_family_id
a 1
a 1
a 2
b 3
b 4
b 5

 

 

and get following result,

psn_name, docdb_family_id
a 1
a 2
b 3
b 4
b 5

 

 

besides, if you want to, you could get a sample of this database from the link https://forums.epo.org/patents-application-and-patents-granted-per-year-firm-level-7565#p19890 


It seems as if you haven't actually tried running the modification to your code that I recommended. Please give it a try.

--
Paige Miller
PaigeMiller
Diamond | Level 26

@France wrote:

do you mean I can write like this ? will psn_name be influenced by DISTINCT ?  or will I get different results? many thanks for your advise.

 

PROC SQL;
  CREATE TABLE base_of_citation AS
    SELECT
    DISTINCT ap.docdb_family_id, co.psn_name
    FROM sample.applications AS ap
    JOIN sample.personapplication AS pe ON ap.appln_id = pe.appln_id
    JOIN sample.companies AS co ON pe.person_id = co.person_id
    WHERE applt_seq_nr > 0
    AND earliest_filing_year=1990
    AND granted = 1
    GROUP BY psn_name
    ORDER BY psn_name
    ;

The simplest answer to your question is that you should try it and see what happens. (Although it's not obvious to me why you made a change to your code and now have psn_name after docdb_family_id ... )

--
Paige Miller
SuryaKiran
Meteorite | Level 14

Hi,

 

If you want to keep only unique "ap.docdb_family_id" then what values you want if "co.psn_name" is not unique. For example 

data test;
input id var $;
datalines;
1 A
1 B
2 A
3 A
4 A
;
run;

In this test data id 1 has two values, if you want to have unique id values then you need either 'A' or 'B'. 

 

Proc sort will delete the duplicates by key, it will keep the first record and delete the rest, in the example it keeps 'A' and deletes 'B' record.

Proc sort data=want nodupkey;
by id ;
run;

Another approach is using monotonic() to assign row number and grab the max row number ie, assuming the recent record. You might have some date or row number that you can use to identify recent record.

Proc sql;
create table want as 
select id,var
	from (select monotonic() as row_no,* from test)
	group by id
having max(row_no)=row_no;
quit;

 

Thanks,
Suryakiran

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1186 views
  • 1 like
  • 5 in conversation