France
Quartz | Level 8

How can I rewrite the following code so that the computation runs per month, from 200001 to 201012?

Proc SQL; 
CREATE table step4.Y_number_of_pat AS
 SELECT
  co.psn_name,
  COUNT (distinct(case WHEN earliest_filing_year =2000 THEN DOCDB_FAMILY_ID END)) AS application2000,
  COUNT (distinct(case WHEN earliest_filing_year =2001 THEN DOCDB_FAMILY_ID END)) AS application2001,
  COUNT (distinct(case WHEN earliest_filing_year =2002 THEN DOCDB_FAMILY_ID END)) AS application2002,
  COUNT (distinct(case WHEN earliest_filing_year =2003 THEN DOCDB_FAMILY_ID END)) AS application2003,
  COUNT (distinct(case WHEN earliest_filing_year =2004 THEN DOCDB_FAMILY_ID END)) AS application2004,
  COUNT (distinct(case WHEN earliest_filing_year =2005 THEN DOCDB_FAMILY_ID END)) AS application2005,
  COUNT (distinct(case WHEN earliest_filing_year =2006 THEN DOCDB_FAMILY_ID END)) AS application2006,
  COUNT (distinct(case WHEN earliest_filing_year =2007 THEN DOCDB_FAMILY_ID END)) AS application2007,
  COUNT (distinct(case WHEN earliest_filing_year =2008 THEN DOCDB_FAMILY_ID END)) AS application2008,
  COUNT (distinct(case WHEN earliest_filing_year =2009 THEN DOCDB_FAMILY_ID END)) AS application2009,
  COUNT (distinct(case WHEN earliest_filing_year =2010 THEN DOCDB_FAMILY_ID END)) AS application2010
 FROM Step1.appln_new AS ap
 JOIN Pat_ori.Personapplication AS pe ON ap.appln_id = pe.appln_id
 JOIN Pat_ori.Companies AS co ON pe.person_id = co.person_id
 WHERE applt_seq_nr > 0 /* only include patent applicants */
 GROUP BY psn_name
 ORDER BY psn_name
;
QUIT;

I count the number of distinct DOCDB_FAMILY_ID values per year using the code above, but now I'd like to count the number of distinct DOCDB_FAMILY_ID values per month. I would have to use code like

 

COUNT (distinct(case WHEN earliest_filing_date =200001 THEN DOCDB_FAMILY_ID END)) AS application200001,
COUNT (distinct(case WHEN earliest_filing_date =200002 THEN DOCDB_FAMILY_ID END)) AS application200002,
...
COUNT (distinct(case WHEN earliest_filing_date =201012 THEN DOCDB_FAMILY_ID END)) AS application201012

and copy it more than a hundred times.

Could you please give me some suggestions for simplifying the code?

Thanks in advance.

Accepted Solutions
s_lassen
Meteorite | Level 14

Putting data (month and year) into variable names is generally not a good idea. Why not just do the computation "lengthwise" (more observations, fewer variables):

Proc SQL; 
CREATE table step4.Y_number_of_pat AS
 SELECT
  co.psn_name,
  COUNT (distinct DOCDB_FAMILY_ID) AS application,
  earliest_filing_year,
  month
 FROM Step1.appln_new AS ap 
 JOIN Pat_ori.Personapplication AS pe ON ap.appln_id = pe.appln_id
 JOIN Pat_ori.Companies AS co ON pe.person_id = co.person_id
 WHERE applt_seq_nr > 0 /* only include patent applicants */
 GROUP BY psn_name,earliest_filing_year,month
 ORDER BY psn_name,earliest_filing_year,month
; 
QUIT;

If you think you absolutely must have the wide layout with many variables (rather than just a report with year or month across, for which PROC REPORT is the right tool), it is much easier to apply a PROC TRANSPOSE step to the output of this SQL query than to write out every column by hand.
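
A minimal sketch of that PROC TRANSPOSE step, assuming the long table produced by the query above. It also assumes that earliest_filing_year and month are numeric and that month runs from 1 to 12; the thread does not show how month is derived. The names work.long, work.wide and yyyymm are made up for illustration.

```sas
/* Build a yyyymm label per row. Assumes earliest_filing_year and month
   are numeric (an assumption, not confirmed in the thread). */
data work.long;
   set step4.Y_number_of_pat;
   length yyyymm $6;
   yyyymm = cats(earliest_filing_year, put(month, z2.));
run;

/* One row per psn_name, one column per month: application200001, ... */
proc transpose data=work.long out=work.wide (drop=_name_) prefix=application;
   by psn_name;      /* input is already ordered by psn_name via ORDER BY */
   id yyyymm;        /* values become the suffix of the new variable names */
   var application;  /* the count to spread across the columns */
run;
```

With this approach a month in which a company filed nothing comes out as a missing value rather than 0, and a month in which nobody filed produces no column at all, so a later step may be needed if zeros are required.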

Community_Guide
SAS Moderator

Hello @France,


Your question requires more details before experts can help. Can you revise your question to include more information? 

 

Review this checklist:

  • Specify a meaningful subject line for your topic.  Avoid generic subjects like "need help," "SAS query," or "urgent."
  • When appropriate, provide sample data in text or DATA step format.  See this article for one method you can use.
  • If you're encountering an error in SAS, include the SAS log or a screenshot of the error condition. Use the Photos button to include the image in your message.
  • It also helps to include an example (table or picture) of the result that you're trying to achieve.

To edit your original message, select the "blue gear" icon at the top of the message and select Edit Message.  From there you can adjust the title and add more details to the body of the message.  Or, simply reply to this message with any additional information you can supply.

 


SAS experts are eager to help -- help them by providing as much detail as you can.

 

This prewritten response was triggered for you by fellow SAS Support Communities member @ballardw.
ballardw
Super User

If you have an actual date value then you can create groups based on the formatted values.

 

Example with a data set you should have available.

proc summary data= sashelp.stocks nway;
   class stock date;
   format date yymmn6.;
   var open high low close volume;
   output out=work.example (drop=_freq_) mean=;
run;

NWAY restricts the output to the rows in which every CLASS variable contributes (here the stock by date combination). Without it you also get the summary over all records, by stock only and by date only. The date is actually a trading date. The format yymmn6. groups the values by calendar month and displays them as, for example, 198608.

 

 

Class variables would be the equivalent of group by variables. The N statistic instead of Mean= would count non-missing.
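
As a hedged illustration of the N statistic, here is the same step on sashelp.stocks with n= in place of mean=. The output data set name work.month_n and the variable name n_close are made up for this sketch.

```sas
/* Count non-missing closing prices per stock and calendar month */
proc summary data=sashelp.stocks nway;
   class stock date;
   format date yymmn6.;   /* group trading dates by calendar month */
   var close;
   output out=work.month_n (drop=_type_ _freq_) n=n_close;
run;
```

Note that N counts non-missing rows, not distinct values, so it only matches a COUNT(DISTINCT ...) when each ID appears at most once per group.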

It is generally a much better idea to summarize into a single variable, with group membership shown by the date variable.

Then, if you need something human readable, you can use the date as an across variable, for example:

proc tabulate data=sashelp.stocks;
   class stock date;
   format date yymmn6.;
   var open high low close volume;
   table stock=''*(open high low),
         date='Monthly mean'* (mean='');
run;

PaigeMiller
Diamond | Level 26

As others have said, we need to know WHY you want variables application200001 through application201012. Most of the time, this is unnecessary work, and you can use earliest_filing_date rather than the variables named application200001 etc.

 

If you are just counting the number of occurrences (which seems to be what your SQL code is doing), then there are other ways to do this, such as GROUP BY in SQL, or PROC MEANS/PROC SUMMARY/PROC FREQ.

--
Paige Miller
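
A rough sketch of the GROUP BY route applied to the tables in the question. It assumes earliest_filing_date is a numeric SAS date value, which the thread does not confirm; if it is already stored as a yyyymm number, it can be grouped on directly without PUT. The names work.pat_by_month and filing_month are made up for illustration.

```sas
proc sql;
   create table work.pat_by_month as
   select co.psn_name,
          /* assumes a SAS date; yymmn6. renders it as e.g. 200001 */
          put(earliest_filing_date, yymmn6.) as filing_month,
          count(distinct DOCDB_FAMILY_ID) as application
   from Step1.appln_new as ap
   join Pat_ori.Personapplication as pe on ap.appln_id = pe.appln_id
   join Pat_ori.Companies as co on pe.person_id = co.person_id
   where applt_seq_nr > 0   /* only include patent applicants */
     and earliest_filing_date between '01JAN2000'd and '31DEC2010'd
   group by co.psn_name, calculated filing_month
   order by psn_name, filing_month;
quit;
```

PUT is used because PROC SQL groups on the underlying values rather than the formatted ones, so attaching a format alone would still group by day.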
