BookmarkSubscribeRSS Feed
JamieTee
Fluorite | Level 6

I have a few datasets for which I have many various DATE variables. I'm looking to get the data distribution on each one. I've seen past posts which describe how to get min, max, nmissing, but I cannot seem to find a way to get the Q1 and Q3 marks (and if possible P1 and P99).

I need to use PROC SQL because I am accessing data through some data servers which operate through PROC SQL.

Thank you kindly!

4 REPLIES 4
Reeza
Super User

If you're going through SQL and must use SQL are you using Explicit Pass through? Otherwise you can use the standard PROC MEANS approach. 

 

If you are using explicit pass through, then your code needs to be SQL specific and you need to use the specific functions that align with your SQL variant. 

 

How do you know which variables are dates that you want to analyze? Based on variable type or a naming convention?

 


@JamieTee wrote:

I have a few datasets for which I have many various DATE variables. I'm looking to get the data distribution on each one. I've seen past posts which describe how to get min, max, nmissing, but I cannot seem to find a way to get the Q1 and Q3 marks (and if possible P1 and P99).

I need to use PROC SQL because I am accessing data through some data servers which operate through PROC SQL.

Thank you kindly!


 

JamieTee
Fluorite | Level 6

Hi @Reeza thank you for your reply!

 

I looked up explicit pass through (https://communities.sas.com/t5/SAS-Programming/Implicit-vs-Explicit-SQL-Pass-through-SQL-Query-in-SA...), and it looks like that's exactly what I'm using. 

 

I would be identifying the DATE variables based on their format, which is generally DATE9. - would it be possible to search by the 'Type' of the variable instead (in case something is in a different DATE format)? Thank you kindly for your help. 

Reeza
Super User

How I would do this is first to identify all my date variables. 

All DB have dictionary tables that indicate the variable name and type. You can query that to build your list of variables you need to apply it to. 

 

Second is to figure out what the function is you need for explicit pass through based on the SQL variant. Most have it, but they're never efficient at those calculations by the way.

 

If your data is in the smaller range (few million) I'd recommend downloading the data and then using a full SAS approach. That'll save you programming time but it'll take slightly longer to run - I suspect you won't make up the run time difference in this process but you'll learn how to do it I suppose. 

 

PGStats
Opal | Level 21

If the external DBMS doesn't provide the tools to calculate the quantiles, it could surely give you the record counts for each date. You could thus query for the dates and their frequencies and use proc univariate with a freq statement to get the quantiles. I think you would have to do this separately for each date variable.

 

hth

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1338 views
  • 3 likes
  • 3 in conversation