Help using Base SAS procedures

Summarise question`

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Summarise question`

Hi Guys, sorry for the very noobish question i have just started to used SAS 2 days agoSmiley Happy

I want to run some data quality tests on one field in a very large table, such as looking the format and length of one field.

So i have created the following code:

Data New;

SET SASHELP.CARS;

x=length(MAKE);

Run;

proc freq data=new;

table x;

run;

That work ok for the table CARS as that table is not that big, but the testing need to be done on a table conaining over 66million transaction so i am bit worried that the DATA steps would create a 66 million record in my working directiory.

Is there a better way to carry out statistics on the length of one field? can this done be directy using a proc or do i need to create the field first in Data?

Any help much appreciated!

NB: If you guys are willing to share any scripts checking for table integrity (such as checking for not only length but special character that would be very helpful!)

cheers,

Tom


Accepted Solutions
Solution
‎02-01-2012 01:36 PM
Super User
Posts: 5,497

Summarise question`

Tom,

It's actually pretty straightforward.  Since you just started using SAS two days ago, I'll add an extra tip or two.

data new / view=new;

set sashelp.cars (keep=make);

len_make = lengthn(make);

run;

proc freq data=new;

tables len_make;

run;

A view will store instructions on how to extract the data, rather than storing the data itself.  So you don't need to worry about overloading your work directory.

When bringing in large data sets, applying keep= on the SET statement limits which fields get read and will speed up your program dramatically.

Use a variable name that will help you read the report later.  In this case, I suggested len_make instead of x.

Finally, the LENGTHN function is slightly different than the LENGTH function.  LENGTH has an unusual feature in that it never returns 0.  When the incoming field is blank, it returns 1.  LENGTHN will return 0 for blank fields.

Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: Summarise question`

Are you aware of proc contents?

proc contents data=SASHELP.CARS;

run;

It seems get what you want. In term of special characters, are they part of variable name or variable value? Proc contents will take care of the former, while for later, you have to check it record by record.

Welcome on board!

Haikuo

Solution
‎02-01-2012 01:36 PM
Super User
Posts: 5,497

Summarise question`

Tom,

It's actually pretty straightforward.  Since you just started using SAS two days ago, I'll add an extra tip or two.

data new / view=new;

set sashelp.cars (keep=make);

len_make = lengthn(make);

run;

proc freq data=new;

tables len_make;

run;

A view will store instructions on how to extract the data, rather than storing the data itself.  So you don't need to worry about overloading your work directory.

When bringing in large data sets, applying keep= on the SET statement limits which fields get read and will speed up your program dramatically.

Use a variable name that will help you read the report later.  In this case, I suggested len_make instead of x.

Finally, the LENGTHN function is slightly different than the LENGTH function.  LENGTH has an unusual feature in that it never returns 0.  When the incoming field is blank, it returns 1.  LENGTHN will return 0 for blank fields.

Good luck.

Super User
Posts: 19,770

Summarise question`

It sounds like you're doing data checking/cleaning and variable integrity checks rather than table integrity checks.

Try searching for data cleaning at lexjansen.com to get some starting points of ways that you can make it easier.

One suggestion might be to run proc freq on the table first then look at the length of the variable that is outputted instead.

If you're accessing data through a SQL or ORACLE server there may be other efficiencies available as well.

proc freq data=sashelp.cars;

table make/out=car_makes;

run;

Data New;

SET car_makes;

x=length(MAKE);

Run;

Occasional Contributor
Posts: 13

Summarise question`

Thanks a lot guys, looks like i am going to enjoy that forum, all your answers were very helpful!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 160 views
  • 6 likes
  • 4 in conversation