BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pontch
Fluorite | Level 6

Hi Guys, sorry for the very noobish question i have just started to used SAS 2 days ago:)

I want to run some data quality tests on one field in a very large table, such as looking the format and length of one field.

So i have created the following code:

Data New;

SET SASHELP.CARS;

x=length(MAKE);

Run;

proc freq data=new;

table x;

run;

That work ok for the table CARS as that table is not that big, but the testing need to be done on a table conaining over 66million transaction so i am bit worried that the DATA steps would create a 66 million record in my working directiory.

Is there a better way to carry out statistics on the length of one field? can this done be directy using a proc or do i need to create the field first in Data?

Any help much appreciated!

NB: If you guys are willing to share any scripts checking for table integrity (such as checking for not only length but special character that would be very helpful!)

cheers,

Tom

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Tom,

It's actually pretty straightforward.  Since you just started using SAS two days ago, I'll add an extra tip or two.

data new / view=new;

set sashelp.cars (keep=make);

len_make = lengthn(make);

run;

proc freq data=new;

tables len_make;

run;

A view will store instructions on how to extract the data, rather than storing the data itself.  So you don't need to worry about overloading your work directory.

When bringing in large data sets, applying keep= on the SET statement limits which fields get read and will speed up your program dramatically.

Use a variable name that will help you read the report later.  In this case, I suggested len_make instead of x.

Finally, the LENGTHN function is slightly different than the LENGTH function.  LENGTH has an unusual feature in that it never returns 0.  When the incoming field is blank, it returns 1.  LENGTHN will return 0 for blank fields.

Good luck.

View solution in original post

4 REPLIES 4
Haikuo
Onyx | Level 15

Are you aware of proc contents?

proc contents data=SASHELP.CARS;

run;

It seems get what you want. In term of special characters, are they part of variable name or variable value? Proc contents will take care of the former, while for later, you have to check it record by record.

Welcome on board!

Haikuo

Astounding
PROC Star

Tom,

It's actually pretty straightforward.  Since you just started using SAS two days ago, I'll add an extra tip or two.

data new / view=new;

set sashelp.cars (keep=make);

len_make = lengthn(make);

run;

proc freq data=new;

tables len_make;

run;

A view will store instructions on how to extract the data, rather than storing the data itself.  So you don't need to worry about overloading your work directory.

When bringing in large data sets, applying keep= on the SET statement limits which fields get read and will speed up your program dramatically.

Use a variable name that will help you read the report later.  In this case, I suggested len_make instead of x.

Finally, the LENGTHN function is slightly different than the LENGTH function.  LENGTH has an unusual feature in that it never returns 0.  When the incoming field is blank, it returns 1.  LENGTHN will return 0 for blank fields.

Good luck.

Reeza
Super User

It sounds like you're doing data checking/cleaning and variable integrity checks rather than table integrity checks.

Try searching for data cleaning at lexjansen.com to get some starting points of ways that you can make it easier.

One suggestion might be to run proc freq on the table first then look at the length of the variable that is outputted instead.

If you're accessing data through a SQL or ORACLE server there may be other efficiencies available as well.

proc freq data=sashelp.cars;

table make/out=car_makes;

run;

Data New;

SET car_makes;

x=length(MAKE);

Run;

Pontch
Fluorite | Level 6

Thanks a lot guys, looks like i am going to enjoy that forum, all your answers were very helpful!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 714 views
  • 6 likes
  • 4 in conversation