buddha_d
Pyrite | Level 9

Dear SAS experts,

I would like suggestions for validating a dataset that has 100 variables and over 100,000 records. I am importing all of the data as strings (even the numeric data), and I noticed that some variables are truncated at the end. How do I validate each variable to make sure the data was populated completely (not truncated)?

How do we validate the dataset?

 

Thanks in advance 

buddha_d
Pyrite | Level 9

Just as an example:

 

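data one;
    infile "xyz.txt" RECFM=V LRECL=2000 PAD MISSOVER;
    /* character lengths from the specification sheet */
    length
        a1 $20
        a2 $100
        a3 $50
    ;
    /* list input: each value is read as a character string */
    input a1 $
          a2 $
          a3 $
    ;
run;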

SASKiwi
PROC Star

Unless you have a completely correct version of your data to compare against, you are not going to be able to validate your data fully.

 

What is it about your input file that means you are unable to read it correctly in the first place?

buddha_d
Pyrite | Level 9

SASKiwi,

For example, the string value is 1532564.7564, and I am getting 1532564.756 after my import. So my question is about values like this one that are getting truncated. When I change it to numeric data, I get the full value. Likewise, LicNum is character data (e.g. 12xd456), and it is getting truncated at the last digit (shows up as 12xd456).

There are about 100 columns and 100,000 records. How do I validate that no column was truncated? I have a specification sheet, but the import is not reading each column perfectly. How do I code a check on each column to confirm that the data was imported without truncation? With a dataset this large it is hard to check each line, so I am thinking of some kind of macro that checks the maximum length of each variable to begin with. Based on that, I could check the specifications and confirm whether I need to increase the length of a variable.

 

Thanks 

buddha_d
Pyrite | Level 9

Sorry for the typo: LicNum shows up as 12xd4 instead of 12xd456 (actual).

SASKiwi
PROC Star

Why are you reading in numbers as strings? If you read them in as numbers to begin with you wouldn't get truncation.

 

For example 1532564.7564 can be read using a numeric INFORMAT like so:

 

input @10 MyNum 12.;
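
As a self-contained illustration of that INPUT statement, here is a minimal sketch that reads the example value from in-stream data. The dataset name DEMO, the starting column @1 (instead of @10), and the 14.4 display format are assumptions made for the demo, not details of the real file.

```sas
/* Minimal sketch: read the example value as a number, not a string. */
data demo;
    /* 12. reads 12 columns; a decimal point in the data is honoured */
    input @1 MyNum 12.;
    /* display format wide enough to show four decimal places */
    format MyNum 14.4;
    datalines;
1532564.7564
;
run;

proc print data=demo;
run;
```

If the printed value still looks short, the display format is worth checking before the informat, since a narrow format can hide digits that were read correctly.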

Reeza
Super User

Do you have a specification document? SAS cannot know what the values should be; it makes a best guess. If that doesn't work, you need to tell SAS what each value should be or how to read it, and that information has to come from somewhere. So for the 100 variables, how do you know, besides inspecting every record, what the type, format, and length of each should be?
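
Once the specification gives the expected lengths, the maximum-length check asked about earlier in the thread could be sketched roughly as below. This is one possible approach, not a full validation. It assumes the imported dataset is WORK.ONE (as in the example above), that it has at most 1000 character variables, and that none of them is named i, name, defined_len, max_len, or at_limit. The output dataset LENGTH_CHECK and those variable names are made up for the illustration.

```sas
/* Sketch: longest observed value vs. defined length, per character variable */
data length_check (keep=name defined_len max_len at_limit);
    set work.one end=last;
    array c{*} _character_;       /* all character variables coming from WORK.ONE */
    array mx{1000} _temporary_;   /* running maxima, retained across rows (size is an assumption) */
    length name $32;              /* declared after the array, so it is not part of c{*} */

    /* update the running maximum length for every character variable */
    do i = 1 to dim(c);
        mx{i} = max(mx{i}, lengthn(c{i}));
    end;

    /* after the last row, write one summary row per variable */
    if last then do i = 1 to dim(c);
        name        = vname(c{i});
        defined_len = vlength(c{i});
        max_len     = mx{i};
        at_limit    = (max_len >= defined_len);  /* 1 = longest value fills the variable */
        output;
    end;
run;

proc print data=length_check;
    where at_limit = 1;
run;
```

A variable flagged with at_limit = 1 is only a candidate for truncation, because a value can legitimately fill its full length. Those are the columns to compare against the specification sheet first. The check cannot detect values that were cut short for other reasons, such as an embedded delimiter.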
