So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
@CathyVI wrote:
So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
I would start with Proc Contents. If the length of any of my defined lengths looks off the there is a problem.
One question I would also ask is why do you suspect you may have truncated data? The main reason I can think of that possibility is using Proc Import to "read" data. Since Proc Import makes guesses each time a file is read then similar files can end up with different variable properties such as length or variable type. If the data source is a text file of some sort one likely source of truncation can be removed by using a GUESSINGROWS=MAX; statement.
If your data is coming from some other file type, spreadsheet perhaps, then GUESSINGROWS is not available and only a few rows are examined to set the variable types and lengths. Make a text file such as CSV. Import that.
But you still have potential issues with different files. If you were given any documentation as to the expected lengths of the variables then you should use a data step to read the data and use the documentation to set variable types and lengths.
Note that Proc Import will create data step code when reading a text file. Copy from the editor, set the lengths to the documented or expected lengths. Reread.
@CathyVI wrote:
So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
Truncation implies the transformation of data from source to target. So, by definition, you have untruncated data in source, and potentially truncated data in target.
So, get the max length of your data from source, then compare to the allocated length of your target.
I'm not saying this is the prettiest code but perhaps you can run with it:
data source;
set sashelp.class;
run;
proc sql noprint;
create table lengths as
select "NAME" as name, max(length(name)) as src_length
from sashelp.class;
quit;
data target;
length name $3; * causes truncation ;
set source;
run;
proc contents data=target out=contents noprint;
run;
proc sql;
create table baddata as
select
*
from
contents a
inner join
lengths b
on
lowcase(a.name)=lowcase(b.name)
having
a.length < b.src_length
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.