- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@CathyVI wrote:
So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
I would start with Proc Contents. If the length of any of my defined lengths looks off the there is a problem.
One question I would also ask is why do you suspect you may have truncated data? The main reason I can think of that possibility is using Proc Import to "read" data. Since Proc Import makes guesses each time a file is read then similar files can end up with different variable properties such as length or variable type. If the data source is a text file of some sort one likely source of truncation can be removed by using a GUESSINGROWS=MAX; statement.
If your data is coming from some other file type, spreadsheet perhaps, then GUESSINGROWS is not available and only a few rows are examined to set the variable types and lengths. Make a text file such as CSV. Import that.
But you still have potential issues with different files. If you were given any documentation as to the expected lengths of the variables then you should use a data step to read the data and use the documentation to set variable types and lengths.
Note that Proc Import will create data step code when reading a text file. Copy from the editor, set the lengths to the documented or expected lengths. Reread.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@CathyVI wrote:
So I have a data with over a million records say;
name salary wages
Dog 100 20
Cat 120 40
Cattle 391 35
. . .
How do I check if "Cattle " is not truncated in my data and is it also possible to check the variable name for any truncated record/observation ?
Thanks.
Truncation implies the transformation of data from source to target. So, by definition, you have untruncated data in source, and potentially truncated data in target.
So, get the max length of your data from source, then compare to the allocated length of your target.
I'm not saying this is the prettiest code but perhaps you can run with it:
data source;
set sashelp.class;
run;
proc sql noprint;
create table lengths as
select "NAME" as name, max(length(name)) as src_length
from sashelp.class;
quit;
data target;
length name $3; * causes truncation ;
set source;
run;
proc contents data=target out=contents noprint;
run;
proc sql;
create table baddata as
select
*
from
contents a
inner join
lengths b
on
lowcase(a.name)=lowcase(b.name)
having
a.length < b.src_length
;
quit;
Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.