DATA Step, Macro, Functions and more

Data Truncation

Reply
Super Contributor
Posts: 673

Data Truncation


The default lengths (the dataset has all character variables)while importing to SAS datasets for all the variables is 100. Later part of the data processing, the lengths of variables are assigned.
Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.

for example, the default length for varaible pharma_outlet is 100 and it has the following two values.

mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices

The new assigned length is 50.

pharma_outlet                                                                          pharma_outlet_new  
mckefsson patient care solutions                                                  mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices           mckefsson patient care solutions & medical equipm

 

from the above, the second record is truncated.
How to output, the second record and its row number?

The datasets have 50 variables and it will be cumbersome to compare old and new column names.

Trusted Advisor
Posts: 1,566

Re: Data Truncation

Try using PROC COMPARE. It will list variables with unequal values.

 

Super User
Posts: 19,809

Re: Data Truncation

Do you have a systematic way of knowing the new and old variable names?

Are they in the same dataset? Or different datasets? If different, multiple different datasets or one? 

 

You're probably looking at a macro, but I would also recommend starting with PROC COMPARE. Note that you can specify which variables need to be compared. 

Super Contributor
Posts: 673

Re: Data Truncation

They can be identified with suffix _new.
They are all in the same dataset.
PROC Star
Posts: 325

Re: Data Truncation

[ Edited ]

i started using complev and I really like , it will help you understand how much difference your current variable value to new value. But your purpose i think below query may work for most of sceanerios. if they are same table then use same name for both tables and change variable names. 

data have1;
infile datalines truncover;
informat var1 $100.;
input var1 $100.;
datalines;
pharma_outlet                                                              
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices 
;
data have2;
informat var1 $50.;
input var1 $50.;
datalines;
pharma_outlet                                                              
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices 
;

proc sql;

select a.var1 as original,
       b.var1 as truncatedvalue     
  from have1 a
  inner join have2 b
  on b.var1 = substr(a.var1,1,50) 
  and a.var1 ne b.var1;

quit;

 

Super Contributor
Posts: 673

Re: Data Truncation

There are no have1 and have2 , there's only one dataset.
PROC Star
Posts: 325

Re: Data Truncation

then something like this

proc sql;

select a.var1 as original,
       b.var_new as truncatedvalue     
  from have a
  inner join have b
  on b.var_new = substr(a.var1,1,50) 
  and a.var1 ne b.var_new;

quit;
Super Contributor
Posts: 673

Re: Data Truncation

As I said there are 50 old variables and 50 new variables. To distinguish
between them, the new variable has suffix _new.
Thinking about data step as it gives flexibility of arrays. But don't know
how to create arrays one for old and one for new.
PROC Star
Posts: 325

Re: Data Truncation

aha, i missed that part

Super User
Posts: 19,809

Re: Data Truncation

data want;
set have;

truncated = 0;

if varName ne varName_New then truncated = 1; 

run;
Super User
Posts: 19,809

Re: Data Truncation

data want;
set have;

array old_names (*) $ <list of old variables here>;
array new_names(*) $ <list of new variables here>;

array flag(*) flag1 - flag50 (50*0); *number of variables here should be specified - note they are initialized to zero for every line;

do i=1 to dim(old_names);
if old_names(i) ne new_names(i) then flag(i) = 1;
end;

run;
Trusted Advisor
Posts: 1,566

Re: Data Truncation

[ Edited ]

Code was edited to fulfill your new query:

"I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';

say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles".

 

Assuming each old variable VAR has its VAR_NEW you can try next code.

Addapt 3 macro variables at top:

%let lib=work; /* adapt library name */
%let dsn=test; /* adapt dataset name */
%let key="key"; /* adapt list of key variables: "key1" "key2" etc. */

/**** For TEST only - start ***/
data test;
key = 'KEY';
varx = 'Short text';
varx_new = 'Short text replaced with long one';
output;
run;
/**** For TEST - end ***/

proc sql;
create table vars as select name
from dictionary.columns
where libname=upcase("&lib") and
memname=upcase("&dsn")
order by name;
quit;

filename pgm '/folders/myshortcuts/My_Folders/flat/temp.sas';

data _null_;
set vars end=eof;
file pgm;
if _N_=1 then do;
put 'data want;';
put "set &lib..&dsn;";
end;
name = lowcase(name);
if name not in (&key) and
index(name,'_new') = 0
then do;
a_line = 'if '||strip(name)|| ' NE '||strip(name)||'_new;'; /* truncated text */
put a_line;
end;
if eof then put 'RUN;';
run;
%include pgm /source2;

The code creates a program as a temporary file and then INCLUDEs it.

Super Contributor
Posts: 673

Re: Data Truncation


data want;
set have;

array old_names(*) $ &old_names;
array new_names(*) $ &tgt_names;
array old_flag(*) $ &old_flags;


do i=1 to dim(old_names);
if length(old_names(i)) ne length(new_names(i)) then old_flag(i) = '1';
else old_flag(i)='0';
if old_flag(i) = '1' then output;
-----Keep Statement logic-----
end;
drop i;


run;

 

I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';

say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles,

Super User
Posts: 19,809

Re: Data Truncation

Keep/Drop are not conditional statements. 

You'll need to process the entire file, then summarize the flags to determine which variables need to be kept or not. 

 

This solution, provided, answers your initial question (below), there was no mention of keep/dropping variables. At any rate, you need to summarize the flags to see if all are 0/1 and then any that have a 1 are added to your list of drop/keep which woud happen in a next step.

 

Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.
Super User
Super User
Posts: 7,050

Re: Data Truncation

Use PROC COMPARE.  You can get the list of variables by getting reading the contents.

data have ;
  input var1 $80. ;
  length new_var1 $50 ;
  new_var1=var1;
  x=1;
cards4;
mckefsson patient care solutions                                                    
mckefsson patient care solutions & medical equipment & devices
;;;;

proc contents data=have out=contents noprint;
run;

proc sql ;
  select a.name,b.name 
    into :orig separated by ' '
       , :new  separated by ' '
    from contents a 
       , contents b
    where upcase('NEW_'||a.name) = upcase(b.name)
  ;
quit;

proc compare data=have ;
  var &orig ;
  with &new ;
run;
Ask a Question
Discussion stats
  • 14 replies
  • 305 views
  • 4 likes
  • 5 in conversation