The default lengths (the dataset has all character variables)while importing to SAS datasets for all the variables is 100. Later part of the data processing, the lengths of variables are assigned.
Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.
for example, the default length for varaible pharma_outlet is 100 and it has the following two values.
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices
The new assigned length is 50.
pharma_outlet | pharma_outlet_new |
mckefsson patient care solutions | mckefsson patient care solutions |
mckefsson patient care solutions & medical equipment & devices | mckefsson patient care solutions & medical equipm |
from the above, the second record is truncated.
How to output, the second record and its row number?
The datasets have 50 variables and it will be cumbersome to compare old and new column names.
Try using PROC COMPARE. It will list variables with unequal values.
Do you have a systematic way of knowing the new and old variable names?
Are they in the same dataset? Or different datasets? If different, multiple different datasets or one?
You're probably looking at a macro, but I would also recommend starting with PROC COMPARE. Note that you can specify which variables need to be compared.
i started using complev and I really like , it will help you understand how much difference your current variable value to new value. But your purpose i think below query may work for most of sceanerios. if they are same table then use same name for both tables and change variable names.
data have1; infile datalines truncover; informat var1 $100.; input var1 $100.; datalines; pharma_outlet mckefsson patient care solutions mckefsson patient care solutions & medical equipment & devices ; data have2; informat var1 $50.; input var1 $50.; datalines; pharma_outlet mckefsson patient care solutions mckefsson patient care solutions & medical equipment & devices ; proc sql; select a.var1 as original, b.var1 as truncatedvalue from have1 a inner join have2 b on b.var1 = substr(a.var1,1,50) and a.var1 ne b.var1; quit;
then something like this
proc sql; select a.var1 as original, b.var_new as truncatedvalue from have a inner join have b on b.var_new = substr(a.var1,1,50) and a.var1 ne b.var_new; quit;
aha, i missed that part
data want;
set have;
truncated = 0;
if varName ne varName_New then truncated = 1;
run;
data want;
set have;
array old_names (*) $ <list of old variables here>;
array new_names(*) $ <list of new variables here>;
array flag(*) flag1 - flag50 (50*0); *number of variables here should be specified - note they are initialized to zero for every line;
do i=1 to dim(old_names);
if old_names(i) ne new_names(i) then flag(i) = 1;
end;
run;
Code was edited to fulfill your new query:
"I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';
say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles".
Assuming each old variable VAR has its VAR_NEW you can try next code.
Addapt 3 macro variables at top:
%let lib=work; /* adapt library name */
%let dsn=test; /* adapt dataset name */
%let key="key"; /* adapt list of key variables: "key1" "key2" etc. */
/**** For TEST only - start ***/
data test;
key = 'KEY';
varx = 'Short text';
varx_new = 'Short text replaced with long one';
output;
run;
/**** For TEST - end ***/
proc sql;
create table vars as select name
from dictionary.columns
where libname=upcase("&lib") and
memname=upcase("&dsn")
order by name;
quit;
filename pgm '/folders/myshortcuts/My_Folders/flat/temp.sas';
data _null_;
set vars end=eof;
file pgm;
if _N_=1 then do;
put 'data want;';
put "set &lib..&dsn;";
end;
name = lowcase(name);
if name not in (&key) and
index(name,'_new') = 0
then do;
a_line = 'if '||strip(name)|| ' NE '||strip(name)||'_new;'; /* truncated text */
put a_line;
end;
if eof then put 'RUN;';
run;
%include pgm /source2;
The code creates a program as a temporary file and then INCLUDEs it.
data want;
set have;
array old_names(*) $ &old_names;
array new_names(*) $ &tgt_names;
array old_flag(*) $ &old_flags;
do i=1 to dim(old_names);
if length(old_names(i)) ne length(new_names(i)) then old_flag(i) = '1';
else old_flag(i)='0';
if old_flag(i) = '1' then output;
-----Keep Statement logic-----
end;
drop i;
run;
I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';
say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles,
Keep/Drop are not conditional statements.
You'll need to process the entire file, then summarize the flags to determine which variables need to be kept or not.
This solution, provided, answers your initial question (below), there was no mention of keep/dropping variables. At any rate, you need to summarize the flags to see if all are 0/1 and then any that have a 1 are added to your list of drop/keep which woud happen in a next step.
Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.
Use PROC COMPARE. You can get the list of variables by getting reading the contents.
data have ;
input var1 $80. ;
length new_var1 $50 ;
new_var1=var1;
x=1;
cards4;
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices
;;;;
proc contents data=have out=contents noprint;
run;
proc sql ;
select a.name,b.name
into :orig separated by ' '
, :new separated by ' '
from contents a
, contents b
where upcase('NEW_'||a.name) = upcase(b.name)
;
quit;
proc compare data=have ;
var &orig ;
with &new ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.