BookmarkSubscribeRSS Feed
SASPhile
Quartz | Level 8


The default lengths (the dataset has all character variables)while importing to SAS datasets for all the variables is 100. Later part of the data processing, the lengths of variables are assigned.
Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.

for example, the default length for varaible pharma_outlet is 100 and it has the following two values.

mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices

The new assigned length is 50.

pharma_outlet                                                                          pharma_outlet_new  
mckefsson patient care solutions                                                  mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices           mckefsson patient care solutions & medical equipm

 

from the above, the second record is truncated.
How to output, the second record and its row number?

The datasets have 50 variables and it will be cumbersome to compare old and new column names.

14 REPLIES 14
Shmuel
Garnet | Level 18

Try using PROC COMPARE. It will list variables with unequal values.

 

Reeza
Super User

Do you have a systematic way of knowing the new and old variable names?

Are they in the same dataset? Or different datasets? If different, multiple different datasets or one? 

 

You're probably looking at a macro, but I would also recommend starting with PROC COMPARE. Note that you can specify which variables need to be compared. 

SASPhile
Quartz | Level 8
They can be identified with suffix _new.
They are all in the same dataset.
kiranv_
Rhodochrosite | Level 12

i started using complev and I really like , it will help you understand how much difference your current variable value to new value. But your purpose i think below query may work for most of sceanerios. if they are same table then use same name for both tables and change variable names. 

data have1;
infile datalines truncover;
informat var1 $100.;
input var1 $100.;
datalines;
pharma_outlet                                                              
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices 
;
data have2;
informat var1 $50.;
input var1 $50.;
datalines;
pharma_outlet                                                              
mckefsson patient care solutions
mckefsson patient care solutions & medical equipment & devices 
;

proc sql;

select a.var1 as original,
       b.var1 as truncatedvalue     
  from have1 a
  inner join have2 b
  on b.var1 = substr(a.var1,1,50) 
  and a.var1 ne b.var1;

quit;

 

SASPhile
Quartz | Level 8
There are no have1 and have2 , there's only one dataset.
kiranv_
Rhodochrosite | Level 12

then something like this

proc sql;

select a.var1 as original,
       b.var_new as truncatedvalue     
  from have a
  inner join have b
  on b.var_new = substr(a.var1,1,50) 
  and a.var1 ne b.var_new;

quit;
SASPhile
Quartz | Level 8
As I said there are 50 old variables and 50 new variables. To distinguish
between them, the new variable has suffix _new.
Thinking about data step as it gives flexibility of arrays. But don't know
how to create arrays one for old and one for new.
kiranv_
Rhodochrosite | Level 12

aha, i missed that part

Reeza
Super User
data want;
set have;

truncated = 0;

if varName ne varName_New then truncated = 1; 

run;
Reeza
Super User
data want;
set have;

array old_names (*) $ <list of old variables here>;
array new_names(*) $ <list of new variables here>;

array flag(*) flag1 - flag50 (50*0); *number of variables here should be specified - note they are initialized to zero for every line;

do i=1 to dim(old_names);
if old_names(i) ne new_names(i) then flag(i) = 1;
end;

run;
Shmuel
Garnet | Level 18

Code was edited to fulfill your new query:

"I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';

say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles".

 

Assuming each old variable VAR has its VAR_NEW you can try next code.

Addapt 3 macro variables at top:

%let lib=work; /* adapt library name */
%let dsn=test; /* adapt dataset name */
%let key="key"; /* adapt list of key variables: "key1" "key2" etc. */

/**** For TEST only - start ***/
data test;
key = 'KEY';
varx = 'Short text';
varx_new = 'Short text replaced with long one';
output;
run;
/**** For TEST - end ***/

proc sql;
create table vars as select name
from dictionary.columns
where libname=upcase("&lib") and
memname=upcase("&dsn")
order by name;
quit;

filename pgm '/folders/myshortcuts/My_Folders/flat/temp.sas';

data _null_;
set vars end=eof;
file pgm;
if _N_=1 then do;
put 'data want;';
put "set &lib..&dsn;";
end;
name = lowcase(name);
if name not in (&key) and
index(name,'_new') = 0
then do;
a_line = 'if '||strip(name)|| ' NE '||strip(name)||'_new;'; /* truncated text */
put a_line;
end;
if eof then put 'RUN;';
run;
%include pgm /source2;

The code creates a program as a temporary file and then INCLUDEs it.

SASPhile
Quartz | Level 8


data want;
set have;

array old_names(*) $ &old_names;
array new_names(*) $ &tgt_names;
array old_flag(*) $ &old_flags;


do i=1 to dim(old_names);
if length(old_names(i)) ne length(new_names(i)) then old_flag(i) = '1';
else old_flag(i)='0';
if old_flag(i) = '1' then output;
-----Keep Statement logic-----
end;
drop i;


run;

 

I would want to kep only those variables from old_names and new_varaibles that satisfies the condition old_flag(i) = '1';

say for instance old_name(1) is disease and if the new length and old length of disease dont match then old_flag=1.
so i want to keep only disease_old and disease_new varaibles,

Reeza
Super User

Keep/Drop are not conditional statements. 

You'll need to process the entire file, then summarize the flags to determine which variables need to be kept or not. 

 

This solution, provided, answers your initial question (below), there was no mention of keep/dropping variables. At any rate, you need to summarize the flags to see if all are 0/1 and then any that have a 1 are added to your list of drop/keep which woud happen in a next step.

 

Now I would like to compare if data is truncated due to new length. and if so, i would want the row number, old value and truncated value saved in dataset.
Tom
Super User Tom
Super User

Use PROC COMPARE.  You can get the list of variables by getting reading the contents.

data have ;
  input var1 $80. ;
  length new_var1 $50 ;
  new_var1=var1;
  x=1;
cards4;
mckefsson patient care solutions                                                    
mckefsson patient care solutions & medical equipment & devices
;;;;

proc contents data=have out=contents noprint;
run;

proc sql ;
  select a.name,b.name 
    into :orig separated by ' '
       , :new  separated by ' '
    from contents a 
       , contents b
    where upcase('NEW_'||a.name) = upcase(b.name)
  ;
quit;

proc compare data=have ;
  var &orig ;
  with &new ;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1884 views
  • 4 likes
  • 5 in conversation