Help using Base SAS procedures

Mass formatting

Reply
Contributor
Posts: 35

Mass formatting

Hello SAS Wizards,

Once again I find myself looking to your inestimable, collective wisdom.  My problem is pretty simple (I think), but I'm trying to find an efficient method that doesn't have me coding for hours.

Basically, I have two datasets; one older, one newer.  The older dataset has information from 1990 to 2010.  The newer dataset has information for 2011.  Each dataset has the same 82 variables with the same 82 variable names (though the variables are in a different order in the newer set).  The variable names do not follow an easy naming convention (by easy naming convention, I mean something like var1 through var82; instead the variables are named things like AccT, Dep, PIC, etc.)

The problem I have is that the older dataset has correct formatting (i.e., some variables are dates, some are characters, some are numeric) whereas the newer dataset has all variables stored as characters.  I need to get the two datasets appended into a single set (with the correct formatting of the older dataset), but if I PROC APPEND with FORCE, then all the numeric/date variables in the second dataset become missing because they are stored as characters.  Currently all I can come up with is writing hundreds of lines of code where I manually create a twin "new" variable in the newer dataset that is equal to the old variable but with correct formatting and then drop the old variable, but that will be hours of coding with potential for huge amounts of human error.

I'm mostly curious if there is an easy way to grab the formatting from the first dataset and then just quickly apply the formatting to the new dataset (the variables have the same names), or just some phenomenal coding-magic to get it done quickly, efficiently, and with as little human error as possible.

Thanks!

Super Contributor
Posts: 1,636

Re: Mass formatting

Hi,

the code below converts all the character variables in a dataset to numeric variables.

data have;

  input (a b c d e)($) ;

  cards;

11 22 33 44 55

  ;

proc sql noprint;

select catx(' ','input(',name,',best12.) as',name) into : list separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE' and type ='char'; /*note: libname and  memname must be capital letters */

create table want as

  select &list from have;

quit;

proc contents data=want;run;

Linlin

Contributor
Posts: 35

Re: Mass formatting

Roughly 1/10 of the variables need to remain formatted as character.  While a mass conversion to numeric gets me closer to what I need in terms of what percent of the variables are formatted correctly, strictly speaking it is not a perfect solution.

Super Contributor
Posts: 1,636

Re: Mass formatting

you can exclude the variables that should not be converted:

data have;

  input id sex (a b c d e)($) ;

  cards;

1 2 11 22 33 44 55

  ;

proc sql noprint;

select catx(' ','input(',name,',best12.) as',name) into : list1 separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE' and type ='char' and name not in ('a','b');

select name into : list2 separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE' and type ='num' ;

create table want as

  select &list1,&list2,a,b from have;

quit;

proc contents data=want;run;

Contributor
Posts: 35

Re: Mass formatting

There is some weird stuff happening with this code.  Some of my variables that need to be converted to numeric are stored as text in scientific notation (e.g., they might be stored in text as something similar to 9.785643E13), but when I run LinLin's code on them, rather than the result coming out as a numeric equivalent to the scientific notation, they are often truncated down to only the decimal part (i.e., the resulting number is only 9.785643, not 97856430000000).

Thoughts?

Super Contributor
Posts: 1,636

Re: Mass formatting

try to put your variables in scientific notation in &list3, your date variables in &list4 if you have any.

data have;
length f1 f2 $11 date1 date2 $10;
  input id sex (a b c d e)($) f1 f2 date1 date2;
  cards;
1 2 11 22 33 44 55 9.785643E13 8.785643E10 01/12/2012 2/12/2012
;
proc sql noprint;

select catx(' ','input(',name,',best32.) as',name) into : list1 separated by ','
  from dictionary.columns
   where libname='WORK' and memname='HAVE' and type ='char' and name not in ('a','b','f1','f2','date1','date2');

select name into : list2 separated by ','  from dictionary.columns 
   where libname='WORK' and memname='HAVE' and type ='num' ;

select catx(' ','input(',name,',e13.) as',name) into : list3 separated by ','  from dictionary.columns
   where libname='WORK' and memname='HAVE' and name in ('f1','f2');

select catx(' ','input(',name,',mmddyy10.) as',name) into : list4 separated by ','  from dictionary.columns
   where libname='WORK' and memname='HAVE' and name in ('date1','date2');

create table want as
  select &list1,&list2,&list3,&list4,a,b from have;

quit;
proc contents data=want;run;proc print;run;

Respected Advisor
Posts: 3,799

Re: Mass formatting

For this technique you will need to associate INFORMATS to the numeric variables that need them, especially SAS DATE variables.  In my example I just created the BASE data with PROC IMPORT from a CSV and it created the informats.  You could associate the informats permanently or with in INFORMAT statement I have mark the place below.  The entire self contained example is attached.

*** BASE=HEART the existing good data;

*** DATA=HEARTC the data to append that is all CHARACTER and needs to be converted;

*** Target variables: nums in base;

proc transpose data=heart(obs=0) out=heartNumVars;

   var _numeric_;

   run;

Proc sql noprint;

   select _name_ into :numvars separated by ' ' from heartNumVars;

   quit;

   run;

*** Flip so we can read them;

proc transpose data=heartC out=tall;

   by _obs_;

   var &numvars;

   run;

data tall(drop=&numvars);

   set tall;

   if vtypeX(_NAME_) eq 'N' then num1 = inputN(col1,vinformatX(_NAME_));

   return;

   set heart(keep=&numvars);  *** this gives INFORMAT;

   *INFORMAT var infmt. ...;  *** could even be done here;

   * to make a few adjustments;

   run;

proc print data=tall width=min;

   run;

*** Back to wide as numeric;

proc transpose data=tall out=Nums;

   by _obs_;

   var num1;

   run;

*** Merge with untransposed character variables;

data heartC2Num;

   merge heartC(drop=&numvars) nums;

   by _obs_;

   run;

proc contents varnum;

proc print width=min;

   run;

*** Append to base;

options varlenchk=nowarn;

data heart;

   if 0 then modify heart;

   set heartC2num;

   output;

   run;

options varlenchk=warn;

proc print width=min data=heart;

   run;

Attachment
Super User
Posts: 10,046

Re: Mass formatting

Since in SAS there is only two different type variables , numeric and character .

What we need to do is changing Character into Numeric in NEW dataset.

The format will be keeped completely at OLD dataset.

data old;
 set sashelp.class;
 format age yymmdd10. height dollar12.2;
run;

data new;
length name $ 8 ;
age='32'; height='123.32';weight='34.32';name='Patrick';sex='M';output;
age='32'; height='123.32';weight='34.32';name='Tom';sex='M';output;
age='32'; height='123.32';weight='34.32';name='Arthur';sex='M';output;
run;
proc sql noprint;
select name into : name separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='OLD' and type ='num' ;

select cats('_',name,'=input(',name,',best32.);') into : change separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='OLD' and type ='num' ;

select cats('_',name,'=',name) into : rename separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='OLD' and type ='num' ;
quit;
data _new(drop= &name );
 set new ;
 &change 
run;
proc datasets library=work nolist;
 modify _new;
 rename &rename ;
quit;
proc append base=old data=_new force;run;

Ksharp

Ask a Question
Discussion stats
  • 7 replies
  • 428 views
  • 0 likes
  • 4 in conversation