DATA Step, Macro, Functions and more

union of tables with fields in different formats in one iteration

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

union of tables with fields in different formats in one iteration

hello.

working with a quite "dirty" data, I've met an interesting problem.

lets say, i have two tables, described by:

data tb_x;

informat dt_1 date9.;

format dt_1 date9.;

input dt_1;

datalines;

01jan2012

;

data tb_y;

input dt_1 $11.;

datalines;

01/jan/2012

;

i need to union them in one iteration as:

data cumulative;

set tb_:;

run;

so,

format new_dt date9.;

new_dt = input(dt_1,ANYDTDTE18.);

should be added somehow to the dataset to avoid: "ERROR: Variable dt_1 has been defined as both character and numeric."

please, help!!!

Thank you in advance!

Denis


Accepted Solutions
Solution
‎08-15-2012 01:57 PM
Respected Advisor
Posts: 3,124

Re: union of tables with fields in different formats in one iteration

Sorry late for the party, just got back to my station. I see you already have a working solution, here is just another approach:

/*First, to get the names of all of the data sets that needs to be stacked*/

proc sql noprint;

select distinct cats(memname) into :dname separated by ' ' FROM DICTIONARY.COLUMNS WHERE libname='WORK' and substr(memname,1,3)='TB_'; /*someone please tell me why =: is not working here? is it a SQL thing?*/

quit;

/*Second: Use idea from Patrick's first post, to alter the variable type for all of your data set of interest*/

%macro typ_change;

%do i=1 %to %sysfunc(countw(&dname));

%let temp=%scan(&dname,&i);

data &temp;

set &temp (rename=(dt_1=_dt));

if     vtype(_dt)='C' then dt_1=input(_dt,ANYDTDTE18.);

else dt_1=_dt;

drop _dt;

run;

%end;

%mend;

%typ_change

/*Last: Use your original code without sweat*/

data want;

set tb_:;

format dt_1 date9.;

run;

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 3,892

Re: union of tables with fields in different formats in one iteration

Just use an appropriate informat in your 2nd data step so that you end up with a numeric variable containing a SAS date value in both data sets.

data tb_y;

input dt_1 date11.;

format dt_1 date9.;

datalines;

01/jan/2012

;

or if the date string in your raw data are mixed within one file then you could use code like

options datestyle=dmy;

data demo;

input dt_1 :anydtdte.;

format dt_1 date9.;

datalines;

01/jan/2012

01jan2012

01012012

01jan2012:06:12:15

;

run;

.... or following Hai.Kuo's lead that you've got the data already as SAS tables code like below could do:

options datestyle=dmy;

proc sql;

  create table want as

  select *

  from tb_x

  union corr all

  select input(dt_1,anydtdte21.) as dt_1

  from tb_y

  ;

quit;

Occasional Contributor
Posts: 17

Re: union of tables with fields in different formats in one iteration

thx, Patrick, but it was just a descritpion.

the problem is that i have a lot tables with prefix "tb_" and fields with one name in different formats!

Denis

Respected Advisor
Posts: 3,124

Re: union of tables with fields in different formats in one iteration

This should do it:

data tb_x;

informat dt_1 date9.;

format dt_1 date9.;

input dt_1;

datalines;

01jan2012

;

data tb_y;

input dt_1 $11.;

datalines;

01/jan/2012

;

data want;

set tb_x tb_y (rename=dt_1=_dt);

new_dt=input(_dt,ANYDTDTE18.);

dt_1=coalesce(dt_1,new_dt);

keep dt_1;

run;

Haikuo

Occasional Contributor
Posts: 17

Re: union of tables with fields in different formats in one iteration

i have dozens of tables, with prefix "tb_" in its names and dt_1 field defined as character in one part of tables and as numeric in another, so ":" must be used for union.

your script works perfectly for exactly this example, knowing:

1. table names

2. where tb_1 as a date and where as a character

so, unfortunately, it does not solve of my problem

Respected Advisor
Posts: 3,124

Re: union of tables with fields in different formats in one iteration

IMHO, I doubt if there is  a true ONE data step solution for you. However, you can always apply the similar logic into Macro to save some coding labor if the naming pattern of your data sets can be captured.

Haikuo

Occasional Contributor
Posts: 17

Re: union of tables with fields in different formats in one iteration

"if the naming pattern of your data sets can be captured"- looks like it can't, prefix "tb_" is only common part.

typing each tables name to correct field's type to correct one is a nightmare!

Denis

Respected Advisor
Posts: 3,892

Re: union of tables with fields in different formats in one iteration

Here some code doing what Hai.Kuo suggested. You probabely will need to amend it a bit to make it fit your real data.

What the program does is creating code based on data dictionary information and then %include the created code to execute it.

If you want to see the generated code without executing it then just uncomment the line "file print;" and comment the lines "file code;" and "%include code;"

filename code temp;
data test;
/*  file print;*/
  file code;
  set sashelp.vcolumn (where=(libname='WORK' and substrn(memname,1,3)='TB_' and upcase(name) = 'DT_1')) end=last;
  if _n_=1 then
  do;
    put 'data want;';
    put @2 'set ';
  end;
  if xtype='num' then
  do;
    LineOfCode=cats(libname,'.',memname);
    put @4 LineOfCode;
  end;
  else if xtype='char' then
  do;
    LineOfCode=cats(libname,'.',memname,' (rename=dt_1=_dt)');
    put @4 LineOfCode;
  end;

  if last then
  do;
    put @2 ';';
    put @2 'new_dt=input(_dt,ANYDTDTE18.);';
    put @2 'dt_1=coalesce(dt_1,new_dt);';
    put @2 'drop _: new_dt;';
    put 'run;';
  end;
run;

%include code;

Super User
Posts: 5,083

Re: union of tables with fields in different formats in one iteration

Apologies, Patrick.  If I had known you were working on this, I wouldn't have interfered!

Respected Advisor
Posts: 3,892

Re: union of tables with fields in different formats in one iteration

Absolutely no need for apologies. It's always very interesting to see how others approach a problem - and I'm always learning so much from you guys.

Super User
Posts: 5,083

Re: union of tables with fields in different formats in one iteration

To automate, begin by finding all the data set names.  For example:

proc sql noprint;

   create table my_datasets as select memname, type

   from dictionary.columns

   where libname='WORK' and substr(memname,1,3) = 'TB_' and upcase(name)='DT_1';

quit;

That will give you a list of your data set names that begin with TB_, and TYPE will be either "char" or "num" depending on how dt_1 is defined.

From that point, there are a few ways to use the list to generate the code you need.  Here is one:


data _null_;

   set my_datasets end=done;

   if _n_=1 then call execute ('data cumulative;')

   end_flag = 'done' || strip(put(_n_,3.));

   call execute('do until (' || end_flag || '); set ' || memname);

   if type='num' then call execute('end = ' || end_flag || ';');

   else call execute('(rename=(dt_1=tempvar)) end = ' || end_flag || '; dt_1 = input(tempvar,date9.); drop tempvar;');

   call execute('output; end;');

   if done then call execute('stop; run;');

run;

It's untested code, so I may have overlooked a piece here or there, but it uses a viable approach.  Each observation from MY_DATASETS lets you generate a loop that looks like either:

do until (done1);

   set tb_x;

   output;

end;

OR

do until (done2);

   set tb_y (rename=(dt_1=tempvar)) end=done2;

   dt_1 = input(tempvar,date9.);

   output;

   drop tempvar;

end;

Good luck.

Occasional Contributor
Posts: 17

Re: union of tables with fields in different formats in one iteration

Patrick, Astounding!

Guys, thanks both of you!

Frankly speaking I wasn’t expecting so fast reaction on my post.

Patrick, your approach is not too clear for me, but it works fine and I’ll definitely analyze its logic and syntax.

Astounding, your approach does not work correctly for fields with character type (year=1920 in output for any value), but looks interesting anyway.

I really do appreciate this.

Super User
Posts: 5,083

Re: union of tables with fields in different formats in one iteration

Yup, that's an oversight of the untested code.  I tried to generate this:

dt_1 = input(tempvar, date9.);

Using date9 only reads  '01/jan/20'  from  '01/jan/2012' so that's where the 1920 is coming from.  The switch to a wider format, such as ANYDTDTE18. should take care of that. 

Respected Advisor
Posts: 3,892

Re: union of tables with fields in different formats in one iteration

Hi Denis

Astounding and I are basically doing the same thing. We're just using different programming techniques. What I'm doing is queriying the dictionary tables to select all relevant tables " sashelp.vcolumn (where=(libname='WORK' and substrn(memname,1,3)='TB_' and upcase(name) = 'DT_1')) end=last;"

I then check if the "DT_1" column is character or numeric (" xtype='num') and based on that I write some text to a temporary file (the put statements). So I basically write text to a file which happens to be valid SAS code (put statements writing text to "filename code temp;").

Once this is done I use "%include code;" to include the code I've created in the "data _null_;" step. This executes the code.

This is a technique to dynamically create SAS code and then execute it.

Astounding proposes principally the same approach. He just uses "call execute" for this. I personally believe that using the "filename code temp;" approach is more transparent as it allows you to write the code to the output destination ("file print") during development which makes it very easy to check if you're getting the code generation right. Not sure how one debugs stuff when using the "call execute" approach.

Hope that made it a bit clearer for you.

P.S: If you add before the include statement "options source2;" you'll see in the log what code get's included for execution by the %include statement.

Solution
‎08-15-2012 01:57 PM
Respected Advisor
Posts: 3,124

Re: union of tables with fields in different formats in one iteration

Sorry late for the party, just got back to my station. I see you already have a working solution, here is just another approach:

/*First, to get the names of all of the data sets that needs to be stacked*/

proc sql noprint;

select distinct cats(memname) into :dname separated by ' ' FROM DICTIONARY.COLUMNS WHERE libname='WORK' and substr(memname,1,3)='TB_'; /*someone please tell me why =: is not working here? is it a SQL thing?*/

quit;

/*Second: Use idea from Patrick's first post, to alter the variable type for all of your data set of interest*/

%macro typ_change;

%do i=1 %to %sysfunc(countw(&dname));

%let temp=%scan(&dname,&i);

data &temp;

set &temp (rename=(dt_1=_dt));

if     vtype(_dt)='C' then dt_1=input(_dt,ANYDTDTE18.);

else dt_1=_dt;

drop _dt;

run;

%end;

%mend;

%typ_change

/*Last: Use your original code without sweat*/

data want;

set tb_:;

format dt_1 date9.;

run;

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 509 views
  • 6 likes
  • 4 in conversation