BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Denis
Calcite | Level 5

hello.

working with a quite "dirty" data, I've met an interesting problem.

lets say, i have two tables, described by:

data tb_x;

informat dt_1 date9.;

format dt_1 date9.;

input dt_1;

datalines;

01jan2012

;

data tb_y;

input dt_1 $11.;

datalines;

01/jan/2012

;

i need to union them in one iteration as:

data cumulative;

set tb_:;

run;

so,

format new_dt date9.;

new_dt = input(dt_1,ANYDTDTE18.);

should be added somehow to the dataset to avoid: "ERROR: Variable dt_1 has been defined as both character and numeric."

please, help!!!

Thank you in advance!

Denis

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Sorry late for the party, just got back to my station. I see you already have a working solution, here is just another approach:

/*First, to get the names of all of the data sets that needs to be stacked*/

proc sql noprint;

select distinct cats(memname) into :dname separated by ' ' FROM DICTIONARY.COLUMNS WHERE libname='WORK' and substr(memname,1,3)='TB_'; /*someone please tell me why =: is not working here? is it a SQL thing?*/

quit;

/*Second: Use idea from Patrick's first post, to alter the variable type for all of your data set of interest*/

%macro typ_change;

%do i=1 %to %sysfunc(countw(&dname));

%let temp=%scan(&dname,&i);

data &temp;

set &temp (rename=(dt_1=_dt));

if     vtype(_dt)='C' then dt_1=input(_dt,ANYDTDTE18.);

else dt_1=_dt;

drop _dt;

run;

%end;

%mend;

%typ_change

/*Last: Use your original code without sweat*/

data want;

set tb_:;

format dt_1 date9.;

run;

Haikuo

View solution in original post

15 REPLIES 15
Patrick
Opal | Level 21

Just use an appropriate informat in your 2nd data step so that you end up with a numeric variable containing a SAS date value in both data sets.

data tb_y;

input dt_1 date11.;

format dt_1 date9.;

datalines;

01/jan/2012

;

or if the date string in your raw data are mixed within one file then you could use code like

options datestyle=dmy;

data demo;

input dt_1 :anydtdte.;

format dt_1 date9.;

datalines;

01/jan/2012

01jan2012

01012012

01jan2012:06:12:15

;

run;

.... or following Hai.Kuo's lead that you've got the data already as SAS tables code like below could do:

options datestyle=dmy;

proc sql;

  create table want as

  select *

  from tb_x

  union corr all

  select input(dt_1,anydtdte21.) as dt_1

  from tb_y

  ;

quit;

Denis
Calcite | Level 5

thx, Patrick, but it was just a descritpion.

the problem is that i have a lot tables with prefix "tb_" and fields with one name in different formats!

Denis

Haikuo
Onyx | Level 15

This should do it:

data tb_x;

informat dt_1 date9.;

format dt_1 date9.;

input dt_1;

datalines;

01jan2012

;

data tb_y;

input dt_1 $11.;

datalines;

01/jan/2012

;

data want;

set tb_x tb_y (rename=dt_1=_dt);

new_dt=input(_dt,ANYDTDTE18.);

dt_1=coalesce(dt_1,new_dt);

keep dt_1;

run;

Haikuo

Denis
Calcite | Level 5

i have dozens of tables, with prefix "tb_" in its names and dt_1 field defined as character in one part of tables and as numeric in another, so ":" must be used for union.

your script works perfectly for exactly this example, knowing:

1. table names

2. where tb_1 as a date and where as a character

so, unfortunately, it does not solve of my problem

Haikuo
Onyx | Level 15

IMHO, I doubt if there is  a true ONE data step solution for you. However, you can always apply the similar logic into Macro to save some coding labor if the naming pattern of your data sets can be captured.

Haikuo

Denis
Calcite | Level 5

"if the naming pattern of your data sets can be captured"- looks like it can't, prefix "tb_" is only common part.

typing each tables name to correct field's type to correct one is a nightmare!

Denis

Patrick
Opal | Level 21

Here some code doing what Hai.Kuo suggested. You probabely will need to amend it a bit to make it fit your real data.

What the program does is creating code based on data dictionary information and then %include the created code to execute it.

If you want to see the generated code without executing it then just uncomment the line "file print;" and comment the lines "file code;" and "%include code;"

filename code temp;
data test;
/*  file print;*/
  file code;
  set sashelp.vcolumn (where=(libname='WORK' and substrn(memname,1,3)='TB_' and upcase(name) = 'DT_1')) end=last;
  if _n_=1 then
  do;
    put 'data want;';
    put @2 'set ';
  end;
  if xtype='num' then
  do;
    LineOfCode=cats(libname,'.',memname);
    put @4 LineOfCode;
  end;
  else if xtype='char' then
  do;
    LineOfCode=cats(libname,'.',memname,' (rename=dt_1=_dt)');
    put @4 LineOfCode;
  end;

  if last then
  do;
    put @2 ';';
    put @2 'new_dt=input(_dt,ANYDTDTE18.);';
    put @2 'dt_1=coalesce(dt_1,new_dt);';
    put @2 'drop _: new_dt;';
    put 'run;';
  end;
run;

%include code;

Astounding
PROC Star

Apologies, Patrick.  If I had known you were working on this, I wouldn't have interfered!

Patrick
Opal | Level 21

Absolutely no need for apologies. It's always very interesting to see how others approach a problem - and I'm always learning so much from you guys.

Astounding
PROC Star

To automate, begin by finding all the data set names.  For example:

proc sql noprint;

   create table my_datasets as select memname, type

   from dictionary.columns

   where libname='WORK' and substr(memname,1,3) = 'TB_' and upcase(name)='DT_1';

quit;

That will give you a list of your data set names that begin with TB_, and TYPE will be either "char" or "num" depending on how dt_1 is defined.

From that point, there are a few ways to use the list to generate the code you need.  Here is one:


data _null_;

   set my_datasets end=done;

   if _n_=1 then call execute ('data cumulative;')

   end_flag = 'done' || strip(put(_n_,3.));

   call execute('do until (' || end_flag || '); set ' || memname);

   if type='num' then call execute('end = ' || end_flag || ';');

   else call execute('(rename=(dt_1=tempvar)) end = ' || end_flag || '; dt_1 = input(tempvar,date9.); drop tempvar;');

   call execute('output; end;');

   if done then call execute('stop; run;');

run;

It's untested code, so I may have overlooked a piece here or there, but it uses a viable approach.  Each observation from MY_DATASETS lets you generate a loop that looks like either:

do until (done1);

   set tb_x;

   output;

end;

OR

do until (done2);

   set tb_y (rename=(dt_1=tempvar)) end=done2;

   dt_1 = input(tempvar,date9.);

   output;

   drop tempvar;

end;

Good luck.

Denis
Calcite | Level 5

Patrick, Astounding!

Guys, thanks both of you!

Frankly speaking I wasn’t expecting so fast reaction on my post.

Patrick, your approach is not too clear for me, but it works fine and I’ll definitely analyze its logic and syntax.

Astounding, your approach does not work correctly for fields with character type (year=1920 in output for any value), but looks interesting anyway.

I really do appreciate this.

Astounding
PROC Star

Yup, that's an oversight of the untested code.  I tried to generate this:

dt_1 = input(tempvar, date9.);

Using date9 only reads  '01/jan/20'  from  '01/jan/2012' so that's where the 1920 is coming from.  The switch to a wider format, such as ANYDTDTE18. should take care of that. 

Patrick
Opal | Level 21

Hi Denis

Astounding and I are basically doing the same thing. We're just using different programming techniques. What I'm doing is queriying the dictionary tables to select all relevant tables " sashelp.vcolumn (where=(libname='WORK' and substrn(memname,1,3)='TB_' and upcase(name) = 'DT_1')) end=last;"

I then check if the "DT_1" column is character or numeric (" xtype='num') and based on that I write some text to a temporary file (the put statements). So I basically write text to a file which happens to be valid SAS code (put statements writing text to "filename code temp;").

Once this is done I use "%include code;" to include the code I've created in the "data _null_;" step. This executes the code.

This is a technique to dynamically create SAS code and then execute it.

Astounding proposes principally the same approach. He just uses "call execute" for this. I personally believe that using the "filename code temp;" approach is more transparent as it allows you to write the code to the output destination ("file print") during development which makes it very easy to check if you're getting the code generation right. Not sure how one debugs stuff when using the "call execute" approach.

Hope that made it a bit clearer for you.

P.S: If you add before the include statement "options source2;" you'll see in the log what code get's included for execution by the %include statement.

Haikuo
Onyx | Level 15

Sorry late for the party, just got back to my station. I see you already have a working solution, here is just another approach:

/*First, to get the names of all of the data sets that needs to be stacked*/

proc sql noprint;

select distinct cats(memname) into :dname separated by ' ' FROM DICTIONARY.COLUMNS WHERE libname='WORK' and substr(memname,1,3)='TB_'; /*someone please tell me why =: is not working here? is it a SQL thing?*/

quit;

/*Second: Use idea from Patrick's first post, to alter the variable type for all of your data set of interest*/

%macro typ_change;

%do i=1 %to %sysfunc(countw(&dname));

%let temp=%scan(&dname,&i);

data &temp;

set &temp (rename=(dt_1=_dt));

if     vtype(_dt)='C' then dt_1=input(_dt,ANYDTDTE18.);

else dt_1=_dt;

drop _dt;

run;

%end;

%mend;

%typ_change

/*Last: Use your original code without sweat*/

data want;

set tb_:;

format dt_1 date9.;

run;

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1885 views
  • 6 likes
  • 4 in conversation