Hi, @SAS-PD , according to @Tom 's code and comments, I modified my version of code (in the last thread, the data type in the raw data has been changed to $ and an array was added to convert it to numeric using the same &varlist macro variable) to solve your question of convert character data type to numeric for date variables. Could you please let me know if it solve your question, thanks!
Hello dxiao2017 ,
Thank you very much for looking into this.
While creating vdfmtchg, I am getting below error now ...
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
64:7
NOTE: Invalid argument to function INPUT at line 64 column 18.
-------------------------------------
could it be due to some dates has uu or u or (alphabets) in char dates? If so, how do I solve this?
Thanks for following up!
Hi, @SAS-PD , thanks a lot for reply me!
I think your problem is about converting the date variables to real numeric type, i.e., the data type in the proc content table is numeric or date type, not character.
In my code, although the date values had been converted to numeric that can be used in calculation (you can see they are numeric values as the missing values were displayed as dot), the data type in the proc contents table was still character, also, when look at the attributions of those date variables, they did not have any informat or format, and the data type was character. And this is the problem. As I mentioned in previous thread, I was going to convert those columns in this data table to real numeric ones and tried but did not figure out. One thing I thought can solve the problem but did not try is using a data step to reset the data table and convert those columns to numeric type, like this (I did not try it and the code are not correct😞)
data numtype;
infile datalines;
informat &varlist infromats.;
format &varlist formats.;
set chartype;
datalines;
One thing to do is perhaps add a $ after the array name, like this (I did not run it the and maybe the code cannot run through😞)
data vdfmtchg;
set vdsort;
array dfmtchg[*] $ &varlist;
do i=1 to dim(dfmtchg);
dfmtchg[i]=input(dfmtchg[i],ddmmyy10.);
end;
drop i;
run;
I think this does not solve the problem because the dfmtchg[i] I want to create is numeric, and this $ sign will make SAS confused. Another thing I can think is add another numeric array to create a set of new column, which convert the column in &varlist to numeric, but this also I did not try it.
I do not know what are the "uu or u or (alphabets) in char dates" you talk about, but I guess it is still the problem of data type. One thing to do is inspect and improve your raw datasets, you only have 50+ datasets and 120+ variables, you can inspect what's wrong/not consistent with the data informats, formats, data type, and column values first (using proc content, proc freq or whatever basic steps), and then decide what is the next step, the first thing to do may be make all raw datasets consistent and clean. To conclude, I think the problem is the raw data, if all columns and values in the raw data are conformed to a consistent standard or rule, there would be no such complex problem in the consecutive data editing and analyzing steps.
This step does not make much sense.
data numtype;
infile datalines;
informat &varlist infromats.;
format &varlist formats.;
set chartype;
Having variables listed in the macro variable VARLIST first appear in a INFORMAT statement will have the SIDE EFFECT of forcing SAS to immediately make a GUESS about how to define the variables. Since the informat you used does not start with a $ SAS will assume it is a numeric informat even if it cannot find an informat with the name INFROMATS..
If the variables are actually character variables in the dataset CHARTYPE then you will get an ERROR as a single variable cannot be both numeric and character.
Note that the INFORMAT and FORMAT statements are used to attach informats and/or formats to the variables. If you want to DEFINE the variables use a LENGTH or ATTRIB statement instead.
And then you end the data step with in-line data by adding the DATALINES statement. But the step have no INPUT statement to read any of the in in-line data. Instead end the data step with a RUN statement.
Hi @SAS-PD , have you figured out and solved your question? I write a macro to convert all tables with all columns of character date values to numeric (I assume that in your raw datasets, all date variables are character type, and I write the macro base on this condition). But some major problems need to be solved before use this macro: 1) because I used a nested do loop, it produced 2*3=6 tables (in fact I do not know why it is 6, I thought it should be 2*4=8), i.e., for each of my raw data table, it produced two more duplicates. Also, each time I run this macro again, it produced new variables and tables according to the statement and make more duplicates of the datasets and variables. Without solve this problem the macro cannot be used because it causes mess by the duplications, 2) I am not sure if I should use any %local statement inside the macro to prevent it overwrite any macro variable that already has that macro name, @Tom , could you please take a look on my code and offer some suggestions and modification, thanks very much! And @SAS-PD , could you please let me know if this solve your questions, thanks a lot!
data vdraw1;
input subjid $ var5 $ var6 $ v11dat $
v12dat $ v13dat $;
datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;
run;
proc print data=vdraw1;run;
data vdraw2;
input subjid $ var3 $ v21dat $;
datalines;
subj1 xx 02/02/02
subj2 z .
;
run;
proc print data=vdraw2;run;
proc sql;
select distinct memname
into :table1-
from dictionary.columns
where libname='WORK' and
memname like 'VDRAW%' and
name like '%dat';
quit;
%let tableobs=&sqlobs;
%put &tableobs;
proc sql;
select distinct name
into :var1-
from dictionary.columns
where libname='WORK' and
memname like 'VDRAW%' and
name like '%dat';
quit;
%let varobs=&sqlobs;
%put &varobs;
%macro char2num;
%do i=1 %to &tableobs;
data &&table&i.._num;
set &&table&i;
%do j=1 %to &varobs;
&&var&j.._num=input(&&var&j,mmddyy10.);
%end;
run;
proc print data=&&table&i.._num;
proc contents data=&&table&i.._num;
%end;
%mend char2num;
%char2num;
Another thing I would like to mention is that, @SAS-PD , in your raw datasets, the informat and format of the date variables are not consistent, I marked it out (see picture), the informat is ddmmyy., the format is mmddyy., i.e., the date and month are reversed and confusing without knowing which is month and which is date.
The reason you keep find more and more things to convert is because of the way you are selecting the variables and tables. You look for tables with names that start with VDRAW and the new tables you create match that naming pattern also.
If you are going to make a macro I generally find it works better if you make the macro work basically like PROC so you can treat your macro call like just another step in your program.
For this one you might make a macro that operates on one dataset at a time. So something like this:
%macro char2num(dsname,informat=mmddyy10.,format=date9.);
%local libname memname varlist nvars i var;
%* Find the libref and memname of the input dataset ;
%let libname=%scan(%upcase(work.&dsname),-2,.);
%let memname=%scan(%upcase(&dsname),-1,.);
%* Make a list of the %DAT variables that are character;
proc sql noprint;
select name into :varlist separated by '|'
from dictionary.columns
where libname="&libname"
and memname="&memname"
and upcase(name) like '%DAT'
and type='char'
;
%let nvars=&sqlobs;
quit;
* Make a new dataset with character date variables converted ;
* Rename the original character variables ;
data &dsname._num;
set &dsname;
%do i=1 %to &nvars;
%let var=%scan(&varlist,&i,|);
__&i = input(&var,&informat);
format __&i &format;
rename &var=&var._char __&i=&var;
%end;
run;
%mend char2num;
Let's test it with one of your sample datasets.
46 options mprint; 47 %char2num(vdraw1); MPRINT(CHAR2NUM): proc sql noprint; MPRINT(CHAR2NUM): select name into :varlist separated by '|' from dictionary.columns where libname="WORK" and memname="VDRAW1" and upcase(name) like '%DAT' and type='char' ; MPRINT(CHAR2NUM): quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds cpu time 0.00 seconds MPRINT(CHAR2NUM): * Make a new dataset with character date variables converted ; MPRINT(CHAR2NUM): * Rename the original character variables ; MPRINT(CHAR2NUM): data vdraw1_num; MPRINT(CHAR2NUM): set vdraw1; MPRINT(CHAR2NUM): __1 = input(v11dat,mmddyy10.); MPRINT(CHAR2NUM): format __1 date9.; MPRINT(CHAR2NUM): rename v11dat=v11dat_char __1=v11dat; MPRINT(CHAR2NUM): __2 = input(v12dat,mmddyy10.); MPRINT(CHAR2NUM): format __2 date9.; MPRINT(CHAR2NUM): rename v12dat=v12dat_char __2=v12dat; MPRINT(CHAR2NUM): __3 = input(v13dat,mmddyy10.); MPRINT(CHAR2NUM): format __3 date9.; MPRINT(CHAR2NUM): rename v13dat=v13dat_char __3=v13dat; MPRINT(CHAR2NUM): run; NOTE: There were 2 observations read from the data set WORK.VDRAW1. NOTE: The data set WORK.VDRAW1_NUM has 2 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Results:
Note: Since the SQL query is only looking for character variables if you run it again on this new VDAT1_NUM dataset it will make VDAT1_NUM_NUM as a copy without any changes since there are no more character variables whose name ends in DAT.
Note: If you want the converted datasets written to WORK instead of the original library change the code for the DATA statement to use the MEMNAME macro variable instead of the DSNAME macro variable.
Now to apply it to all of the datasets you might just use some simple open code instead of macro. Use a similar SQL query to get the names of the tables to be converted and use them to generate calls to the macro into a macro variable. Then just expand the macro variable to run the macro calls.
proc sql noprint;
%let mcalls=;
select distinct cats('%char2num(',libname,'.',memname,')')
into :mcalls separated by ';'
from dictionary.columns
where libname="WORK"
and memname like 'VDRAW%'
and upcase(name) like '%DAT'
and type='char'
;
quit;
&mcalls;
Thanks a lot, Tom! I will look into the details in your thread and figure it out later. The basic idea here is modify the macro according to your comments, and then write another macro (which modifies the datasets) to run this macro (which modifies the variables), i.e., use nested macros instead of nested do loops (I guess I am not going to use the macro call &mcalls now because I have not learnt it yet, it's a bit advanced for me at the moment) . I will figure it out later.
PS: hi @SAS-PD , if you figure it out could you please let me know, thanks!
Using either MDY or DMY order to display dates will confuse 50% of your audience.
To avoid this use DATE or YYMMDD format.
Also avoid using only two digits to represent the year. The Y2K problem has not actually gone away.
Hi @SAS-PD , I did not expect my answer was marked as the solution, it's a happy surprise to me, it's the first solution I provided in this community! Thank you! BTW, have you figure out and correct the problems in my code? I myself do not know how to make it a perfect workable macro 😃.
Hi @Tom , base on your code, suggestion, and comments, I modified the code in my previous thread to deal with the character date value problem (an array was added to convert character date values to numeric date values 😐PS: I should be able to post this thread much more earlier when those date values had been converted to numeric, but I spent too much time on an unnecessary step, which was to display the converted numeric date values in date9. format and to change the data type, informat and format to numeric or date, as those numeric date columns was still character type in the proc contents table😐 and I still have not figure out it😐, data type, format, informat are always pain). Hope this solve @SAS-PD 's question. Code and results are as follows:
data vdraw1;
input subjid $ var5 $ var6 $ v11dat $
v12dat $ v13dat $;
datalines;
subj1 xxx sy 01/01/21 01/01/22 01/01/23
subj2 xttx ry 02/01/21 02/01/22 02/01/23
;
run;
proc print data=vdraw1;run;
data vdraw2;
input subjid $ var3 $ v21dat $;
datalines;
subj1 xx 02/02/02
subj2 z .
;
run;
proc print data=vdraw2;run;
proc sql;
select memname,
name
into :dslist separated by ' ',
:varlist separated by ' '
from dictionary.columns
where libname='WORK' and
memname like 'VDRAW%' and
name like '%dat';
quit;
data vdate;
set &dslist;
keep subjid &varlist;
run;
proc print data=vdate;run;
proc sort data=vdate out=vdsort nodup;
by subjid;
run;
proc print data=vdsort;run;
data vdfmtchg;
set vdsort;
array dfmtchg[*] &varlist;
do i=1 to dim(dfmtchg);
dfmtchg[i]=input(dfmtchg[i],ddmmyy10.);
end;
drop i;
run;
proc print data=vdfmtchg;run;
data vdfinal;
set vdfmtchg;
by subjid;
if first.subjid then do;
min=.;
max=.;
end;
retain min max;
min=min(of min &varlist);
max=max(of max &varlist);
if last.subjid;
keep subjid min max;
format min max date9.;
run;
proc print data=vdfinal;run;
This step does not make any sense if the variables are CHARACTER.
data vdfmtchg;
set vdsort;
array dfmtchg[*] &varlist;
do i=1 to dim(dfmtchg);
dfmtchg[i]=input(dfmtchg[i],ddmmyy10.);
end;
drop i;
run;
Because the INPUT() funciton is going to return a NUMBER which you then assign back to the CHARACTER variable. So at that point SAS will user the BEST12. format to convert the number of days the INPUT() function calculated into the digit strings you saw in your printout.
You cannot tell it since ODS output windows "eat" the leading spaces but the values your character variables look like:
22281 .
Since the BEST12. format will right align the digit strings in the 12 characters it writes.
The only reason it worked is because you later asked the MIN() function to find the minimum value of these character variables. But the MIN() function only works on NUMBERS. So SAS used the 32. informat to convert those 12 character strings back into numbers. Which you then attached the DATE9. format to so that they printed in a way that humans could understand them as representing dates.
Thanks a lot for your feedback and the detailed explanation Tom! I will accordingly try figure it out (base on the code in your previous threads) later!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.