Hi Everyone,
I run the proc import below to import my file (attached).
However, some column of number turn to character.
Can you help me to change all variables starting with _ into numeric?
Thank you for your help.
HHCFX
Each month, I need to import similar file and I don't know how many column there are. Thus, using data step might not be good idea for me.
%put &=sysvlong ;
options validvarname=v7 ;
proc import datafile="C:\Users\sample.CSV"
out=client dbms=CSV replace;
getnames=yes;
guessingrows=100;
run;
That is not a good format for a guessing procedure like PROC IMPORT. Having date in metadata (variable names) is not a good idea in general. Since your file is just a text file you can instead just read it with a data step.
data want ;
length id $40 date 8 value 8 ;
infile "&path/sample.csv" dsd truncover ;
if _n_=1 then do ;
array dates (100) _temporary_;
input id @ ;
do ncol=1 by 1 until(date=.);
input date :??mmddyy. @ ;
dates(ncol)=date ;
end;
ncol=ncol-1;
retain ncol;
input ;
end;
input id @;
do col=1 to ncol;
input value ?? @;
date=dates(col);
output;
end;
drop col ncol;
format date yymmdd10.;
run;
Note that your file also has text string ' - ' for some of the values. I have told the data step to just make those missing. You could create a custom INFORMAT that would set those to a special missing instead.
You could recreate your original report from that data structure by using PROC REPORT.
proc report data=want;
where date between '01DEC2018'd and '05dec2018'd ;
columns id value,date ;
define id / group ;
define date / across ' ';
define value / ' ';
run;
Yes, the actual data always has number column starts in column 4.
In the sample uploaded, I delete 2 first column.
and it also ended with grand total.
The report will contain every days in a calendar month.
So to a certain extend, the answer is Yes. but there might be a need of an If-then for number of day in month.
@hhchenfx wrote:
Hi Everyone,
I run the proc import below to import my file (attached).
However, some column of number turn to character.
Can you help me to change all variables starting with _ into numeric?
Thank you for your help.
HHCFX
Each month, I need to import similar file and I don't know how many column there are. Thus, using data step might not be good idea for me.
%put &=sysvlong ; options validvarname=v7 ; proc import datafile="C:\Users\sample.CSV" out=client dbms=CSV replace; getnames=yes; guessingrows=100; run;
The problem is that your data contains values that are not numeric
Example from your file the first 5 rows:
X1B_X1SF_X1TI_DWX1,,,8,8,8,8,8,,,8,8,8.5,8,7.5,,,8,8,8,8,8,,,8, - ,8,8,,,,8,152 X1B_X1SF_X1TI_PJM_II,,,6.5,6,7.5,8,8,,,8,9,9.5,6,4.5,,,8.5,6,4,4,6.5,,,8,,,4,6,,,7,127 X1B_X1SF_X1TI_PJM_II,,,1.5,5,2,1,,,,,,,4,3,,,,3,6,7,2,,,,,,,4.5,,,1,40 X1B_X1SF_X1TI_SX1_II,,,8,8,8,8,8,,,6,9,9,8,6.5,,,8,8,8,8,8,,,8,,8,8,8,,,8,158.5 X1B_X1SF_X1TI_DWX1,,,,,,,,,,,,,,,,,,,,,,,,8, - , - , - , - ,,,8,16
Those fields that contain dashes are character not numeric. You have more examples further down in your file. Any column that has a dash in the first 100 rows (your guessing rows) will be set to character type.
Time to write a data step to read things correctly, whatever "correctly" may mean in the presence of a dash surrounded by several spaces.
Not terrible fond of making variable names alike '12/1/2018'n either. That way your variable name contains actual data in the name.
That is not a good format for a guessing procedure like PROC IMPORT. Having date in metadata (variable names) is not a good idea in general. Since your file is just a text file you can instead just read it with a data step.
data want ;
length id $40 date 8 value 8 ;
infile "&path/sample.csv" dsd truncover ;
if _n_=1 then do ;
array dates (100) _temporary_;
input id @ ;
do ncol=1 by 1 until(date=.);
input date :??mmddyy. @ ;
dates(ncol)=date ;
end;
ncol=ncol-1;
retain ncol;
input ;
end;
input id @;
do col=1 to ncol;
input value ?? @;
date=dates(col);
output;
end;
drop col ncol;
format date yymmdd10.;
run;
Note that your file also has text string ' - ' for some of the values. I have told the data step to just make those missing. You could create a custom INFORMAT that would set those to a special missing instead.
You could recreate your original report from that data structure by using PROC REPORT.
proc report data=want;
where date between '01DEC2018'd and '05dec2018'd ;
columns id value,date ;
define id / group ;
define date / across ' ';
define value / ' ';
run;
I see.
I can replace all these "-" with blank in Excel in the new file enclosed
can it help to simplify the problem?
I really appreciate your help!
HHCFX
@hhchenfx wrote:
I see.
I can replace all these "-" with blank in Excel in the new file enclosed
can it help to simplify the problem?
Opening a CSV file in Excel can cause a lot of problems. If you are not careful Excel will transform some of the value. For example it might convert character variables with hyphens in them into dates.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.