Hi Everyone,
I run the proc import below to import my file (attached).
However, some column of number turn to character.
Can you help me to change all variables starting with _ into numeric?
Thank you for your help.
HHCFX
Each month, I need to import similar file and I don't know how many column there are. Thus, using data step might not be good idea for me.
%put &=sysvlong ;
options validvarname=v7 ;
proc import datafile="C:\Users\sample.CSV"
out=client dbms=CSV replace;
getnames=yes;
guessingrows=100;
run;
That is not a good format for a guessing procedure like PROC IMPORT. Having date in metadata (variable names) is not a good idea in general. Since your file is just a text file you can instead just read it with a data step.
data want ;
length id $40 date 8 value 8 ;
infile "&path/sample.csv" dsd truncover ;
if _n_=1 then do ;
array dates (100) _temporary_;
input id @ ;
do ncol=1 by 1 until(date=.);
input date :??mmddyy. @ ;
dates(ncol)=date ;
end;
ncol=ncol-1;
retain ncol;
input ;
end;
input id @;
do col=1 to ncol;
input value ?? @;
date=dates(col);
output;
end;
drop col ncol;
format date yymmdd10.;
run;
Note that your file also has text string ' - ' for some of the values. I have told the data step to just make those missing. You could create a custom INFORMAT that would set those to a special missing instead.
You could recreate your original report from that data structure by using PROC REPORT.
proc report data=want;
where date between '01DEC2018'd and '05dec2018'd ;
columns id value,date ;
define id / group ;
define date / across ' ';
define value / ' ';
run;
Yes, the actual data always has number column starts in column 4.
In the sample uploaded, I delete 2 first column.
and it also ended with grand total.
The report will contain every days in a calendar month.
So to a certain extend, the answer is Yes. but there might be a need of an If-then for number of day in month.
@hhchenfx wrote:
Hi Everyone,
I run the proc import below to import my file (attached).
However, some column of number turn to character.
Can you help me to change all variables starting with _ into numeric?
Thank you for your help.
HHCFX
Each month, I need to import similar file and I don't know how many column there are. Thus, using data step might not be good idea for me.
%put &=sysvlong ; options validvarname=v7 ; proc import datafile="C:\Users\sample.CSV" out=client dbms=CSV replace; getnames=yes; guessingrows=100; run;
The problem is that your data contains values that are not numeric
Example from your file the first 5 rows:
X1B_X1SF_X1TI_DWX1,,,8,8,8,8,8,,,8,8,8.5,8,7.5,,,8,8,8,8,8,,,8, - ,8,8,,,,8,152 X1B_X1SF_X1TI_PJM_II,,,6.5,6,7.5,8,8,,,8,9,9.5,6,4.5,,,8.5,6,4,4,6.5,,,8,,,4,6,,,7,127 X1B_X1SF_X1TI_PJM_II,,,1.5,5,2,1,,,,,,,4,3,,,,3,6,7,2,,,,,,,4.5,,,1,40 X1B_X1SF_X1TI_SX1_II,,,8,8,8,8,8,,,6,9,9,8,6.5,,,8,8,8,8,8,,,8,,8,8,8,,,8,158.5 X1B_X1SF_X1TI_DWX1,,,,,,,,,,,,,,,,,,,,,,,,8, - , - , - , - ,,,8,16
Those fields that contain dashes are character not numeric. You have more examples further down in your file. Any column that has a dash in the first 100 rows (your guessing rows) will be set to character type.
Time to write a data step to read things correctly, whatever "correctly" may mean in the presence of a dash surrounded by several spaces.
Not terrible fond of making variable names alike '12/1/2018'n either. That way your variable name contains actual data in the name.
That is not a good format for a guessing procedure like PROC IMPORT. Having date in metadata (variable names) is not a good idea in general. Since your file is just a text file you can instead just read it with a data step.
data want ;
length id $40 date 8 value 8 ;
infile "&path/sample.csv" dsd truncover ;
if _n_=1 then do ;
array dates (100) _temporary_;
input id @ ;
do ncol=1 by 1 until(date=.);
input date :??mmddyy. @ ;
dates(ncol)=date ;
end;
ncol=ncol-1;
retain ncol;
input ;
end;
input id @;
do col=1 to ncol;
input value ?? @;
date=dates(col);
output;
end;
drop col ncol;
format date yymmdd10.;
run;
Note that your file also has text string ' - ' for some of the values. I have told the data step to just make those missing. You could create a custom INFORMAT that would set those to a special missing instead.
You could recreate your original report from that data structure by using PROC REPORT.
proc report data=want;
where date between '01DEC2018'd and '05dec2018'd ;
columns id value,date ;
define id / group ;
define date / across ' ';
define value / ' ';
run;
I see.
I can replace all these "-" with blank in Excel in the new file enclosed
can it help to simplify the problem?
I really appreciate your help!
HHCFX
@hhchenfx wrote:
I see.
I can replace all these "-" with blank in Excel in the new file enclosed
can it help to simplify the problem?
Opening a CSV file in Excel can cause a lot of problems. If you are not careful Excel will transform some of the value. For example it might convert character variables with hyphens in them into dates.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.