Thank you for your patience with me!!
Because we are talking about transposing, so I want to ask one more further question
My other file has the DATATYPE with the format below
DATADATE
04JAN1999
05JAN1999
06JAN1999
07JAN1999
08JAN1999
11JAN1999
12JAN1999
How can I adjust the code given and adjusted by you to force the DATATYPE to receive this format
data want;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
dlm=',' dsd lrecl=32000 firstobs=2 missover;
informat DATADATE ddmmyy10.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
format DATADATE ddmmyy10.;
input DATADATE
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
run;
I try to change the ddmmyy10. in the code above to ddmmmyyyy10. the code getting wrong.
Or when I follow this paper, I changed ddmmyy10. to DDMMYYN10 (noted: no "." at the end) as mentioned in the link here. The code still not work properly
Even when I put the dot"." after DDMMYYN10 , it still not works properly
The code
data want;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
dlm=',' dsd lrecl=32000 firstobs=2 missover;
informat DATADATE DDMMYYN10.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
format DATADATE DDMMYYN10.;
input DATADATE
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
run;
The log
28 data want;
29 infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
30 dlm=',' dsd lrecl=32000 firstobs=2 missover;
31 informat DATADATE DDMMYYN10.
__________
485
NOTE 485-185: Informat DDMMYYN was not found or could not be loaded.
32 AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
33 SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
34 EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
35 PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
36 GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
37 format DATADATE DDMMYYN10.;
__________
29
ERROR 29-185: Width specified for format DDMMYYN is invalid.
38 input DATADATE
39 AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
40 SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
41 EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
42 PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
43 GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
44 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 55 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
Could you please tell me what I did wrong in this case?
I will come back to the case merging the transposing inside the import code later, I am still examining and learning then.
P/S:As @Shmuel mentioned, maybe lrecl is not a need in this case, but the value 32000 may not be erroneous in this case. Can I ask your idea about that ?
Formats do not change the value. They only display it differently.
data test;
format
d1 date9.
d2 ddmmyy10.
;
d1 = today();
d2 = d1;
run;
Read up about SAS date (and time) values here.
Thank you @Kurt_Bremser and @Tom
I changed the ddmmyy10. to date9. in this code and the result gets wrong
DATADATE AUD EUR CAD
. 1.63132 0.8548 1.5511
. 1.63026 0.85157 1.536
. 1.63026 0.85157 1.536
. 1.61577 0.84753 1.5251
The log is as below
NOTE: A byte-order mark in the file "E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv" (for fileref
"#LN00782") indicates that the data is encoded in "utf-8". This encoding will be used to process the file.
NOTE: The infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv' is:
Filename=E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv,
RECFM=V,LRECL=128000,File Size (bytes)=2298494,
Last Modified=04 March 2021 18:49:03,
Create Time=04 March 2021 18:49:03
NOTE: Invalid data for DATADATE in line 2 1-10.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2 30/12/1998,1.63132,0.8548,1.5511,6.3655,7.7477,4.1654,114.715,1.90223,7.5875,1.6568,1200.5,8.1123,1.
101 3703,32.2305,1.66375,0.9991,1.2081,473.25,8.2787,1545.5,6.2603,29.99199,3.41,215.495,42.53,8100,61.9
201 ,5.43,9.935,9.2705,49.795,3.1565,39.15,3.497,NA,5.87,68.4,36.64999,0.31535,NA,11.763,1.672,13.3388,1
301 42.25,282.365,4.0013,0.5675,1.8842,1.115,161.2243,36.956,NA,564.55,NA 369
DATADATE=. AUD=1.63132 EUR=0.8548 CAD=1.5511 DKK=6.3655 HKD=7.7477 ILS=4.1654 JPY=114.715 NZD=1.90223 NOK=7.5875 SGD=1.6568
KRW=1200.5 SEK=8.1123 CHF=1.3703 TWD=32.2305 GBP=1.66375 ARS=0.9991 BRL=1.2081 CLP=473.25 CNY=8.2787 COP=1545.5 HRK=6.2603
CZK=29.99199 EGP=3.41 HUF=215.495 INR=42.53 IDR=8100 KES=61.9 MYR=5.43 MXN=9.935 MAD=9.2705 PKR=49.795 PEN=3.1565 PHP=39.15
PLN=3.497 RON=. ZAR=5.87 LKR=68.4 THB=36.64999 TRY=0.31535 VES=. ATS=11.763 DEM=1.672 EEK=13.3388 ESP=142.25 GRD=282.365 LTL=4.0013
LVL=0.5675 NLG=1.8842 ROL=1.115 SIT=161.2243 SKK=36.956 TRL=. VEB=564.55 ZWD=. _ERROR_=1 _N_=1
NOTE: Invalid data for DATADATE in line 3 1-10.
Could you please hint me how to fix it? I thought the values are right and agree that format only forms the display after reading the document. But I am wondering what causes the "." in column DATADATE and how to fix it properly.
On the other hand, when I changed the format of DATADATE in the code transpose, it works well
proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;
data want;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
dlm=',' dsd lrecl=32000 firstobs=2 missover;
informat DATADATE ddmmyy10.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
format DATADATE ddmmyy10.;
input DATADATE
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
run;
proc transpose
data=want
out=currency (
rename=(col1=rate _name_=currency)
)
;
by datadate;
var AUD--ZWD;
format datadate date9.;
run;
The result
DATADATE currency rate
30DEC1998 AUD 1.63132
30DEC1998 EUR 0.8548
30DEC1998 CAD 1.5511
30DEC1998 DKK 6.3655
30DEC1998 HKD 7.7477
30DEC1998 ILS 4.1654
Warm regards.
Please share the modified code.
Most likely you changed the INFORMAT instead of the FORMAT. The value of the first field on that line is not compatible with the DATE informat, but would work fine with the DDMMYY informat.
Ahhhhhhhh
Thanks @Tom
My mistake. I forget a simple thing that informat is how we let SAS read the datatype while format is how SAS display the datatype. That is why my previous code here does not work.
proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;
data want;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
dlm=',' dsd lrecl=32000 firstobs=2 missover;
informat DATADATE date9./*changed line*/
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
format DATADATE date9.;/*changed line*/
input DATADATE
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
run;
But when I sit down and take a second thought, I did change only the display (format), the code works well
proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;
data want;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
dlm=',' dsd lrecl=32000 firstobs=2 missover;
informat DATADATE ddmmyy10.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD myna.;
format DATADATE date9.;/*changed line*/
input DATADATE
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD ;
run;
The result finally is here
DATADATE AUD EUR CAD
30DEC1998 1.63132 0.8548 1.5511
31DEC1998 1.63026 0.85157 1.536
01JAN1999 1.63026 0.85157 1.536
04JAN1999 1.61577 0.84753 1.5251
05JAN1999 1.60772 0.84861 1.521
06JAN1999 1.58907 0.85324 1.5098
07JAN1999 1.58178 0.85682 1.5123
08JAN1999 1.57928 0.86558 1.5141
Warmest regards.
Do NOT change the informat, only change the format.
The date as shown in the line from the file is in DDMMYY format.
You are using a 20+ year old documentation link (Version 8 of SAS) and it has some typos. It is missing the period at the end of the format specification, but that should not have throw you off because format specifications always include a period in SAS code. Otherwise the look like a variable name. It is also showing the wrong output style for that format. There are other typos in that documentation page. See if you can find them.
As mentioned, your starting point for SAS documentation is https://documentation.sas.com, for Programming it is https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=pgmsashome&docsetTarget=h...
There is no need to use PROC IMPORT to read a text file. Just write your own data step to read the file.
Especially if you know all of the variables should be numeric.
Copy the header row into your program editor.
Name,AUD,EUR,CAD,DKK,HKD,ILS,JPY,NZD,NOK,SGD,KRW,SEK,CHF,TWD,GBP,ARS,BRL,CLP,CNY,COP,HRK,CZK,EGP,HUF,INR,IDR,KES,MYR,MXN,MAD,PKR,PEN,PHP,PLN,RON,ZAR,LKR,THB,TRY,VES,ATS,DEM,EEK,ESP,GRD,LTL,LVL,NLG,ROL,SIT,SKK,TRL,VEB,ZWD
Change all the commas to spaces. And use it to generate an INPUT statement.
input Name :$20.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW SEK CHF TWD GBP
ARS BRL CLP CNY COP HRK CZK EGP HUF INR IDR KES MYR MXN MAD PKR
PEN PHP PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP GRD LTL LVL
NLG ROL SIT SKK TRL VEB ZWD
;
Now just add a DATA statement , an INFILE statement and a RUN statement and you will have a dataset.
If you want to deal with the NA strings in the numeric fields you could build a custom informat. Or perhaps just modify the input buffer first.
data cars;
infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted_csv.csv'
dsd firstobs=2 truncover
;
input @;
_infile_ = tranwrd(_infile_,',NA,',',,');
input Name :$20.
AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW SEK CHF TWD GBP
ARS BRL CLP CNY COP HRK CZK EGP HUF INR IDR KES MYR MXN MAD PKR
PEN PHP PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP GRD LTL LVL
NLG ROL SIT SKK TRL VEB ZWD
;
run;
PS Why do you NAME values look like DD/MM/YYYY style dates?
Thank @Tom
Thank you for your step-by-step guidance.
The first time I deal with CSV importing, but I have the feeling that we mostly need to import from the second observation because the first observations are always character. And because of that, we need to import manually for the first row. It is a feeling to me, hope that I do not misinterpret.
Regarding your concern. sorry, it is my fault I forget to change the "Name" character, it should be "DATADATE".
Warm regards.
No. PROC IMPORT is a guessing procedure for reading CSV files. It looks at the file and guesses how many variables there are , what names to use for each variable, what data type and when character what length, etc.
If you just look at the file yourself you can usually do a much better job of guessing what is in the file. You won't be fooled by the extra commas on the end of the line or extra empty lines at the end of the file.
So my step-by-step was to show you how easy it is to generate the code yourself using your brain as the guessing engine instead of PROC IMPORT.
And once you have the working code you can easily modify it. For example you can change the name of the first variable and the data type and what informat it uses to read the values from the CSV file.
Also the error message generated by the extra commas is not saying there are many variables named "Name". It is saying that you cannot have a variable without any name. The string "Name" is just constant text in the generated error message. If you look carefully you will see there is an extra space after Name in the error message where the actual name read from the CSV file would go.
Hi @Reeza
Thank you for pointing it out!
I follow this link to see what is LRECL. It documented that
So, it means that from 1-32767, there is only one number that n cannot receive is 32000. From my understanding, if we use LRECL=32000, it means that LRECL=1/32767, did I explain reasonably?
Warmest regards.
@Phil_NZ - a little bit of theory - what is a file:
You probably know, at least the next two terms: VARIABLE and RECORD.
SAS uses the terms OBSERVATION (instead record).
LRECL stands for Logical RECord Length.
RECFM stands for RECord ForMat.
If you sum the length of all varaibles in a record, including delimiters and control characters (like EOL = End Of Line) you get the Record Length.
Depending on record format - the record length may be a fix size or variable size. When reading a text file sas writes to the log the minimum size and the maximum size of the input record.
Data is written to Hard-Disk as a block of records. The total length of a block is called BLOK SIZE.
As that, LRECL=1/32767 is erroneous. LRECL can get an integer value which is dependent on the file you read/relate to. You need know the real length of the record when the file was written.
In some cases each block contains one record only - then LRECL is equal the Block Size. I think it happens when you write, for example, to a printer.
In your case, using PROC IMPORT or a data step with INFILE, you need not define the LRECL.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.