BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi @Kurt_Bremser 

Thank you for your patience with me!!

 

Because we are talking about transposing, so I want to ask one more further question

My other file has the DATATYPE with the format below

DATADATE
04JAN1999
05JAN1999
06JAN1999
07JAN1999
08JAN1999
11JAN1999
12JAN1999

How can I adjust the code given and adjusted by you to force the DATATYPE to receive this format

data want;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
       dlm=',' dsd lrecl=32000 firstobs=2 missover; 
  informat DATADATE ddmmyy10.
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
  format DATADATE ddmmyy10.;
  input DATADATE 
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
run;

I try to change the ddmmyy10. in the code above to ddmmmyyyy10. the code getting wrong.

Or when I follow this paper, I changed ddmmyy10. to DDMMYYN10 (noted: no "." at the end) as mentioned in the link here. The code still not work properly

 

My97_0-1614856423881.png

 

My97_1-1614856497290.png

 

Even when I put the dot"." after DDMMYYN10 , it still not works properly

The code

data want;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
       dlm=',' dsd lrecl=32000 firstobs=2 missover; 
  informat DATADATE DDMMYYN10.
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
  format DATADATE DDMMYYN10.;
  input DATADATE 
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
run;

The log

28         data want;
29           infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
30                dlm=',' dsd lrecl=32000 firstobs=2 missover;
31           informat DATADATE DDMMYYN10.
                               __________
                               485
NOTE 485-185: Informat DDMMYYN was not found or could not be loaded.

32                   AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
33                   SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
34                   EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
35                   PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
36                   GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
37           format DATADATE DDMMYYN10.;
                             __________
                             29
ERROR 29-185: Width specified for format DDMMYYN is invalid.

38           input DATADATE
39                   AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW
40                   SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK
41                   EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP
42                   PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP
43                   GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
44         run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 55 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.

Could you please tell me what I did wrong in this case?

I will come back to the case merging the transposing inside the import code later, I am still examining and learning then.

 

P/S:As @Shmuel  mentioned, maybe lrecl is not a need in this case, but the value 32000 may not be erroneous in this case. Can I ask your idea about that ?

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Kurt_Bremser
Super User

Formats do not change the value. They only display it differently.

data test;
format
  d1 date9.
  d2 ddmmyy10.
;
d1 = today();
d2 = d1;
run;

Read up about SAS date (and time) values here.

Phil_NZ
Barite | Level 11

Thank you @Kurt_Bremser and @Tom 

 

I changed the ddmmyy10. to date9. in this code and the result gets wrong

DATADATE	AUD	    EUR	    CAD
.	        1.63132	0.8548	1.5511
.	        1.63026	0.85157	1.536
.	        1.63026	0.85157	1.536
.	        1.61577	0.84753	1.5251

The log is as below

NOTE: A byte-order mark in the file "E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv" (for fileref 
      "#LN00782") indicates that the data is encoded in "utf-8".  This encoding will be used to process the file.
NOTE: The infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv' is:
      Filename=E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv,
      RECFM=V,LRECL=128000,File Size (bytes)=2298494,
      Last Modified=04 March 2021 18:49:03,
      Create Time=04 March 2021 18:49:03

NOTE: Invalid data for DATADATE in line 2 1-10.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
2         30/12/1998,1.63132,0.8548,1.5511,6.3655,7.7477,4.1654,114.715,1.90223,7.5875,1.6568,1200.5,8.1123,1.
     101  3703,32.2305,1.66375,0.9991,1.2081,473.25,8.2787,1545.5,6.2603,29.99199,3.41,215.495,42.53,8100,61.9
     201  ,5.43,9.935,9.2705,49.795,3.1565,39.15,3.497,NA,5.87,68.4,36.64999,0.31535,NA,11.763,1.672,13.3388,1
     301  42.25,282.365,4.0013,0.5675,1.8842,1.115,161.2243,36.956,NA,564.55,NA 369
DATADATE=. AUD=1.63132 EUR=0.8548 CAD=1.5511 DKK=6.3655 HKD=7.7477 ILS=4.1654 JPY=114.715 NZD=1.90223 NOK=7.5875 SGD=1.6568
KRW=1200.5 SEK=8.1123 CHF=1.3703 TWD=32.2305 GBP=1.66375 ARS=0.9991 BRL=1.2081 CLP=473.25 CNY=8.2787 COP=1545.5 HRK=6.2603
CZK=29.99199 EGP=3.41 HUF=215.495 INR=42.53 IDR=8100 KES=61.9 MYR=5.43 MXN=9.935 MAD=9.2705 PKR=49.795 PEN=3.1565 PHP=39.15
PLN=3.497 RON=. ZAR=5.87 LKR=68.4 THB=36.64999 TRY=0.31535 VES=. ATS=11.763 DEM=1.672 EEK=13.3388 ESP=142.25 GRD=282.365 LTL=4.0013
LVL=0.5675 NLG=1.8842 ROL=1.115 SIT=161.2243 SKK=36.956 TRL=. VEB=564.55 ZWD=. _ERROR_=1 _N_=1
NOTE: Invalid data for DATADATE in line 3 1-10.

Could you please hint me how to fix it? I thought the values are right and agree that format only forms the display after reading the document. But I am wondering what causes the "." in column DATADATE and how to fix it properly.

 

 

On the other hand, when I changed the format of DATADATE in the code transpose, it works well

proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;

data want;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
       dlm=',' dsd lrecl=32000 firstobs=2 missover; 
  informat DATADATE ddmmyy10.
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
  format DATADATE ddmmyy10.;
  input DATADATE 
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
run;

proc transpose
  data=want
  out=currency (
    rename=(col1=rate _name_=currency)
  )
;
by datadate;
var AUD--ZWD;
format datadate date9.;
run;

The result

DATADATE	currency	rate
30DEC1998	AUD	        1.63132
30DEC1998	EUR      	0.8548
30DEC1998	CAD     	1.5511
30DEC1998	DKK	        6.3655
30DEC1998	HKD	        7.7477
30DEC1998	ILS      	4.1654

Warm regards.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Tom
Super User Tom
Super User

Please share the modified code.

Most likely you changed the INFORMAT instead of the FORMAT.  The value of the first field on that line is not compatible with the DATE informat, but would work fine with the DDMMYY informat.

Phil_NZ
Barite | Level 11

Ahhhhhhhh

Thanks @Tom 

My mistake. I forget a simple thing that informat is how we let SAS read the datatype while format is how SAS display the datatype. That is why my previous code here does not work.

proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;

data want;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
       dlm=',' dsd lrecl=32000 firstobs=2 missover; 
  informat DATADATE date9./*changed line*/
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
  format DATADATE date9.;/*changed line*/
  input DATADATE 
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
run;

But when I sit down and take a second thought, I did change only the display (format), the code works well

proc format;
invalue myna
'NA' = .
other = [best12.]
;
run;

data want;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted1.csv'
       dlm=',' dsd lrecl=32000 firstobs=2 missover; 
  informat DATADATE ddmmyy10.
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  myna.;
  format DATADATE date9.;/*changed line*/
  input DATADATE 
          AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW 
          SEK CHF TWD GBP ARS BRL CLP CNY COP HRK CZK 
          EGP HUF INR IDR KES MYR MXN MAD PKR PEN PHP 
          PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP 
          GRD LTL LVL NLG ROL SIT SKK TRL VEB ZWD  ;
run;

The result finally is here

DATADATE	AUD	    EUR	    CAD
30DEC1998	1.63132	0.8548	1.5511
31DEC1998	1.63026	0.85157	1.536
01JAN1999	1.63026	0.85157	1.536
04JAN1999	1.61577	0.84753	1.5251
05JAN1999	1.60772	0.84861	1.521
06JAN1999	1.58907	0.85324	1.5098
07JAN1999	1.58178	0.85682	1.5123
08JAN1999	1.57928	0.86558	1.5141

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Reeza
Super User
Your informat reflects how the variable looks like currently.

You've specified
informat DATADATE DDMMYYN10.
which means it expects something like 01032021 - not sure where the extra 2 digits go. N in the format name means no separator.

But your data looks like:

04/03/2021 which is ddmmyy10. no "N".

It's worth spending some time understanding informats versus formats if that's a new concept to you.
Tom
Super User Tom
Super User

You are using a 20+ year old documentation link (Version 8 of SAS) and it has some typos.  It is missing the period at the end of the format specification, but that should not have throw you off because format specifications always include a period in SAS code. Otherwise the look like a variable name.  It is also showing the wrong output style for that format.  There are other typos in that documentation page.  See if you can find them.

Tom
Super User Tom
Super User

There is no need to use PROC IMPORT to read a text file.  Just write your own data step to read the file.

Especially if you know all of the variables should be numeric.

Copy the header row into your program editor.

Name,AUD,EUR,CAD,DKK,HKD,ILS,JPY,NZD,NOK,SGD,KRW,SEK,CHF,TWD,GBP,ARS,BRL,CLP,CNY,COP,HRK,CZK,EGP,HUF,INR,IDR,KES,MYR,MXN,MAD,PKR,PEN,PHP,PLN,RON,ZAR,LKR,THB,TRY,VES,ATS,DEM,EEK,ESP,GRD,LTL,LVL,NLG,ROL,SIT,SKK,TRL,VEB,ZWD

Change all the commas to spaces. And use it to generate an INPUT statement.

input Name :$20.
     AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW SEK CHF TWD GBP
     ARS BRL CLP CNY COP HRK CZK EGP HUF INR IDR KES MYR MXN MAD PKR
     PEN PHP PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP GRD LTL LVL
     NLG ROL SIT SKK TRL VEB ZWD
;

Now just add a DATA statement , an INFILE statement and a RUN statement and you will have a dataset.

If you want to deal with the NA strings in the numeric fields you could build a custom informat.  Or perhaps just modify the input buffer first.

data cars;
  infile 'E:\Harv 27th_Feb_2021\Report final\exchange_rate_Datastream_adjusted_csv.csv'
  dsd firstobs=2 truncover 
  ;
  input @;
  _infile_ = tranwrd(_infile_,',NA,',',,');
  input Name :$20.
     AUD EUR CAD DKK HKD ILS JPY NZD NOK SGD KRW SEK CHF TWD GBP
     ARS BRL CLP CNY COP HRK CZK EGP HUF INR IDR KES MYR MXN MAD PKR
     PEN PHP PLN RON ZAR LKR THB TRY VES ATS DEM EEK ESP GRD LTL LVL
     NLG ROL SIT SKK TRL VEB ZWD
  ;
run;

PS Why do you NAME values look like DD/MM/YYYY style dates?

 

 

Phil_NZ
Barite | Level 11

Thank @Tom 

Thank you for your step-by-step guidance. 

The first time I deal with CSV importing, but I have the feeling that we mostly need to import from the second observation because the first observations are always character. And because of that, we need to import manually for the first row. It is a feeling to me, hope that I do not misinterpret.

 

Regarding your concern. sorry, it is my fault I forget to change the "Name" character, it should be "DATADATE".

Warm regards.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Tom
Super User Tom
Super User

No.  PROC IMPORT is a guessing procedure for reading CSV files. It looks at the file and guesses how many variables there are , what names to use for each variable, what data type and when character what length, etc.  

If you just look at the file yourself you can usually do a much better job of guessing what is in the file.  You won't be fooled by the extra commas on the end of the line or extra empty lines at the end of the file.

 

So my step-by-step was to show you how easy it is to generate the code yourself using your brain as the guessing engine instead of PROC IMPORT.

 

And once you have the working code you can easily modify it.  For example you can change the name of the first variable and the data type and what informat it uses to read the values from the CSV file.

 

Also the error message generated by the extra commas is not saying there are many variables named "Name".  It is saying that you cannot have a variable without any name.  The string "Name" is just constant text in the generated error message. If you look carefully you will see there is an extra space after Name in the error message where the actual name read from the CSV file would go.

Reeza
Super User
And I'd bet that the LRECL is not set correctly as well.
Phil_NZ
Barite | Level 11

Hi @Reeza 

Thank you for pointing it out!

I follow this link to see what is LRECL. It documented that 

 

My97_0-1614849095824.png

 

So, it means that from 1-32767, there is only one number that n cannot receive is 32000. From my understanding, if we use  LRECL=32000, it means that LRECL=1/32767, did I explain reasonably?

 

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Shmuel
Garnet | Level 18

@Phil_NZ - a little bit of theory - what is a file:

You probably know, at least the next two terms: VARIABLE and RECORD.

SAS uses the terms OBSERVATION (instead record).

LRECL stands for Logical RECord Length. 

RECFM stands for RECord ForMat.

 

If you sum the length of all varaibles in a record, including delimiters and control characters (like EOL = End Of Line) you get the Record Length.

Depending on record format - the record length may be a fix size or variable size. When reading a text file sas writes to the log the minimum size and the maximum size of the input record.

 

Data is written to Hard-Disk as a block of records. The total length of a block is called BLOK SIZE.

 

As that, LRECL=1/32767  is erroneous. LRECL can get an integer value which is dependent on the file you read/relate to. You need know the real length of the record when the file was written.

 

In some cases each block contains one record only - then LRECL is equal the Block Size. I think it happens when you write, for example, to a printer.

 

In your case, using PROC IMPORT or a data step with INFILE, you need not define the LRECL.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 32 replies
  • 4468 views
  • 20 likes
  • 7 in conversation