Reading in csv file with date using infile and merging 2 files with di...

RLC07 · Posted 01-29-2021 12:56 PM

Hi Experts,

I have a 2-part question and could use your help in reading in reading in another csv file. This time my trouble is reading in the date variable in with the proper formatting while using 'infile', instead of 'proc import', because of the Polish characters.

The csv file that I need to read in has 3 variables. The dates are shown as mm/dd/yyyy (eg. 01/28/2021) but as you can see below, all the dates are missing.

%let path=C:\Users\SAS\Poland;
%let fname=Poland_cases.csv;
options DATESTYLE=mdy;
data master2;
length loc_name $20.;
format date mmddyy10.;
infile "&path\&fname" encoding='utf-8' dsd firstobs=2 dlm=',';
input loc_name $ infected
@10 date anydtdte12.;
run;

For this file, I have to use encoding ='utf-8' or instead of encoding ='wlatin2' or else the Polish characters wouldn't read in properly. For reading in the date, how should I change my code? I've tried a few different things already.

And the second part of my question is, will it be problematic for me to merge this file (encoding = 'utf-8') with another weekly file that was created using the daily raw data files that were read in using encoding='wlatin2'? Thanks very much!

Kurt_Bremser · Posted 01-29-2021 01:20 PM

Please post a few lines of the file by copy/pasting into a window opened with the </> button. Open the file with a text editor (not with Excel!) to do this.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RLC07 · Posted 01-29-2021 01:47 PM

loc_name,infected,date
Dolnośląskie,1035,11/24/2020
Dolnośląskie,825,11/25/2020
Dolnośląskie,1112,11/26/2020
Dolnośląskie,1318,11/27/2020

Tom · Posted 01-29-2021 02:30 PM

You do not want to use pointer movement controls ( the @ in your code) or fixed format mode with a delimited file. Just let the INPUT statement parse the line based on the delimiter. Use the : (colon) modifier in front of any in-line informats included in your INPUT statement so that values are read in list mode.

Also I wouldn't use the ANYDTDTE informat, it will create different dates for the some input string based on the LOCALE setting. Better to use the informat that actually matches the order that month , day and year fields are expected to appear in the data. Plus if you use year month day order when displaying the dates then you won't confuse half your audience when the DAY number is 12 or less.

data want ;
  infile cards dsd truncover ;
  input loc_name :$20. infected date :mmddyy.;
  format date yymmdd10.;
cards;
Dolnośląskie,1035,11/24/2020
Dolnośląskie,825,11/25/2020
Dolnośląskie,1112,11/26/2020
Dolnośląskie,1318,11/27/2020
;

proc print;
run;

Results:

RLC07 · Posted 01-29-2021 04:32 PM

Thank you!

RLC07 · Posted 01-29-2021 04:38 PM

Thanks for the detailed explanation. I included '@' after finding an example code that also suggested I used the ANYDTDTE informat in order to read in dates from a csv in any format. Obviously, it didn't work. Thanks for providing me with the codes - it's been many years since I have used the card statement, but this read in my file accurately. Thanks again!

Kurt_Bremser · Posted 01-29-2021 02:29 PM

For the read part, this should do it:

%let path=C:\Users\SAS\Poland;
%let fname=Poland_cases.csv;

data master2;
length loc_name $20.;
format date mmddyy10.;
infile "&path\&fname" encoding='utf-8' dsd firstobs=2 dlm=',';
input
  loc_name
  infected
  date :mmddyy10.
;
run;

For the merge, get to know your data better. For instance, display the loc_name with a $hex format (length double that of the variable), so you can see the real content, and if you need to do a translation.

Also do a search for "sas convert character from latin to utf", you will find lots of helpful information.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RLC07 · Posted 01-29-2021 04:34 PM

Thanks for the code; it read in the file properly. Thank you and thanks also for search term for finding info to the second part of my question.

Tom · Posted 01-30-2021 12:51 PM

If you are going to use mixed encodings then you will want to run your session using UTF-8 encoding. Otherwise it will be impossible to map all of the different characters represented in a different single byte encoding to the single byte encoding that your session is using.

But as long as the source datasets are properly created with the right metadata about what encoding the characters are using you should be able to combine them in your UTF-8 session. One thing to watch out for is whether the strings are defined long enough to store the UTF-8 codes. In a single byte encoding (like WLATIN1 or WLATIN2) each character takes one byte of storage. But when translating to UTF-8 codes some characters take up to 4 bytes.

RLC07 · Posted 02-02-2021 03:14 PM

@Tom Thanks for your advice! I ran the session using UTF-8 encoding and merging the two files worked like a charm. What happens if I were to run my session without using SAS 9.4 Unicode Support (e.g. SAS 9.4 English) but specify the encoding (UTF-8 or WLATIN2) when each file is read in and try to merge them? Assuming SAS 9.4 English with encoding specified can support reading in the characters correctly, will I have to convert the encoding for one of the files and make sure they are the same before the merge?

Tom · Posted 02-02-2021 04:10 PM

It depends on the actual characters in the file. If the file is using WLATIN2 and all of the characters that actually appear in the file can be mapped into your current SAS sessions encoding (let's assume that it is using WLATIN1) then you will have no problems. But as soon as one of the input text files has something that is not one of the 256 possible characters supported by WLATIN1 then you will get a null value. Now it the other file you are matching has the same character (prehaps represented in a different encoding) then it will also be mapped to the same strange thing. So the strings will match, but you will get some false positive matches where two string are different in just one character but neither character from the two sources can map into your current encoding.

RLC07 · Posted 02-04-2021 10:07 PM

Good to know! Sounds like there is no one straightforward answer other than
I must get to know and understand my data files. As always, I appreciate
the time you take to write out a detailed response. Thanks very much!

Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Re: Reading in csv file with date using infile and merging 2 files with different encoding

Registration is open

Registration is open

SAS Training: Just a Click Away