BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hello everyone,

 

I have A CSV file containing the list of individuals with their ICD diagnosis on multiple dates. The data looks as below:

 

ID ICD VERSION ICD CODE DIAGNOSIS DATE
M0058 ICD-9 733.42 19-06-14
M0058 ICD-9 733.42 19-09-14
M0058 ICD-9 733.42 18-12-14
M0058 ICD-9 733.42 03-12-15
M0058 ICD-9 733.42 24-12-15
M0058 ICD-9 733.42 31-12-15

 

I would like to create a new SAS dataset with the list of observations along with their first ICD diagnosis. For the above case, I would like to retain only "M000000058" "ICD-9" "733.42" "19-06-14"
Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See?

You showed us what Excel showed you, and that sparked my comment about 2-digit years. Now we see that the file has in fact very sensible date values in ISO format, as it should be.

Moral: never use Excel to look at text files.

Use this code to import that file:

data import;
infile "your_file_name" dlm="," dsd firstobs=2 truncover;
input
  id :$10.
  icd_version :$10.
  icd_code :$8.
  diagnosis_date :yymmdd10.
;
format diagnosis_date yymmdd10.;
run;

If that throws NOTEs about invalid values for diagnosis_date, post one of those messages (including the listing of the raw data line) as you posted the file data. We can then see what we need to do to avoid that NOTE, as you want a clean log (Maxim 25).

I suspect that all the data was imported as character because PROC IMPORT was misled by the quotes around the date values.

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

Sort by id and date, then run

data want;
set have;
by id;
if first.id;
run;

or (no preceding sort needed, but may perform worse)

proc sql;
create table want as
  select *
  from have
  group by id
  having date = min(date)
;
quit;
mantubiradar19
Quartz | Level 8
Thanks, Kurt, After importing the CSV the "DIAGNOSIS_DATE" has a character type. Do I need to change this date format before sorting?
Kurt_Bremser
Super User

@mantubiradar19 wrote:
Thanks, Kurt, After importing the CSV the "DIAGNOSIS_DATE" has a character type. Do I need to change this date format before sorting?

Absolutely. Obvious date values must always be imported as SAS date values to be useful.

 

Also, anybody who sends you dates with two-digit years needs to be beat over the head with a medium-sized reinforced-concrete pillar. After Y2K, I consider anybody doing this as having an IQ lower than the number of years passed since 2000.

 

To get help for correctly reading your data, please open the CSV file with an editor (NOT WITH EXCEL!!) and copy/paste the first 10 lines into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

mantubiradar19
Quartz | Level 8
"ID","ICD VERSION","ICD CODE","DIAGNOSIS DATE"
"M0058","ICD-9","733.42","2014/06/19"
"M0058","ICD-9","733.42","2014/09/19"
"M0058","ICD-9","733.42","2014/12/18"
"M0058","ICD-9","733.42","2015/12/03"
"M0058","ICD-9","733.42","2015/12/24"
"M0058","ICD-9","733.42","2015/12/31"
"M00331","ICD-10-CM","M81.0","2020/04/03"
"M00331","ICD-10-CM","M81.0","2020/04/07"
"M00331","ICD-10-CM","M81.0","2020/04/10"

Here you go! 

Kurt_Bremser
Super User

See?

You showed us what Excel showed you, and that sparked my comment about 2-digit years. Now we see that the file has in fact very sensible date values in ISO format, as it should be.

Moral: never use Excel to look at text files.

Use this code to import that file:

data import;
infile "your_file_name" dlm="," dsd firstobs=2 truncover;
input
  id :$10.
  icd_version :$10.
  icd_code :$8.
  diagnosis_date :yymmdd10.
;
format diagnosis_date yymmdd10.;
run;

If that throws NOTEs about invalid values for diagnosis_date, post one of those messages (including the listing of the raw data line) as you posted the file data. We can then see what we need to do to avoid that NOTE, as you want a clean log (Maxim 25).

I suspect that all the data was imported as character because PROC IMPORT was misled by the quotes around the date values.

mantubiradar19
Quartz | Level 8
Thank you so much for your advice Kurt, let me try this
mantubiradar19
Quartz | Level 8
I was able to import the CSV file using your code and sort by id and diagnosis_date to obtain unique observations with the earliest diagnosis date (using Sort by id and date, then run). For your reference, I used the following import code earlier:

%let path=PATH;
options validvarname=v7;

proc import datafile="&path\My_CSV.csv"
out=MY_CSV
dbms=csv;
guessingrows=max;
run;quit;
mantubiradar19
Quartz | Level 8
Can you please tell me what is the function of "dsd firstobs=2 truncover;"?
Kurt_Bremser
Super User

DSD means "delimiter sensitive data". It has three effects: quotes around values are removed, delimiters within the quoted string a re disregarded, and two or more delimiters in succession mean that there are missing values

 

FIRSTOBS= instructs the data step to skip lines at the beginning, FIRSTOBS=2 means skipping a single header line

 

TRUNCOVER controls what should happen if not enough data is found to fill all columns in the INPUT statement. TRUNCOVER means that any shorter value is still used to fill a column, and any subsequent columns are set to missing.

 

All these options are described in detail in the documentation of the INFILE Statement.

PeterClemmensen
Tourmaline | Level 20

So you still want as many observations as in the original data?

mantubiradar19
Quartz | Level 8
I want one observation per line with the earliest date for each observation

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1337 views
  • 3 likes
  • 3 in conversation