BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sks521
Quartz | Level 8

Hi folks,

 

I am reading a csv file in SAS. 

 

One of the variables has values like, for e.g. 1234567891011121311415 but SAS is reading it as for e.g. '1.50E+18'. Because of this I am having problems. This variable is supposed to have a unique value for each observation. There are 77650 observations in total but when I run a programme to see how many of these values are duplicates, it gives me only 38 unique values which is not true.

 

I have to merge two data sets based on this variable but because of this trouble I cannot as SAS is not reading the values properly. Please can anyone suggest what to do.

 

Ta.

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

In case it has not been made clear from other posts, the answer to your question:

My question would be, do I have to specify dummy for each variable I don't want in my data set?

is: if they are between variables you want then yes, if they are after the last variable you want then no.

 

Where there are multiple consecutive variables that are not required between variables you do require then you can, for example, use the technique shown in the code I previously posted, e.g., (3 * dummy) to skip 3 unwanted variables or (12 * dummy) to skip 12 unwanted variables, etc.

 

You will need to count the columns yourself to tell SAS how many to read past, as it reads left to right under these circumstances.

 

 

Kind regards,

Amir.

View solution in original post

18 REPLIES 18
sks521
Quartz | Level 8

Hi Kurt I am using below data step. The variable I am talking about is apc1_apc2_link;

data APC1;
infile 'I:\CHCBradford\cBradford_106120200120095954\raw_SUS_20200117\raw_APC1rev.CSV' dsd truncover lrecl=32767 FIRSTOBS=2;
length apc1_apc2_link $32 pkid $32 Ethnicity $2 Sex $1 Postcode $5 Org_Code $5 Admin_Cat 3 Patient_Class 3 Admi_Method $2 Dis_Destination $2 Dis_Method $2
Admisource $2 Start_Date 4 Discharge_Date 4 Episode_N 3 Neonatal_care 3 Operation_Status 3 Diagnosis_scheme 3 Diagnosis_01-Diagnosis_06 $6
;

informat
Start_Date
yymmdd8.
Discharge_Date
yymmdd8.
;
input
apc1_apc2_link pkid Ethnicity Sex Postcode Org_Code Admin_Cat Patient_Class Admi_Method Dis_Destination Dis_Method
Admisource Start_Date Discharge_Date Episode_N Neonatal_care Operation_Status Diagnosis_scheme Diagnosis_01-Diagnosis_06
;
run;

Kurt_Bremser
Super User

Then what you said in your first post is not possible, as SAS will NOT convert a string to exponential display on its own:

data APC1;
length apc1_apc2_link $32;
input apc1_apc2_link;
datalines;
1234567891011121311415
;

proc print data=apc1 noobs;
run;

Result:

    apc1_apc2_link

1234567891011121311415
sks521
Quartz | Level 8

You are probably right! I am reading in a CSV. This has values like 1.49512E+18 but when I click on the cell, I am able to read it as '1495123836168990000'. when I bring it into SAS, it reads it as I had mentioned.

 

I tried to change the format in CSV from 'General' to 'Number' but when I save it, the file doesn't save the specified format.

 

Ta

 

Amir
PROC Star

Consider changing the format from 'General' to 'Text', and as mentioned elsewhere, examine the data in a text editor to see what is present in the csv you are reading.

 

Also try running the data step that @Kurt_Bremser supplied and place your values after the datalines statement and see what you get.

 

 

Kind regards,

Amir.

Kurt_Bremser
Super User

NEVER (as in not even when hell freezes over so cold that the brimstone is superconducting) inspect a csv file you get with Excel. Use a proper text editor instead (like notepad++). And if you save the data again from Excel, you'll have lost some, as Excel has the same limitations on numbers that SAS has (maximum precision 15 to 16 decimal digits).

 

If your data originates from Excel initially, format it as number with no decimals, then export. This will keep all digits and omit the scientific notation.

Amir
PROC Star

Hi @sks521, double check one of the "wrong" values in the csv, by opening it in a text editor (e.g., Notepad++, Notepad, etc.), as it might have previously been opened in Excel which converted the data to scientific notation when saving - as well as possibly removing any leading zeros in your data.

 

I was about to post a simple data step like @Kurt_Bremser as I was going to make the same point as him.

 

 

Kind regards,

Amir.

sks521
Quartz | Level 8

Thanks Amir, reading data into SAS got a bit complicated here!

 

I have a bigger problem than what I have stated in my questions previously.

 

The data file is a CSV and its the most raw form I have ever used or seen.

 

The data set has more than 200 variables with names like 'Present On Admission Indicator Diag 21' etc. having spaces and special characters.

 

I only need a few variables for analysis and don't want to bring all variables into SAS. What's the best way to go about it?

 

If I remove variables from CSV using excel, it doesn't save the formats as in original data like it converts long values into for e.g. 145E+18 etc.

 

Your suggestions will me much appreciated.

Ta

Amir
PROC Star

Hi @sks521, breaking down your post:

I have a bigger problem than what I have stated in my questions previously.

1) Does this mean your original problem is solved?

 

The data file is a CSV and its the most raw form I have ever used or seen.

2) You can see the csv data and are finding it difficult. We cannot see the CSV data so it will be difficult for us too! Please open the csv in a text editor (e.g., Notepad++, notepad, etc.), not Excel, and copy and paste (not screenshot) a few rows of the data into a post.

 

The data set has more than 200 variables with names like 'Present On Admission Indicator Diag 21' etc. having spaces and special characters.

3) The names of the variables shouldn't matter as your code already uses firstobs=2 to ignore the header row.

 

I only need a few variables for analysis and don't want to bring all variables into SAS. What's the best way to go about it?

4) I would try using a "dummy" variable to input and ignore the fields you don't want ("dummy" can be dropped)., For example you can try running the following, and then change it to work with your data:

 

data want(drop = dummy);
   infile datalines firstobs=2 dsd;

   input
      apc1_apc2_link : $char32. /* read 1st field            */
      dummy          : $char1.  /* ignore 2nd field          */
      pkid           : $char32. /* read 3rd field            */
      (3 * dummy)    (:$char1.) /* ignore fields 4, 5 & 6    */ 
      admin_cat      : 8.       /* read 7th field as numeric */
   ;

   datalines;
first column is wanted,second column isn't wanted,third column is wanted,fourth column isn't wanted,fifth column isn't wanted,sixth column isn't wanted,seventh column is wanted,eighth column isn't wanted
1234567890123456789012,2,abc,4,5,6,701,8
0987654321098765432109,2,abc,4,5,6,702,8
1122334455667788990011,2,abc,4,5,6,703,8
;

 

If I remove variables from CSV using excel, it doesn't save the formats as in original data like it converts long values into for e.g. 145E+18 etc.

5) I would leave the csv as it is and use SAS to read what is required and how it is required.

 

Hope that helps.

 

 

Kind regards,

Amir.

sks521
Quartz | Level 8

Thanks Amir,

 

1) Does this mean your original problem is solved?

 

No, it's not 

 

2) You can see the CSV data and are finding it difficult. We cannot see the CSV data so it will be difficult for us too! Please open the csv in a text editor (e.g., Notepad++, notepad, etc.), not Excel, and copy and paste (not screenshot) a few rows of the data into a post.

 

I am pasting a few data lines here

1495123836168993500,80863,200612,,,,,A,"",,1,,,,,,,,,,,LS28,03C,03C,,,8,20170120,20170120,,,,,,,,,,,,,101005445,1,1,21,19,1,19,20170120,09:00:00,20170120,16:49:00,9,1,,1,8,8,,20170120,09:00:00,20170120,16:49:00,0,0,XXXXXX,RAE,9,RAE,,02R,,C5195478,160,160,160,,,2,S611,,S6250,,V498,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,S662,20170120,,,,,Z895,20170120,,,,,Z942,20170120,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,002,,4,RAE,,,,,,,002,,1,RAE,,,,,,,20170120,09:00:00,20170120,16:49:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1516883626006995200,890558,200102,,,,,J,"",,2,,,,,,,,,,,BD7 ,02R,02R,,,9,20170906,20170906,,,,,,,,,,,,,101086017,1,1,21,19,1,19,20170906,01:01:00,20170906,21:44:00,N,1,,1,8,8,,20170906,01:01:00,20170906,21:44:00,0,0,XXXXXX,RAE,9,RAE,,02R,,C4729819,302,300,300,,,2,R568,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,601,,4,RAE,,,,,,,601,,1,RAE,,,,,,,20170906,01:01:00,20170906,21:44:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1559124729002961400,600284,201703,,,2,,A,"",,1,,,,,,,,,,,BD18,02R,02R,,,8,20180926,20180926,,,,,,,,,,,,,CE:35485409,1,1,22,19,1,19,20180926,13:22:00,20180926,15:37:00,9,1,,1,8,8,,20180926,13:22:00,20180926,15:37:00,0,0,XXXXXX,"",9,RAE,,02R00,,C3459982,420,420,420,,,2,J039,,B000,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32 CCDA BRI,,4,RAE01,,,,,,,32 CCDA BRI,,1,RAE01,,,,,,,20180926,13:22:00,20180926,15:37:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1516802315006998500,238139,201110,,,,,99,"",,1,,,,,,,,,,,BD2 ,02R,02R,,,8,20170602,20170602,,,,,,,,,,,,,101051887,1,2,11,19,1,19,20170602,07:30:00,20170602,15:40:00,N,1,,1,8,8,,20170602,07:30:00,20170602,15:40:00,0,0,XXXXXX,RAE,9,RAE,,02R,,C2304809,120,120,120,,,2,G473,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,F341,20170602,,,,,E201,20170602,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,002,,4,RAE,,,,,,,002,,1,RAE,,,,,,,20170602,07:30:00,20170602,15:40:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1526736714007997400,42290,200408,,,,,J,"",,2,,,,,,,,,,,BD3 ,02W,02W,,,9,20171005,20171005,,,,,,,,,,,,,CE:35017268,1,1,21,19,1,19,20171005,18:02:00,20171009,19:00:00,N,1,,2,8,8,,20171005,18:02:00,20171005,18:46:00,0,0,XXXXXX,"",9,RAE,,02W,,C3459982,420,420,420,,,2,R55X,,R064,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,"",,,,,,,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1526736714007997400,42290,200408,,,,,J,"",,2,,,,,,,,,,,BD3 ,02W,02W,,,9,20171005,20171009,,,,,,,,,,,,,CE:35017268,1,1,21,19,1,19,20171005,18:02:00,20171009,19:00:00,N,2,,1,8,1,,20171005,18:46:00,20171009,19:00:00,0,0,XXXXXX,"",9,RAE,,02W,,C4527804,420,420,420,,,2,R55X,,R064,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,U052,20171006,,,,,Y981,20171006,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32 CCDA BRI,,4,RAE01,,,,,,,32 CCDA BRI,,1,RAE01,,,,,,,20171005,18:46:00,20171006,15:18:00,30 CYPU BRI,,1,RAE01,,,,,,,20171006,15:18:00,20171009,19:00:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1558443886001977300,187675,200408,,,,,N,"",,1,,,,,,,,,,,BD9 ,02W,02W,,,8,20190214,20190215,,,,,,,,,,,,,CE:35668648,1,1,22,19,1,19,20190214,11:57:00,20190215,12:12:00,9,1,,1,8,8,,20190214,11:57:00,20190215,12:12:00,0,0,XXXXX=,"",9,RAE,,02W00,,C4617556,420,420,420,,,2,J029,,E559,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32 CCDA BRI,,4,RAE01,,,,,,,32 CCDA BRI,,1,RAE01,,,,,,,20190214,11:57:00,20190215,12:12:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1526737932007990500,1076578,200506,,,,,A,"",,2,,,,,,,,,,,HX2 ,02T,02T,,,9,20171122,20171122,,,,,,,,,,,,,CE:35083945,1,1,21,19,1,19,20171122,01:36:00,20171122,13:12:00,N,1,,2,8,8,,20171122,01:36:00,20171122,02:07:00,0,0,XXXXX=,"",9,RAE,,02T,,C4617556,420,420,420,,,2,R458,,Z915,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,"",,,,,,,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1526737932007988e3,1076578,200506,,,,,A,"",,2,,,,,,,,,,,HX2 ,02T,02T,,,9,20171122,20171122,,,,,,,,,,,,,CE:35083945,1,1,21,19,1,19,20171122,01:36:00,20171122,13:12:00,N,2,,1,8,8,,20171122,02:07:00,20171122,13:12:00,0,0,XXXXX=,"",9,RAE,,02T,,C3668300,420,420,420,,,2,R458,,Z915,,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,32 CCDA BRI,,4,RAE01,,,,,,,32 CCDA BRI,,1,RAE01,,,,,,,20171122,02:07:00,20171122,13:12:00,"",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"","",,,"",,,,,,,,"",,"",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

3) The names of the variables shouldn't matter as your code already uses firstobs=2 to ignore the header row.

 

So, are you suggesting to not change variable names and use them as it is in further code?

 

4) 4) I would try using a "dummy" variable to input and ignore the fields you don't want ("dummy" can be dropped)., For example you can try running the following, and then change it to work with your data:

 

Apologies but I am unable to understand the 'dummy' code

 

Ta

Tom
Super User Tom
Super User

3) The names of the variables shouldn't matter as your code already uses firstobs=2 to ignore the header row.

 

So, are you suggesting to not change variable names and use them as it is in further code?

 

4) 4) I would try using a "dummy" variable to input and ignore the fields you don't want ("dummy" can be dropped)., For example you can try running the following, and then change it to work with your data:

 

You can use whatever names you want for the variables you create when reading the text file. What values they have in the header row is just want the system that created the CSV file used.  You don't have to use the same names in your SAS dataset.  Use names that make sense for the data and work as SAS variable names.

 

The purpose of the dummy variable is just for SAS to have some place to put the data for the columns you don't want to keep.  Since your file is a series of delimited lines you cannot read the value for the 10th field in a line without first reading the 9 values before it.

 

200 variables is a lot of variables.  Is there some pattern to the variables?  For example does the data consist first some key variables and have series of say 10 or so variables that are repeated multiple times.  What is the pattern of repetition?  If you have 3 variables that repeat 4 times is the pattern AAAABBBBCCCC or is is ABCABCABCABC ?

 

To get a sense of what is in the file it might just be easiest to read it as 200 character variables and then look at the values to figure out what each variable contains.  For example just name the variables X1, X2, etc.

data test;
  infile txt dsd truncover firstobs=2;
  length x1-x200 $100;
  input x1-x200;
run;
Amir
PROC Star

Thanks for the response and data.

 

1) I tried reading your data with the code you posted and can see the data in SAS variable apc1_apc2_link is the same as the csv.

 

2) The first field in the last record has an "e" in it, is this correct? If not then it looks like you need to inform whoever gave you the csv.

 

3) I think this point needs clarification, you originally said:

The data set has more than 200 variables with names like 'Present On Admission Indicator Diag 21' etc. having spaces and special characters.

Are you referring to the field headings in the csv or the SAS variable names? SAS variable names don't have to be the same as csv field heading names.

 

4) As I understand it, there are some csv fields you want to read and some you want to ignore (correct me if I'm wrong). SAS will read the csv data from left to right, in the example code the first field is read as apc1_apc2_link, the second field is not required so is read as dummy, the third field is read as pkid, the next 3 fields are not required so each one is read as dummy (3 x dummy), the next field is read as admin_cat, and the remaining fields are not input as they are not needed. The dummy variable is dropped in the output data set. The datalines statement just allows raw data to be placed in the data step. If this still does not make sense then please be specific about which statements / lines are unclear.

 

 

Kind regards,

Amir.

sks521
Quartz | Level 8

Thanks Amir,

 

1) I tried reading your data with the code you posted and can see the data in SAS variable apc1_apc2_link is the same as the csv.

 

I don't know why mine isn't working

 

2) The first field in the last record has an "e" in it, is this correct? If not then it looks like you need to inform whoever gave you the csv.

 

I don't know but I'll have to check in with the person who gave me the data

 

3) I think this point needs clarification, you originally said:

The data set has more than 200 variables with names like 'Present On Admission Indicator Diag 21' etc. having spaces and special characters.

Are you referring to the field headings in the csv or the SAS variable names? SAS variable names don't have to be the same as csv field heading names.

 

Yes, these are the field heading in the CSV. 

 

4) As I understand it, there are some csv fields you want to read and some you want to ignore (correct me if I'm wrong). SAS will read the csv data from left to right, in the example code the first field is read as apc1_apc2_link, the second field is not required so is read as dummy, the third field is read as pkid, the next 3 fields are not required so each one is read as dummy (3 x dummy), the next field is read as admin_cat, and the remaining fields are not input as they are not needed. The dummy variable is dropped in the output data set. The datalines statement just allows raw data to be placed in the data step. If this still does not make sense then please be specific about which statements / lines are unclear.

 

In your dummy code, do I have to specify the format and length of each variable that I don't need?

 

Thanks

 

 

Amir
PROC Star

Thanks for numbering your responses, it helps.

 

1) Check the log for any messages. Perhaps show the log by copying and pasting it using the "Insert Code" button "{i}".

 

2) OK, awaiting response.

 

3) In that case, you can use any SAS variable name you think sensible to read in the fields you want.

 

4) I just used a format of $char1. each time as it only reads one character. If you specify a numeric format and the data was non-numeric then this would cause a problem. After you have input the last variable you need you do not have to read any more "dummy" variables.

 

5) Already addressed.

 

 

Kind regards,

Amir.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1676 views
  • 1 like
  • 4 in conversation