Solved: Re: After proc import of csv file some columns in resulting SAS data s...

wlierman · Posted 06-10-2020 12:02 PM

Hello,

I am putting together a project for Patient Centered Primary Care Homes (PCPCH). So I import a lot of information from csv files using the usual Proc import coding.

On the last import which is not huge at all (664 obs - but will serve as my "go to" file for subsequent coding) the first 279 obs are missing some crucial CCO organization (Coordinated Care Organization) information. I did a lot of find / replace from a "master list" of CCO name, organization, practice_name ... and so on to fill in as many of the missing CCOs in the first 279 obs that I could.

After the import from csv to SAS I examined the data set and found that the file looks like the attached example (this is from excel not SAS but illustrates my problem). THe first little block of obs is what the data looks like in the csv file before import. The second is how the data appears in the SAS data file after import. The shaded rows are rows with missing CCOs, the unshaded are complete obs, for our purposes. As you can see only the first 'byte' of the both the numeric (CCO_TIN, Organization_TIN, Practice_TIN) and text str (CCO_unique_ID, Organization, Practice)

show in the data set. Recall, that the obs from 280 through 664 have/had all these column values populated; now rows 280 through 664 have only one byte showing.

When I do a 'View Columns' on the data set this is what is shown:

Name Type Length Format Informat

cco_unique_id text 4 4. 4.

cco_tin numeric 8 10. 10.

organization text 30 30. 30.

organization_tin numeric 8 10. 10.

practice text 63 65. 65.

practice_tin numeric 8 8. 8.

Your help in the past has always been right on the mark. I can certainly use your

expertise on this.

Thank you in advance.

Walt Lierman

Tom · Posted 06-10-2020 01:11 PM

PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.

If you write your own data step you get to define the variables. This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.

28 variables is not really much harder than 6. Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  input 
    cco_unique_id  :$4.
    cco_tin :$10.
    organization :$30.
    organization_tin :$10.
    practice :$63.
    practice_tin   :$10.
  ;
run;

If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way. But normla character variables and numbers do not need to have formats attached to them. SAS already knows how to display those.

View solution in original post

Kurt_Bremser · Posted 06-10-2020 12:12 PM

If you need help with a CSV file, we need to see that file, not what Excel thinks of it. Excel does all kinds of crap when opening CSV files, so screenshots from it or Excel files are mostly useless.

Please open the CSV file with an editor (e.g. Notepad++) and copy/paste the contents into a window opened with this button:

Alternatively, rename the file with a .txt extension and attach it to your post.

Next, to reliably read text files, do not use PROC IMPORT, but a custom-written DATA step where you define the columns as documented.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

wlierman · Posted 06-10-2020 12:32 PM

Hello Kurt,
I liked your response. But I will send the entire csv file as a text.

Tom · Posted 06-10-2020 12:13 PM

Why are you using PROC IMPORT to read a text file that appears to have only 6 or 7 columns?

Just write a data step to read the file.

wlierman · Posted 06-10-2020 12:27 PM

Hi Tom,
There are about 22 columns in the file. The problems are centered on the 6 columns that are shown the post. The remainder of the columns lie to right.

Thanks.
Walt

Tom · Posted 06-10-2020 01:11 PM

PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.

If you write your own data step you get to define the variables. This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.

28 variables is not really much harder than 6. Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  input 
    cco_unique_id  :$4.
    cco_tin :$10.
    organization :$30.
    organization_tin :$10.
    practice :$63.
    practice_tin   :$10.
  ;
run;

If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way. But normla character variables and numbers do not need to have formats attached to them. SAS already knows how to display those.

wlierman · Posted 06-10-2020 01:51 PM

Thank you Tom.

Great solution. Kurt is looking over the entire csv file. He has the same concerns (and perhaps a

few more) about using the Proc Import route.

I appreciate your time and help.

Walt

wlierman

wlierman · Posted 06-10-2020 01:03 PM

Hello Kurt,

I took the second method that you suggested: renaming the file with a .txt extension and attaching it.

Thank you for your time and effort. I appreciate it.

Walt Lierman

Tom · Posted 06-10-2020 01:37 PM

I really think that using PROC IMPORT is the issue for this file. Because the first few columns have a lot missing values so any attempt by PROC IMPORT to guess at how to define them is going to be wrong, especially if you ultimate goal is to replace the empty values with real values via some algorithm. Using the GUESSINGROWS=MAX statement in PROC IMPORT will help a little, but it will still only be able to guess how to define the variables based on the values seen. So just take the five minutes it will take to copy the column header and turn it into an INPUT or LENGTH statement so you can properly define the variables.

Minor point your file isn't technically a Comma Separated Values file since the delimiter being used is a TAB instead of comma, but that doesn't really impact how to deal with the file other that you will need to specify the delimiter on your INFILE statement.

wlierman · Posted 06-10-2020 01:57 PM

Tom,
Thank you for your insights and corrections.
Walt Liermanwlierman

After proc import of csv file some columns in resulting SAS data set missing complete text strings

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Re: After proc import of csv file some columns in resulting SAS data set missing complete text strin

Registration is open

Registration is open

SAS Training: Just a Click Away