Hello,
I am putting together a project for Patient Centered Primary Care Homes (PCPCH). So I import a lot of information from csv files using the usual Proc import coding.
On the last import which is not huge at all (664 obs - but will serve as my "go to" file for subsequent coding) the first 279 obs are missing some crucial CCO organization (Coordinated Care Organization) information. I did a lot of find / replace from a "master list" of CCO name, organization, practice_name ... and so on to fill in as many of the missing CCOs in the first 279 obs that I could.
After the import from csv to SAS I examined the data set and found that the file looks like the attached example (this is from excel not SAS but illustrates my problem). THe first little block of obs is what the data looks like in the csv file before import. The second is how the data appears in the SAS data file after import. The shaded rows are rows with missing CCOs, the unshaded are complete obs, for our purposes. As you can see only the first 'byte' of the both the numeric (CCO_TIN, Organization_TIN, Practice_TIN) and text str (CCO_unique_ID, Organization, Practice)
show in the data set. Recall, that the obs from 280 through 664 have/had all these column values populated; now rows 280 through 664 have only one byte showing.
When I do a 'View Columns' on the data set this is what is shown:
Name Type Length Format Informat
cco_unique_id text 4 4. 4.
cco_tin numeric 8 10. 10.
organization text 30 30. 30.
organization_tin numeric 8 10. 10.
practice text 63 65. 65.
practice_tin numeric 8 8. 8.
Your help in the past has always been right on the mark. I can certainly use your
expertise on this.
Thank you in advance.
Walt Lierman
PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.
If you write your own data step you get to define the variables. This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.
28 variables is not really much harder than 6. Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
input
cco_unique_id :$4.
cco_tin :$10.
organization :$30.
organization_tin :$10.
practice :$63.
practice_tin :$10.
;
run;
If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way. But normla character variables and numbers do not need to have formats attached to them. SAS already knows how to display those.
If you need help with a CSV file, we need to see that file, not what Excel thinks of it. Excel does all kinds of crap when opening CSV files, so screenshots from it or Excel files are mostly useless.
Please open the CSV file with an editor (e.g. Notepad++) and copy/paste the contents into a window opened with this button:
Alternatively, rename the file with a .txt extension and attach it to your post.
Next, to reliably read text files, do not use PROC IMPORT, but a custom-written DATA step where you define the columns as documented.
Why are you using PROC IMPORT to read a text file that appears to have only 6 or 7 columns?
Just write a data step to read the file.
PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.
If you write your own data step you get to define the variables. This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.
28 variables is not really much harder than 6. Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
input
cco_unique_id :$4.
cco_tin :$10.
organization :$30.
organization_tin :$10.
practice :$63.
practice_tin :$10.
;
run;
If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way. But normla character variables and numbers do not need to have formats attached to them. SAS already knows how to display those.
Thank you Tom.
Great solution. Kurt is looking over the entire csv file. He has the same concerns (and perhaps a
few more) about using the Proc Import route.
I appreciate your time and help.
Walt
wlierman
Hello Kurt,
I took the second method that you suggested: renaming the file with a .txt extension and attaching it.
Thank you for your time and effort. I appreciate it.
Walt Lierman
I really think that using PROC IMPORT is the issue for this file. Because the first few columns have a lot missing values so any attempt by PROC IMPORT to guess at how to define them is going to be wrong, especially if you ultimate goal is to replace the empty values with real values via some algorithm. Using the GUESSINGROWS=MAX statement in PROC IMPORT will help a little, but it will still only be able to guess how to define the variables based on the values seen. So just take the five minutes it will take to copy the column header and turn it into an INPUT or LENGTH statement so you can properly define the variables.
Minor point your file isn't technically a Comma Separated Values file since the delimiter being used is a TAB instead of comma, but that doesn't really impact how to deal with the file other that you will need to specify the delimiter on your INFILE statement.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.