BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

Hello,

I am putting together a project for Patient Centered Primary Care Homes (PCPCH).  So I import a lot of information from csv files using the usual Proc import coding.

On the last import which is not huge at all (664 obs - but will serve as my "go to" file for subsequent coding) the first 279 obs are missing some crucial CCO organization (Coordinated Care Organization) information.  I did a lot of find / replace from a "master list" of CCO name, organization, practice_name ... and so on to fill in as many of the missing CCOs in the first 279 obs that I could.

 

After the import from csv to SAS I examined the data set and found that the file looks like the attached example (this is from excel not SAS but illustrates my problem). THe first little block of obs is what the data looks like in the csv file before import.  The second is how the data appears in the SAS data file after import. The shaded rows are rows with missing CCOs, the unshaded are complete obs, for our purposes.  As you can see only the first 'byte' of the both the numeric (CCO_TIN, Organization_TIN, Practice_TIN) and text str (CCO_unique_ID, Organization, Practice)

show in the data set.  Recall, that the obs from 280 through 664 have/had all these column values populated; now rows 280 through 664 have only one  byte showing.

When I do a 'View Columns' on the data set this is what is shown:

Name                    Type                  Length     Format       Informat

cco_unique_id       text                       4               4.                4.

cco_tin                   numeric                8              10.             10.

organization           text                     30              30.             30.

organization_tin     numeric                8              10.             10.

practice                  text                     63              65.             65.

practice_tin            numeric                8                8.               8.

 

Your help in the past has always been right on the mark.  I can certainly use your

expertise on this.

 

Thank you in advance.

 

Walt Lierman

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.  

If you write your own data step you get to define the variables.  This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.

28 variables is not really much harder than 6.  Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  input 
    cco_unique_id  :$4.
    cco_tin :$10.
    organization :$30.
    organization_tin :$10.
    practice :$63.
    practice_tin   :$10.
  ;
run;

If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way.  But normla character variables and numbers do not need to have formats attached to them.  SAS already knows how to display those.

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

If you need help with a CSV file, we need to see that file, not what Excel thinks of it. Excel does all kinds of crap when opening CSV files, so screenshots from it or Excel files are mostly useless.

 

Please open the CSV file with an editor (e.g. Notepad++) and copy/paste the contents into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

Alternatively, rename the file with a .txt extension and attach it to your post.

Next, to reliably read text files, do not use PROC IMPORT, but a custom-written DATA step where you define the columns as documented.

wlierman
Lapis Lazuli | Level 10
Hello Kurt,
I liked your response.  But I will send the entire csv file as a text.

Tom
Super User Tom
Super User

Why are you using PROC IMPORT to read a text file that appears to have only 6 or 7 columns?

Just write a data step to read the file.

wlierman
Lapis Lazuli | Level 10
Hi Tom,
There are about 22 columns in the file.  The problems are centered on the 6 columns that are shown the post. The remainder of the columns lie to right.

Thanks.
Walt


Tom
Super User Tom
Super User

PROC IMPORT has to guess how to define the variables based only on the information in this particular version of the file.  

If you write your own data step you get to define the variables.  This also useful when you identifiers that LOOK like numbers (like you *_TIN columns) that you should read as character strings because you will never actually use them with any arithmetic operations.

28 variables is not really much harder than 6.  Just copy the column headers and convert them into the variable names you want to use. Could be as simple as just a single INPUT statement.

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  input 
    cco_unique_id  :$4.
    cco_tin :$10.
    organization :$30.
    organization_tin :$10.
    practice :$63.
    practice_tin   :$10.
  ;
run;

If you have any DATE , TIME or DATETIME values then you will want to add a FORMAT statement to attach a format to those variables so they display in a human readable way.  But normla character variables and numbers do not need to have formats attached to them.  SAS already knows how to display those.

wlierman
Lapis Lazuli | Level 10

Thank you Tom.

Great solution.  Kurt is looking over the entire csv file.  He has the same concerns (and perhaps a

few more) about using the Proc Import route.

 

I appreciate your time and help.

 

Walt

wlierman

wlierman
Lapis Lazuli | Level 10

Hello Kurt,

I took the second method that you suggested: renaming the file with a .txt extension and attaching it.

 

Thank you for your time and effort.  I appreciate it.

 

Walt Lierman

Tom
Super User Tom
Super User

I really think that using PROC IMPORT is the issue for this file.  Because the first few columns have a lot missing values so any attempt by PROC IMPORT to guess at how to define them is going to be wrong, especially if you ultimate goal is to replace the empty values with real values via some algorithm.   Using the GUESSINGROWS=MAX statement in PROC IMPORT will help a little, but it will still only be able to guess how to define the variables based on the values seen. So just take the five minutes it will take to copy the column header and turn it into an INPUT or LENGTH statement so you can properly define the variables.  

 

Minor point your file isn't technically a Comma Separated Values file since the delimiter being used is a TAB instead of comma, but that doesn't really impact how to deal with the file other that you will need to specify the delimiter on your INFILE statement.

wlierman
Lapis Lazuli | Level 10
Tom,
Thank you for your insights and corrections.
Walt Liermanwlierman

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4274 views
  • 1 like
  • 3 in conversation