BookmarkSubscribeRSS Feed
broomtc
Calcite | Level 5

I am running into a situation where proc import dbms=csv is failing to identify numbers. Every column comes back as type 'CHAR' despite the CSV passing validation with flying colors. 

 

filename datain url 'https://data.cms.gov/resource/csi5-w7vr.csv';

proc import datafile=datain out=dataout dbms=csv replace;
getnames=yes;
guessingrows=max;
run;

 

Alphabetic List of Variables and Attributes# Variable Type Len Format Informat2321142592048182162215172426272829303132333435363738394041153101112137196

american_indian_or_alaskaChar5$5.$5.
asian_pacific_islanderChar5$5.$5.
average_ageChar4$4.$4.
average_hcc_scoreChar6$6.$6.
average_length_of_stay_daysChar6$6.$6.
black_beneficiariesChar5$5.$5.
cityChar21$21.$21.
distinct_beneficiaries_perChar5$5.$5.
dual_beneficiariesChar5$5.$5.
facility_nameChar52$52.$52.
female_beneficiariesChar5$5.$5.
hispanic_beneficiariesChar5$5.$5.
male_beneficiariesChar5$5.$5.
nondual_beneficiariesChar5$5.$5.
other_unknown_beneficiariesChar4$4.$4.
percent_of_beneficiariesChar6$6.$6.
percent_of_beneficiaries_1Char5$5.$5.
percent_of_beneficiaries_2Char5$5.$5.
percent_of_beneficiaries_3Char6$6.$6.
percent_of_beneficiaries_4Char5$5.$5.
percent_of_beneficiaries_5Char5$5.$5.
percent_of_beneficiaries_6Char5$5.$5.
percent_of_beneficiaries_7Char5$5.$5.
percent_of_beneficiaries_8Char5$5.$5.
percent_of_beneficiaries_9Char5$5.$5.
percent_of_beneficiaries_10Char5$5.$5.
percent_of_beneficiaries_11Char5$5.$5.
percent_of_beneficiaries_12Char6$6.$6.
percent_of_beneficiaries_13Char5$5.$5.
percent_of_beneficiaries_14Char5$5.$5.
percent_of_beneficiaries_15Char6$6.$6.
provider_idChar7$7.$7.
stateChar4$4.$4.
street_addressChar42$42.$42.
total_snf_charge_amountChar10$10.$10.
total_snf_medicare_allowedChar10$10.$10.
total_snf_medicare_paymentChar10$10.$10.
total_snf_medicare_standardChar9$9.$9.
total_staysChar5$5.$5.
white_beneficiariesChar5$5.$5.
zip_codeChar7$7.$7.

 

The file has specifications with several numbers. Documentation for file is here. https://dev.socrata.com/foundry/data.cms.gov/csi5-w7vr

6 REPLIES 6
ballardw
Super User

I suggest downloading that file and examining or reading from a local version.

 

There are a number of things that may be going on with the HTTPS connection. And we can't see any of them or the actual data because of security.

What does the generated data step code look like?

I would suggest, especially if you are going to read other files with the same layout, that you copy the generated code from the log to the editor, change the character informats for the variables that should be numeric or dates to an appropriate informat, save the code and rerun the data step instead of Proc Import.

 

Also look in the log for any messages related to "transcoding" or "encoding" issues.

 

 

broomtc
Calcite | Level 5

Took a look and they are putting a " " around everything. 

ballardw
Super User

@broomtc wrote:

Took a look and they are putting a " " around everything. 


You can modify the data step as mentioned then add something like:

 

numvar = input(charvar,6.);

drop charvar;

 

The 6 would come from the length of the informat associated with the character version.

If you have currency values then use a COMMA format.

If the variable should be a date then use an appropriated date informat instead.

 

Kurt_Bremser
Super User

It seems asterisks are being used to denote missing values. proc import can't deal with that.

Write your own data step, according to the file specification, and you may have to do extra logic to handle the asterisks in a graceful manner.

broomtc
Calcite | Level 5

Looks like between the more complicated * inclusions and dealing with " " where there shouldn't be I am in for a much longer code base.

Kurt_Bremser
Super User

The quotes are handled by using the dsd option in the infile statement. Replacing the single asterisks with nothing can be done by applying the tranwrd() function to _infile_ (the input buffer), and then doing the actual read.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 4634 views
  • 2 likes
  • 3 in conversation