BookmarkSubscribeRSS Feed
UniversitySas
Quartz | Level 8

So let's say I have the following fields:

 

DATA HAVE;
INFILE DATALINES ;
input ID NAME $ STREET $ CITY $ STATE $ POSTCODE $ RELATIONSHIP $ STATUS $ PURPOSE $ DONATION $;
DATALINES;
201 AAA Market Philadelphia PA 4109 Parent Open Counselling 10000
201 ABC Chestnut Arlington TX 1093 None Open General 1500
201 BCD Walnut Walnut Sidney NY 3201 None Open General
1999 201 EFG Cross Kansas TX 1091 Parent Close Sports
1491 202 EFG Cluedo Street Phoenix AZ 2012 Close General 1900
;
RUN;

Which gives me the following output: 

 

13.JPG

 

You can see there are three problems here:

1) The street "walnut" has been imported twice, shifting the values in the columns by 1 extra space incorrectly.

2) The street "Cluedo Street" has been imported over two lines, instead of just one line, causing a similar problem to what was mentioned above.

3) There is an omission for "Relationship" in the final row. Where the incorrectly imported data should read "none", it missing altogether, and reads "close", so even in the absence of the first two errors, the "Relationship" Column here would read "close" instead of "none".

 

Let's suppose there are 1000's of issues similar, but not identical to the ones above, in a data set with millions of observations. They will be similar in the sense that it's usually a random or repeated omission for some finite number of fields, OR, duplicate values have been entered, or values have spanned more columns than they should have.

Assuming the exact same column names as above, is it plausible that one could create some kind of criteria, or program that could reasonably adjust most of these issues? Or do I simply need to request a cleaner data set?

 

 

 

1 REPLY 1
Kurt_Bremser
Super User

If you have blanks in data items, you need to use a delimiter other than blank, or have the data items enclosed in quotes, so the dsd option can be used.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 573 views
  • 0 likes
  • 2 in conversation