DATA Step, Macro, Functions and more

How to remove inconsistent data rows?

Reply
Contributor
Posts: 72

How to remove inconsistent data rows?

[ Edited ]

 

Hi SAS users,

 

I am trying to combine dozens of csv files usng a wildcard code (INFILE, INFORMAT, INPUT). Although I don't get any error for my code, I get a warning that makes me concerend. Looking at the log and my csv files, it seems that (at least part of the issue) is due to inconsistant data type in some rows (see the screenshot below).

 

Now, my question is how to ask SAS to delete/remove those rows (having inconsistant data type) while running the program?

 

Thank you for help,

 

 

1.jpg

Occasional Contributor
Posts: 11

Re: How to remove inconsistent data rows?

Posted in reply to almmotamedi

I am sure there is a cleaner way to do this but I would import all of the fileds as characters, then remove the ones you don't want then convert the variables to be the correct formats.

 

it might take a few steps but this way you have more control over what you are getting.

Super User
Posts: 5,424

Re: How to remove inconsistent data rows?

Posted in reply to almmotamedi
Is this an adhoc operation or a regular/production like operation?
If the later, establish a file specification with your data supplier that states data types, lengths, names, valid values etc. When data fail to follow this spec, report back and demand a file that is corrected.
Data never sleeps
Super User
Posts: 11,343

Re: How to remove inconsistent data rows?

Posted in reply to almmotamedi

If this is truly affectin many variables the easiest might be using the CMISS function. Cmiss will tell you how many of the variables have missing values. So if the number is "large" you could delete them from a file.

 

data want;

   set have;

   if cmiss(var1, var2, var3 , ...,varn) < (some critical number);

run;

 

If you have lots of rows like this it may be worth going back to your raw csv files using a TEXT program to see if you have rows at the ends of files that are all commas. This is a common occurence when converting spreadsheets to CSV. Some forms of use will "touch" rows or columns so that they get included in CSV files when there is no data.

 

Something else to look for are inserted "page breaks" such a row between blocks of data.

Super User
Posts: 10,020

Re: How to remove inconsistent data rows?

Posted in reply to almmotamedi

use operator ?? to set them all missing .

 

input a : ?? ddmmyy10. b : ?? best32. c : ?? $20. ............

Ask a Question
Discussion stats
  • 4 replies
  • 252 views
  • 0 likes
  • 5 in conversation