DATA Step, Macro, Functions and more

Data quality and management help required?

Accepted Solution Solved
Reply
Contributor
Posts: 72
Accepted Solution

Data quality and management help required?

Hi,

Data-warehouse contains small to very large datasets with over 225 different variables in each data sets. Business requirement says there is only one type of error is expected and that is there would be a missing or blank variable value of any variable in a row, or in many rows, and if found the entire row/rows must be deleted. The catch is the blank could occur in any of the 225 variables and in any row,

is there a function or logic in datstep to do it or what do you guys think about Proc Datasets?

It's an appending table monthly, quarterly and yearly, therefore the preference is to use automated simple solutions.


Accepted Solutions
Solution
‎05-07-2013 03:39 PM
Valued Guide
Posts: 2,175

Re: Data quality and management help required?

And if you have a mixture of numeric and string values combine tge cmiss with Nmiss something like

Append base= collection data= new_rows(

Where=( NOT sum( 0,  cmiss( of _character_),  nmiss( of _numeric_) ) ) ) ;

Run ;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,624

Re: Data quality and management help required?

In a data step, you could use the CMISS function, to find a row with any missing values.

Then, delete the row and write the table back to the data warehouse

Super Contributor
Posts: 418

Re: Data quality and management help required?

So the way I read this was that if ANY of the rows in your dataset have 1 blank value at all, then drop the entire row. Is that correct, or did you mean the entire row has to be blank in order for it to be dropped?

I guess I wil just give you examples of both ways to be safe...
First: If any columns have a blank variable, then drop that entire row.

data new;

set old;

array varsyay(*) firstcolumn--lastcolumn;

if cmiss(of cvars(*))=0;

run;

Now drop the row ONLY if the entire row is blank.

data new;

set old;

array varsyay(*) firstcolumn--lastcolumn;

if not(cmiss(of varsyay(*)) eq dim(varsyay(*));

run;

Solution
‎05-07-2013 03:39 PM
Valued Guide
Posts: 2,175

Re: Data quality and management help required?

And if you have a mixture of numeric and string values combine tge cmiss with Nmiss something like

Append base= collection data= new_rows(

Where=( NOT sum( 0,  cmiss( of _character_),  nmiss( of _numeric_) ) ) ) ;

Run ;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 174 views
  • 6 likes
  • 4 in conversation