BookmarkSubscribeRSS Feed
Sanjay_M
Obsidian | Level 7

I have special missing values from .A to .Z in my SAS datasets and we are in the process of migrating them to HADOOP.

 

The data will be stored on HADOOP and will be accessed through other tools.

 

Does anyone have knowledge of how are these values are supposed to be stored in HADOOP or other databases.

 

How should they be treated when stored on other databases?

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

They should be stored as NULL. But you then lose some information as all missing values become the same.

If you want to keep the detail, you have to decide on how to address that. Add an extra column? Store as invalid values like -9999?

Sanjay_M
Obsidian | Level 7

Thanks Chris. I think that is one way to represent your special missing values.

 

Sanjay_M
Obsidian | Level 7

Hi

 

Is there a way to get a unique list of special missing values from the data?

I have 300 variables in my dataset and want to identify the unique values each variable has.

ChrisNZ
Tourmaline | Level 20

Since you seem to not know much about this data, and have no control over the source, I would go the other route: create a special value for each missing value.

There are 28 possible missing values. Create a mapping for each and you'll never be surprised if the data includes more values in the future.

 

Patrick
Opal | Level 21

@Sanjay_M wrote:

Hi

 

Is there a way to get a unique list of special missing values from the data?

I have 300 variables in my dataset and want to identify the unique values each variable has.


Below an idea how you could create such a report without having to code much.

data test;
  set sashelp.class;
  if _n_=5 then age=.a;
  if _n_=3 then age=.x;
  if _n_=7 then 
    do;
      age=.a;
      height=.x;
    end;
  if _n_=1 then height=.y;
run;

ods _all_ close;
ods output MissingValues=want(keep= varname missvalue count where=(missvalue ne 0));
proc univariate data=test;
run;
ods output close;

ods listing; 
title 'Numerical vars with special missings';
proc print data=want;
run;
title;

As for special missing values in Hadoop for consumption also by 3rd party applications: 

SAS special missings are a SAS thing so I guess you would have to recode them if not storing the data in SAS SPDE format in Hadoop (which only SAS can read). You could create a format for such recoding and then just pre-process the SAS table before loading into Hadoop.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 774 views
  • 1 like
  • 3 in conversation