BookmarkSubscribeRSS Feed

Using PROC DQSCHEME Part 2: APPLYing for Success

Started ‎10-30-2023 by
Modified ‎10-30-2023 by
Views 411

Data cleansing can feel like a chore when out-of-the-box tools don’t quite work for your data. There’s a simple solution for that problem: SAS Data Quality! SAS Data Quality enables you to customize or create brand new tools for data cleanup without ever leaving a programming window. In my last post, I introduced you to the DQSCHEME procedure and demonstrated how to create custom schemes with the CREATE statement.

 

Now, it’s time to put those schemes to use! I’ll show you how to use the APPLY statement to apply a scheme to an input variable with a variety of application methods.

 

For a refresher on the DQSCHEME procedure and its capabilities, review What is the DQSCHEME procedure? from part 1 or view the SAS documentation.

 

Using the APPLY statement

 

The DQSCHEME procedure APPLY statement is used to apply a scheme to an input variable. The only required arguments for the APPLY statement are the scheme name and the input variable name, but several optional arguments are available for controlling how the scheme is applied. By default, the values in the input variable will be transformed only if they match exactly with the pre-transformation values stored in the scheme. However, you could choose to ignore the case of the input values when transforming them, or you could choose to apply transformations after comparing input values and scheme values with match codes.

 

In this post, I’ll use the APPLY statement to apply the schemes I created in part 1 of this series. I’ll apply the scheme to a few different data sets, and I’ll use multiple scheme application methods.

 

APPLY a scheme in SAS format

 

In the previous post, I created a SAS scheme based on the CARS data set, which contains car model names. After modifying the generated scheme to clean up the default results, my final scheme was named CARS_SCHEME_EDIT. I’ll apply this to the CARS data set and save the results in an output data set named CARS_STD_SAS. Recall our original data:

 

01_grbarn_create2_1.png

Select any image to see a larger version.

Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Since I’m using a SAS scheme, I’ll need to include the NOQKB option on the PROC DQSCHEME statement.

 

proc dqscheme data=cars out=cars_std_sas noqkb;
apply scheme=cars_scheme_edit var=model;
run;

 

Performing this basic application of this scheme yields my desired results.

 

02_grbarn_apply_1.png

 

I have another data set named CARS_2 that contains the same car model names, but the values have varied casing that isn’t reflected in my original scheme.

 

03_grbarn_apply_2.png

 

No worries! I’ll add the SCHEME_LOOKUP=IGNORE_CASE option to the APPLY statement, which will ignore any capitalization when applying the scheme.

 

proc dqscheme data=cars_2 out=cars_2_std_sas noqkb;
apply scheme=cars_scheme_edit var=model scheme_lookup=IGNORE_CASE;
run;

 

The results show that my scheme has been applied flawlessly despite casing differences.

 

04_grbarn_apply_3.png

 

I have a third data set CARS_3 that, again, contains the same car model names, but there are typos in nearly every value (most of which aren’t accounted for in my scheme).

 

05_grbarn_apply_4.png

 

This is where the match code scheme application method comes in handy. Since an equal comparison won’t work for applying the scheme here, we can instead compare input value match codes against scheme value match codes. Input values will be transformed based on equal match code values. I’ll add the SCHEME_LOOKUP=USE_MATCHDEF option to the APPLY statement. Then, I’ll add the match definition and locale for generating match codes (I’ll skip the optional sensitivity argument since I want to use the default value, 85). 

 

proc dqscheme data=cars_3 out=cars_3_std_sas noqkb;
apply scheme=cars_scheme_edit var=model scheme_lookup=USE_MATCHDEF
      matchdef='Text' locale='ENUSA';
run;

 

Once more, my results show that my scheme was applied to this data with no problems despite the variation in input values.

 

06_grbarn_apply_5.png

 

APPLY a scheme in QKB format

 

In part 1, I also created a scheme in QKB format and stored it in the scheme file cars.sch.qkb. I noted then that QKB schemes cannot be edited programmatically. Since this series focuses on the programmatic use of schemes, I did not edit my QKB scheme with other methods. This means my QKB scheme will have the same problems my default SAS scheme had. I’ll come back to that issue shortly.

 

To apply a QKB scheme, I’ll add the QKB option on the PROC DQSCHEME statement (though this is the default scheme type option). I’ll also add my QKB scheme name and my input variable name to the APPLY statement. Remember that the easiest way to work with a QKB scheme file is to give it a fileref first with the FILENAME statement.

 

filename cars "/home/student/cars.sch.qkb";
proc dqscheme data=cars out=cars_std_qkb qkb;
apply scheme=cars var=model;
run;

 

The output data shows that the scheme was applied successfully. Because I haven’t edited my QKB scheme, the results differ from what I want. I want to edit my scheme to have a consistent casing standard and include rules for the different ‘Toyota Corolla’ values, like I did with my SAS scheme in part 1.

 

07_grbarn_apply_6.png

 

So, what can we do? Are we stuck with a scheme that’s not quite right until we plug it into the SAS QKB Definition Editor for editing? The answer is no! You can use the CONVERT statement to aid in creating and editing your QKB schemes programmatically, which I’ll explore in my next blog.

 

To get a head start on learning about the CONVERT statement, check out the documentation.

 

Considerations for applying schemes

 

One DQSCHEME procedure step can contain any combination of CREATE, APPLY, and CONVERT statements. While schemes can be created and applied in the same PROC DQSCHEME step, this is not recommended. By creating and applying a scheme in one go, you forfeit the chance to view and edit your scheme before applying it. If you don’t like the scheme, you will have transformed your data to an undesired result—which means more work for you. Always create and apply schemes in separate steps.

 

When applying a scheme, be wary that the match code application method will be more resource intensive than the default method. Depending on the size of your data, your processing time could be significantly increased. Instead, updating your original scheme and then applying the updated scheme with the default method may be more efficient.

 

Summary

 

The APPLY statement can be used to apply SAS or QKB schemes to an input variable. The SCHEME_LOOKUP argument offers you flexibility in applying your custom schemes. This optional argument helps to ensure that your input data will be transformed even if it doesn’t perfectly align with your scheme data. For more information on the APPLY statement, including optional arguments that weren’t covered in this post, view the documentation. Look out for part 3, which will cover the PROC DQSCHEME CONVERT statement.

 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎10-30-2023 03:16 PM
Updated by:
Contributors

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags