DATA Step, Macro, Functions and more

Creation of a Numeric to Numeric Informat Using CNTLIN - Problem and Solution.

Reply
PROC Star
Posts: 288

Creation of a Numeric to Numeric Informat Using CNTLIN - Problem and Solution.

First of all, it might seem odd to post a solution along with the question, but since my attempts to find a solution online were futile, I figured others might benefit as well from my learning experience. I'd also like to see what comments other might have on the topic.

 

Since this post is long, the “tl;dr” version is: To create a numeric to numeric informat using a cntlin data set with proc format, be sure to include the HLO column with the value “I” for all rows not associated with the start value “OTHER.” For the “OTHER” start value – if applicable – set HLO to the letter “O.”

 

For the two or three interested in the details, the general task was to create a binary flag for holidays based on a date. For a number of reasons, I decided that using a numeric to numeric informat seemed like the best approach. An example of how this works is shown in the code below, where I’ve hard-coded Easter and Labor Day along with some non-holiday dates:

 

 

data have;
    format date date9.;
    do date = '27MAR2016'd, '10JUN2016'd, '05SEP2016'd, '15OCT2016'd;
    output;
    end;
run;
 
proc format;
    invalue dt2hol
        '27MAR2016'd = 1
        '05SEP2016'd = 1
        OTHER = 0;
run;
 
data want;
    set have;
    holiday = input(date, dt2hol.);
    put date= holiday=;
run;

The above code produces the desired result:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).

      39:21  

date=27MAR2016 holiday=1

date=10JUN2016 holiday=0

date=05SEP2016 holiday=1

date=15OCT2016 holiday=0

 

However, in practice the list of dates will often to span multiple years and there will be more than just those two holidays. This is where the “HOLIDAY” function and the “CNTLIN” data set come to the rescue, but the path there was not as straightforward as I’d hoped. For the holidays, I modified someone else’s code (my apologies for not remembering the source) to generate a list of holidays (with the holidays limited for illustration purposes):

 

data holidays;
     format date date9.;
     input holiday : $18.;
     do year = 2016 to 2017;
           date = holiday(holiday, year);
           output;
     end;
datalines;
EASTER
LABOR
;
run;


 So far, so good. From this list of holiday dates I then created the cntlin data set as follows:

 

proc sql ;
CREATE TABLE holidayFmt AS
SELECT 'dt2hol' AS fmtName
, 'I' AS type /* I for numeric informat. */
, strip(put(date, best.)) AS start
, 1 AS label
, ' ' AS hlo
FROM holidays;
/* Insert a row to account for "OTHER" */
INSERT INTO holidayFmt
VALUES ('dt2hol', 'I', 'OTHER', 0, 'O');
quit;

 

I’ve included the HLO variable, since I’ve been tripped by that before. I’m thinking I should be good, so I create the format and test it. 

 

proc format cntlin = holidayFmt;
run;
 
data want;
    set have;
    holiday = input(date, dt2hol.);
    put date= holiday=;
run;

 

The log tells me the informat has been created, but the application of the informat does not flag the holidays as desired:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line)Smiley SadColumn).
42:21
date=27MAR2016 holiday=0
date=10JUN2016 holiday=0
date=05SEP2016 holiday=0
date=15OCT2016 holiday=0

 

I wonder if it's related to the HLO variable, so I try it without worrying about "OTHER," but now it returns all missing values. I’ve used this approach dozens of times in the past to create formats, so I’m not sure I what’s going on. To check my sanity, I use the data to control a format instead:

 

 

proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2hol' AS fmtName
           , 'N' AS type /* N for numeric format. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , ' ' AS hlo
    FROM holidays;
    INSERT INTO holidayFmt
    VALUES ('dt2hol', 'N', 'OTHER', 0, 'O');
quit;
 
proc format cntlin = holidayFmt;
run;
 
data want;
    set have;
    holiday = put(date, dt2hol.);
    put date= holiday=;
run;

 

date=27MAR2016 holiday=1

date=10JUN2016 holiday=0

date=05SEP2016 holiday=1

date=15OCT2016 holiday=0

 

Success! Sort of…. The result is a character variable. While I could easily transform it to numeric, it is not a satisfying solution, particularly when the informat works as desired when created without the cntlin dataset. At this point I started to search for solutions online. While I found plenty of examples of creating formats with a cntlin data set, I never found anything that would solve my particular issue. Most examples focus on numeric to character conversion or vice versa. In the end, the solution came from one particular line in the otherwise limited documentation: “A common source for an input control data set is the output from the CNTLOUT= option of another PROC FORMAT step.” Based on this tip, I created a version A of the informat without using cntlin and a version B using the cntlin. I then compared the cntlout data set for those two versions:

 

 

proc format;
    invalue dt2holA
        '27MAR2016'd = 1
        '05SEP2016'd = 1
        OTHER = 0;
run;
 
data want;
    set have;
    holiday = input(date, dt2holA.);
    put date= holiday=;
run;
 
proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2holB' AS fmtName
           , 'I' AS type /* I for numeric informat. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , ' ' AS hlo
    FROM holidays
    WHERE year = 2016; /* To make equivalent the Version A. */
    /* Insert a row to account for "OTHER" */
    INSERT INTO holidayFmt
    VALUES ('dt2holB', 'I', 'OTHER', 0, 'O');
quit;
 
proc format cntlin = holidayFmt;
run;
 
proc format cntlout = compareInfmts;
    select @dt2holA @dt2holB;
run;

A quick inspection of the cntlout data set showed that the A version had an “I” in the hlo column. The quick addition of this to my cntlin data set and the issue was resolved: 

 

proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2hol' AS fmtName
           , 'I' AS type /* I for numeric informat. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , 'I' AS hlo
    FROM holidays;
    /* Insert a row to account for "OTHER" */
    INSERT INTO holidayFmt
    VALUES ('dt2hol', 'I', 'OTHER', 0, 'O');
quit;

The solution was easy, but getting there was not, so I figured I’d share it. And now is the time when others here can chime on something obvious I’ve missed or point out better solutions for what I wanted to do. Thanks!

 

 

Frequent Contributor
Posts: 114

Re: Creation of a Numeric to Numeric Informat Using CNTLIN - Problem and Solution.

Hi @collinelliot,

 

thanks for sharing! If only I had read this a few weeks ago, when I stumbled upon the topic myself.

 

For me, however, the hlo = 'O' in the 'OTHER' row does suffice. The proc format cntlin takes care of the rest and does fill the other rows with hlo = 'I'

 

I am wondering, why it does not do that for you?!

 

The following code works without problems for me:

data a;
retain fmtname 'test_fm'
       type 'I';
start = 1; label = 10; output;
start = 2; label = 20; output;
start = 3; label = 30; output;
start = 4; label = 40; output;
start = 5; label = 50; output;
start = 6; label = 60; output;
start = .; label = 99; hlo = 'O'; output; /* hlo is set only in the 'OTHER' row. Previous rows have hlo missing */
run;

proc format cntlin=work.a library=work;
run;
/* the cntlin does set the hlo for all rows */
proc format cntlout = work.compareInfmts; select @test_fm; run; /* applying the format works with no problems */ data b; b = input(3, test_fm.); output; b = input(5, test_fm.); output; b = input(7, test_fm.); output; run;

Cheers,

Michael

PROC Star
Posts: 288

Re: Creation of a Numeric to Numeric Informat Using CNTLIN - Problem and Solution.

@mfab - Thanks for the response. I ran your code and, sure enough, it worked, so I just assumed I had done something idiotic. I went back and checked my code and I still saw the same results, so I went back and set up a number of test cases to see if I could isolate what was causing the issue. The fundamental difference is that my example is using a character variable for the "start" values in the cntlin data set. Rather than bombard you with all the details, I think the best way to demonstrate the issue is to replicate it using your example. When I run the code below, the informat does not work as it should (though you should confirm):

 

data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; output;
    start = '2'; label = 20; output;
    start = '3'; label = 30; output;
    start = '4'; label = 40; output;
    start = '5'; label = 50; output;
    start = '6'; label = 60; output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


data b;
b = input(3, test_fm.); output;
b = input(5, test_fm.); output;
b = input(7, test_fm.); output;
run;

If I add the "I" to the hlo column, as per my original post, it does work, as in the code below.

 


data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; hlo = 'I';output;
    start = '2'; label = 20; hlo = 'I';output;
    start = '3'; label = 30; hlo = 'I';output;
    start = '4'; label = 40; hlo = 'I';output;
    start = '5'; label = 50; hlo = 'I';output;
    start = '6'; label = 60; hlo = 'I';output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


data b;
b = input(3, test_fm.); output;
b = input(5, test_fm.); output;
b = input(7, test_fm.); output;
run;


I suppose one response is that one create these informats based on a numeric start column. The reason that I usually use text is that I'm often feeding many informats and formats in the same cntlin data set and the start values often contain text strings.

 

Anyway, I don't know if this fully explains why our results differed, but I figured I'd throw it out there. 

 

Thanks again,

 

Collin

 

 

Frequent Contributor
Posts: 114

Re: Creation of a Numeric to Numeric Informat Using CNTLIN - Problem and Solution.

[ Edited ]

@collinelliot: it seems, I have fallen for the pits in format usage again.

your modification of my example will work, if you not only change the format start to character, but also change the values to which you apply the format to character:

 

 

data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; hlo = 'I';output;
    start = '2'; label = 20; hlo = 'I';output;
    start = '3'; label = 30; hlo = 'I';output;
    start = '4'; label = 40; hlo = 'I';output;
    start = '5'; label = 50; hlo = 'I';output;
    start = '6'; label = 60; hlo = 'I';output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


/* notice the quotation around the numbers. So we don't have numbers here, but characters. Just the same as in the format that was created above */
data b;
b = input('3', test_fm.); output;
b = input('5', test_fm.); output;
b = input('7', test_fm.); output;
run;

 

In fact, INPUT always requires a character value. That is why you would not want to input a number there. SAS will then do a type converison by itself.

 

My apologies for not reading your code thoroughly, and giving it enough thought at first.

 

As a matter of fact, I never had the use case when I would need to apply a format to create a numerical value from a given numerical value. It has always been either character to character, numeric to character or vice versa.

I would suggest you change your holiday flag to a character value (which also uses less space when you use length 1 compared to length 8 of a numerical field).

 

By the way: This paper provides pretty good descriptions for creating formats from datasets.

 

Hope this helps.

Cheers,

Michael

Ask a Question
Discussion stats
  • 3 replies
  • 229 views
  • 3 likes
  • 2 in conversation