BookmarkSubscribeRSS Feed
collinelliot
Barite | Level 11

First of all, it might seem odd to post a solution along with the question, but since my attempts to find a solution online were futile, I figured others might benefit as well from my learning experience. I'd also like to see what comments other might have on the topic.

 

Since this post is long, the “tl;dr” version is: To create a numeric to numeric informat using a cntlin data set with proc format, be sure to include the HLO column with the value “I” for all rows not associated with the start value “OTHER.” For the “OTHER” start value – if applicable – set HLO to the letter “O.”

 

For the two or three interested in the details, the general task was to create a binary flag for holidays based on a date. For a number of reasons, I decided that using a numeric to numeric informat seemed like the best approach. An example of how this works is shown in the code below, where I’ve hard-coded Easter and Labor Day along with some non-holiday dates:

 

 

data have;
    format date date9.;
    do date = '27MAR2016'd, '10JUN2016'd, '05SEP2016'd, '15OCT2016'd;
    output;
    end;
run;
 
proc format;
    invalue dt2hol
        '27MAR2016'd = 1
        '05SEP2016'd = 1
        OTHER = 0;
run;
 
data want;
    set have;
    holiday = input(date, dt2hol.);
    put date= holiday=;
run;

The above code produces the desired result:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

      39:21  

date=27MAR2016 holiday=1

date=10JUN2016 holiday=0

date=05SEP2016 holiday=1

date=15OCT2016 holiday=0

 

However, in practice the list of dates will often to span multiple years and there will be more than just those two holidays. This is where the “HOLIDAY” function and the “CNTLIN” data set come to the rescue, but the path there was not as straightforward as I’d hoped. For the holidays, I modified someone else’s code (my apologies for not remembering the source) to generate a list of holidays (with the holidays limited for illustration purposes):

 

data holidays;
     format date date9.;
     input holiday : $18.;
     do year = 2016 to 2017;
           date = holiday(holiday, year);
           output;
     end;
datalines;
EASTER
LABOR
;
run;


 So far, so good. From this list of holiday dates I then created the cntlin data set as follows:

 

proc sql ;
CREATE TABLE holidayFmt AS
SELECT 'dt2hol' AS fmtName
, 'I' AS type /* I for numeric informat. */
, strip(put(date, best.)) AS start
, 1 AS label
, ' ' AS hlo
FROM holidays;
/* Insert a row to account for "OTHER" */
INSERT INTO holidayFmt
VALUES ('dt2hol', 'I', 'OTHER', 0, 'O');
quit;

 

I’ve included the HLO variable, since I’ve been tripped by that before. I’m thinking I should be good, so I create the format and test it. 

 

proc format cntlin = holidayFmt;
run;
 
data want;
    set have;
    holiday = input(date, dt2hol.);
    put date= holiday=;
run;

 

The log tells me the informat has been created, but the application of the informat does not flag the holidays as desired:

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
42:21
date=27MAR2016 holiday=0
date=10JUN2016 holiday=0
date=05SEP2016 holiday=0
date=15OCT2016 holiday=0

 

I wonder if it's related to the HLO variable, so I try it without worrying about "OTHER," but now it returns all missing values. I’ve used this approach dozens of times in the past to create formats, so I’m not sure I what’s going on. To check my sanity, I use the data to control a format instead:

 

 

proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2hol' AS fmtName
           , 'N' AS type /* N for numeric format. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , ' ' AS hlo
    FROM holidays;
    INSERT INTO holidayFmt
    VALUES ('dt2hol', 'N', 'OTHER', 0, 'O');
quit;
 
proc format cntlin = holidayFmt;
run;
 
data want;
    set have;
    holiday = put(date, dt2hol.);
    put date= holiday=;
run;

 

date=27MAR2016 holiday=1

date=10JUN2016 holiday=0

date=05SEP2016 holiday=1

date=15OCT2016 holiday=0

 

Success! Sort of…. The result is a character variable. While I could easily transform it to numeric, it is not a satisfying solution, particularly when the informat works as desired when created without the cntlin dataset. At this point I started to search for solutions online. While I found plenty of examples of creating formats with a cntlin data set, I never found anything that would solve my particular issue. Most examples focus on numeric to character conversion or vice versa. In the end, the solution came from one particular line in the otherwise limited documentation: “A common source for an input control data set is the output from the CNTLOUT= option of another PROC FORMAT step.” Based on this tip, I created a version A of the informat without using cntlin and a version B using the cntlin. I then compared the cntlout data set for those two versions:

 

 

proc format;
    invalue dt2holA
        '27MAR2016'd = 1
        '05SEP2016'd = 1
        OTHER = 0;
run;
 
data want;
    set have;
    holiday = input(date, dt2holA.);
    put date= holiday=;
run;
 
proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2holB' AS fmtName
           , 'I' AS type /* I for numeric informat. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , ' ' AS hlo
    FROM holidays
    WHERE year = 2016; /* To make equivalent the Version A. */
    /* Insert a row to account for "OTHER" */
    INSERT INTO holidayFmt
    VALUES ('dt2holB', 'I', 'OTHER', 0, 'O');
quit;
 
proc format cntlin = holidayFmt;
run;
 
proc format cntlout = compareInfmts;
    select @dt2holA @dt2holB;
run;

A quick inspection of the cntlout data set showed that the A version had an “I” in the hlo column. The quick addition of this to my cntlin data set and the issue was resolved: 

 

proc sql ;
    CREATE TABLE holidayFmt AS
    SELECT 'dt2hol' AS fmtName
           , 'I' AS type /* I for numeric informat. */
           , strip(put(date, best.)) AS start
           , 1 AS label
           , 'I' AS hlo
    FROM holidays;
    /* Insert a row to account for "OTHER" */
    INSERT INTO holidayFmt
    VALUES ('dt2hol', 'I', 'OTHER', 0, 'O');
quit;

The solution was easy, but getting there was not, so I figured I’d share it. And now is the time when others here can chime on something obvious I’ve missed or point out better solutions for what I wanted to do. Thanks!

 

 

9 REPLIES 9
mfab
Quartz | Level 8

Hi @collinelliot,

 

thanks for sharing! If only I had read this a few weeks ago, when I stumbled upon the topic myself.

 

For me, however, the hlo = 'O' in the 'OTHER' row does suffice. The proc format cntlin takes care of the rest and does fill the other rows with hlo = 'I'

 

I am wondering, why it does not do that for you?!

 

The following code works without problems for me:

data a;
retain fmtname 'test_fm'
       type 'I';
start = 1; label = 10; output;
start = 2; label = 20; output;
start = 3; label = 30; output;
start = 4; label = 40; output;
start = 5; label = 50; output;
start = 6; label = 60; output;
start = .; label = 99; hlo = 'O'; output; /* hlo is set only in the 'OTHER' row. Previous rows have hlo missing */
run;

proc format cntlin=work.a library=work;
run;
/* the cntlin does set the hlo for all rows */
proc format cntlout = work.compareInfmts; select @test_fm; run; /* applying the format works with no problems */ data b; b = input(3, test_fm.); output; b = input(5, test_fm.); output; b = input(7, test_fm.); output; run;

Cheers,

Michael

collinelliot
Barite | Level 11

@mfab - Thanks for the response. I ran your code and, sure enough, it worked, so I just assumed I had done something idiotic. I went back and checked my code and I still saw the same results, so I went back and set up a number of test cases to see if I could isolate what was causing the issue. The fundamental difference is that my example is using a character variable for the "start" values in the cntlin data set. Rather than bombard you with all the details, I think the best way to demonstrate the issue is to replicate it using your example. When I run the code below, the informat does not work as it should (though you should confirm):

 

data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; output;
    start = '2'; label = 20; output;
    start = '3'; label = 30; output;
    start = '4'; label = 40; output;
    start = '5'; label = 50; output;
    start = '6'; label = 60; output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


data b;
b = input(3, test_fm.); output;
b = input(5, test_fm.); output;
b = input(7, test_fm.); output;
run;

If I add the "I" to the hlo column, as per my original post, it does work, as in the code below.

 


data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; hlo = 'I';output;
    start = '2'; label = 20; hlo = 'I';output;
    start = '3'; label = 30; hlo = 'I';output;
    start = '4'; label = 40; hlo = 'I';output;
    start = '5'; label = 50; hlo = 'I';output;
    start = '6'; label = 60; hlo = 'I';output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


data b;
b = input(3, test_fm.); output;
b = input(5, test_fm.); output;
b = input(7, test_fm.); output;
run;


I suppose one response is that one create these informats based on a numeric start column. The reason that I usually use text is that I'm often feeding many informats and formats in the same cntlin data set and the start values often contain text strings.

 

Anyway, I don't know if this fully explains why our results differed, but I figured I'd throw it out there. 

 

Thanks again,

 

Collin

 

 

mfab
Quartz | Level 8

@collinelliot: it seems, I have fallen for the pits in format usage again.

your modification of my example will work, if you not only change the format start to character, but also change the values to which you apply the format to character:

 

 

data a;
retain fmtname 'test_fm'
       type 'I';
    start = '1'; label = 10; hlo = 'I';output;
    start = '2'; label = 20; hlo = 'I';output;
    start = '3'; label = 30; hlo = 'I';output;
    start = '4'; label = 40; hlo = 'I';output;
    start = '5'; label = 50; hlo = 'I';output;
    start = '6'; label = 60; hlo = 'I';output;
    start = ' '; label = 99; hlo = 'O'; output; 
run;

proc format cntlin=work.a library=work;
run;


/* notice the quotation around the numbers. So we don't have numbers here, but characters. Just the same as in the format that was created above */
data b;
b = input('3', test_fm.); output;
b = input('5', test_fm.); output;
b = input('7', test_fm.); output;
run;

 

In fact, INPUT always requires a character value. That is why you would not want to input a number there. SAS will then do a type converison by itself.

 

My apologies for not reading your code thoroughly, and giving it enough thought at first.

 

As a matter of fact, I never had the use case when I would need to apply a format to create a numerical value from a given numerical value. It has always been either character to character, numeric to character or vice versa.

I would suggest you change your holiday flag to a character value (which also uses less space when you use length 1 compared to length 8 of a numerical field).

 

By the way: This paper provides pretty good descriptions for creating formats from datasets.

 

Hope this helps.

Cheers,

Michael

NovGetRight
Obsidian | Level 7

thank you, but why SAS log occurs:

 

NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).

 

the logic and syntax should be correct. 

NovGetRight
Obsidian | Level 7
43         proc format cntlout = work.compareInfmts;
44             select @test_fm;
45         run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds
      
NOTE: The data set WORK.COMPAREINFMTS has 7 observations and 21 variables.

46         
47         /* applying the format works with no problems */
48         data b;
49         b = input(3, test_fm.); output;
50         b = input(5, test_fm.); output;
51         b = input(7, test_fm.); output;
52         run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      49:11   50:11   51:11   
NOTE: The data set WORK.B has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
Kurt_Bremser
Super User

The INPUT function expects a string as first argument:

source

specifies a character constant, variable, or expression to which you want to apply a specific informat.

 

But you provided numbers, so the SAS data step compiler first needs to convert these to character for the function to work.

By supplying strings you will prevent the NOTE:

data b;
b = input("3", test_fm.); output;
b = input("5", test_fm.); output;
b = input("7", test_fm.); output;
run;

You will note that column 11 in the NOTE points exactly to the beginning of the number.

NovGetRight
Obsidian | Level 7

So,  in this case,  there is no difference to use "type 'N';"  and "type 'I';  "  ? 

My purpose is to convert numeric value to numeric value (based on the format label), I don't want to convert my source value to character first. 

 

 

 

FreelanceReinh
Jade | Level 19

@NovGetRight wrote:

So,  in this case,  there is no difference to use "type 'N';"  and "type 'I';  "  ? 

My purpose is to convert numeric value to numeric value (based on the format label), I don't want to convert my source value to character first. 


Numeric informats (TYPE='I' in the CNTLIN/CNTLOUT datasets) convert text to numeric values.
Numeric formats (TYPE='N') convert numeric values to text.

 

So, if you want to convert numeric values to numeric values, neither formats nor informats are ideal because there's always text involved that requires an unwanted (automatic or manual) conversion.

 

Functions are more suitable for this purpose.

 

Example:

proc fcmp outlib=work.funcs.test;
function testf(x);
y =      if x in (2 4 8)  then 0
    else if x in (3 6 9)  then 1
    else if x in (5 10 )  then 2
    else if 11 <= x <= 20 then 3
    else .;
return(y);
endsub;
run;

options cmplib=work.funcs;

data test;
do i=1 to 12;
  z=testf(i);
  output;
end;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5272 views
  • 4 likes
  • 5 in conversation