BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jacob_klimek
Obsidian | Level 7

Hello,

 

I am trying to take a dataset that I want to use to create multiple formats in a different dataset with proc format. Below is a sample of my data:

 

 

CNTLIN dataset:

 

FMTNAME               START          END           LABEL

Question_1                 1                    1               TEST 

Question_1                 2                    2               NO TEST    

Question_2                 1                    1               MALE TEST

Question_2                 2                    2               FEMALE TEST 

Question_A                 1                    1               YES

Question_A                 2                    2               NO

 

data have=

 

ID             Question_1               Question_2             Question_A
1                    1                                1                             2

2                    2                                2                             2

3                    1                                1                             1

4                    1                                1                             1

 

 

And what I want is:

ID             Question_1               Question_2             Question_A     
1                    TEST                      MALE TEST            NO

2                   NO TEST                 FEMALE TEST       NO

3                    TEST                      MALE TEST            YES

4                    TEST                      MALE TEST            YES

 

 

I put the numbers intentionally at the end because some of my data do have numbers, and I know how to do this one by one using proc format and cntlin if all the FMTNAME are the same, but it's a monthly process that we would like to automate that includes hundreds of formats.

 

Any help would be great, thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

IF you mean that EVERY variable has its own format then you use that cntrlin data set to construct a format statement. However there would have to be a pattern to the naming of the formats so that you can strip the bit from the name to get back to the variable name.

 

The following constructs a string for the numeric formats.

data cntlin;
   informat fmtname $25. type $1. label $25.;
   input FMTNAME type  START     END  LABEL;
datalines;
Question_1_Fmt        N    1                    1               TEST 
Question_1_Fmt        N    2                    2               NO TEST    
Question_2_Fmt        N    1                    1               MALE TEST
Question_2_Fmt        N    2                    2               FEMALE TEST 
Question_A_Fmt        N    1                    1               YES
Question_A_Fmt        N    2                    2               NO
;
run;

proc sql noprint;
   select distinct catx(' ',transtrn(fmtname,'_Fmt',''),cats(FMTName,'.')) as string
         Into : NumFormatAssign separated by ' '
   From cntlin
   where type='N'
   ;
quit;

%put &NumFormatAssign;

You would use that by adding a format assignment as needed. I would recommend to the step you create the data but thats your choice. One example would be:

proc print data=yourdata;
   format &NumFormatAssign;
run;

View solution in original post

5 REPLIES 5
ballardw
Super User

Do you mean that you are attempting to build a Cntlin data set to build the formats?

If so there are a couple of things to pay attention to. First the Name you supply for the format cannot end in a digit. If you have a lot of these then you may want to end all of the Fmtnames with something like FMT.

FMTNAME               START          END           LABEL
Question_1_Fmt            1                    1               TEST 
Question_1_Fmt            2                    2               NO TEST    
Question_2_Fmt            1                    1               MALE TEST
Question_2_Fmt            2                    2               FEMALE TEST 
Question_A_Fmt            1                    1               YES
Question_A_Fmt            2                    2               NO

 

 

Second is to make sure that you name character starting with $. Makesure that you have Length for the FMTNAME variable large enough to hold the longest expected value, don't forget to count $.

If your dataset is going to create mixed numeric and character formats the the start and end variables with have to be character. Ensure that the length is large enough for the longest expected value.

 

Third, before use of the cntlin data set you need to sort the cntlin dataset by Fmtname. 

You should also include a TYPE variable with values of N (Numeric value),  C (character value), I (numeric Informat) or J(character informat).

 

If you suspect that you may get out of range values you might want to include an "OTHER" value to indicate out or range or such. If so you woud include a character variable name HLO that has the value of O (for other).

 

Anything fancier you might make a short example Proc format statement and use the CNTLOUT option to see all of the variables and options set.

jacob_klimek
Obsidian | Level 7
I think I have the right set up for the dataset. Where I am struggling is just in the application part. The only time I have ever used formatting has been for one format at a time applied to one variable at a time, but I want this to do it for hundreds of variables each with their own independent formatting.
jacob_klimek
Obsidian | Level 7
I think I have the right set up for the dataset. Where I am struggling is just in the application part. The only time I have ever used formatting has been for one format at a time applied to one variable at a time, but I want this to do it for hundreds of variables each with their own independent formatting.
ballardw
Super User

IF you mean that EVERY variable has its own format then you use that cntrlin data set to construct a format statement. However there would have to be a pattern to the naming of the formats so that you can strip the bit from the name to get back to the variable name.

 

The following constructs a string for the numeric formats.

data cntlin;
   informat fmtname $25. type $1. label $25.;
   input FMTNAME type  START     END  LABEL;
datalines;
Question_1_Fmt        N    1                    1               TEST 
Question_1_Fmt        N    2                    2               NO TEST    
Question_2_Fmt        N    1                    1               MALE TEST
Question_2_Fmt        N    2                    2               FEMALE TEST 
Question_A_Fmt        N    1                    1               YES
Question_A_Fmt        N    2                    2               NO
;
run;

proc sql noprint;
   select distinct catx(' ',transtrn(fmtname,'_Fmt',''),cats(FMTName,'.')) as string
         Into : NumFormatAssign separated by ' '
   From cntlin
   where type='N'
   ;
quit;

%put &NumFormatAssign;

You would use that by adding a format assignment as needed. I would recommend to the step you create the data but thats your choice. One example would be:

proc print data=yourdata;
   format &NumFormatAssign;
run;
PGStats
Opal | Level 21

You could use a single table driven format if you put your data in long form. The advantage would be that you wouldn't have to associate a different format to every table column. For example:

 

data QA;
length SET QUESTION $4 ANSWER $3 LABEL $20;
input SET QUESTION ANSWER LABEL &;
datalines;
1 1  1  TEST 
1 1  2  NO TEST    
1 2  1  MALE TEST
1 2  2  FEMALE TEST 
1 A  1  YES
1 A  2  NO
;

data QA_CNTLIN;
set QA end=done;
FMTNAME = "$QAfmt";
length START $15;
START = cats( "S", SET, "Q", QUESTION, "_", ANSWER, "X");
output;
if done then do;
    HLO = "O";
    LABEL = "---";
    output;
    end;
run;

proc format cntlin=QA_CNTLIN; run;

data RESULTS;
SET = 1;
input ID Question_1 Question_2 Question_A;
array Ans Question_:;
array Q{3} $ _temporary_ ("1" "2" "A");
do i = 1 to dim(Ans);
    QUESTION = Q{i};
    ANSWER = Ans{i};
    result = cats( "S", SET, "Q", QUESTION, "_", ANSWER, "X");
    output;
    end;
keep SET ID QUESTION ANSWER result;
format result $QAfmt.;
datalines;
1   1   1  2
2   2   2  2
3   1   1  1
4   1   1  1
5   1   3  1
;

proc transpose data=results out=rtable(drop=_name_) prefix=Question_;
by set id;
id question;
var result;
run;

proc print data=rtable noobs; run;
PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2490 views
  • 0 likes
  • 3 in conversation