BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MelissaM
Obsidian | Level 7

I imported two csv files with survey response data into SAS 9.4 M8. The files are identical with one exception: one has the responses coded, and one has the actual formats. I need assistance with creating a format library. I currently have 359 (inefficiently created) data sets, one for each question, and all sets have been named DATA001, DATA002 ... DATA359. I dedup’d them, so they all look like the one below (except with no blank rows). If it matters, the numeric responses are not always sequential, nor do they begin at 1, nor are they intuitively assigned. By the way, this was messy without the Access to PC Files.

 

    gain gainf  
    1 Yes  
    2 No  
    3 I don't know  
         
    gender Genderf  
    1 Male  
    2 Female  
    3 Prefer not to answer  
    marital status marital statusf  
    2 Divorced  
    3 Married  
    4 Single  
    5 Unknown  
    6 Widowed  
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Sticking with what you have this should work. 

 

libname projfmt "/home/fkhurshed/CLASS1";
options obs=max fmtsearch=(projfmt);

data data001;
    infile cards dsd truncover;
    input gain gainf $20.;
    cards;
1, Yes
2, No
3, I Don't Know
;



data data002;
    infile cards dsd truncover;
    input gender genderf $25.;
    cards;
1, Female
2, Male
3, Prefer not to answer
;



    %macro uploadfmt();
        %do i=1 %to 2;
            %let dsnin= data%sysfunc(putn(&i, z3));
            *get list of variable names;

        proc contents data=&dsnin out=varlist noprint;
        run;

        *rename variables;

        data _null_;
            set varlist end=eof;

            if _n_=1 then
                do;
                    call execute("data _&dsnin.; set &dsnin;");
                    call execute(cat("rename ", name, '=start;'));
                end;

            if _n_=2 then
                do;
                    call execute(cat("rename ", name, '=label;'));
                    call execute(catt("fmtname='", name, "';"));
                end;

            if eof then
                call execute('run;quit;');
        run;

        *create format;

        proc format cntlin=_&dsnin. lib=projfmt;
        run;

        proc sql;
            drop table varlist;
            drop table _&dsnin.;
        quit;

    %end;
%mend;

%uploadfmt();

proc format cntlout=want lib=projfmt;
run;

View solution in original post

7 REPLIES 7
Reeza
Super User
Is the format name not filled in as the example? and the extra rows??

I'm assuming you're planning to use CNTLIN, so you can create the format?
MelissaM
Obsidian | Level 7

Hi, Reeza. Yes to the CNTLIN. As for the format name, they provided me one data set with 1 2 3 and one data set with Yes No IDK. I merged them to get what we see there. There are a lot of questions (some with the same question numbers) so I'll just number the questions sequentially (except for YN since that's repeated). No extra lines.

MelissaM
Obsidian | Level 7
Open to suggestions, by the way.
Tom
Super User Tom
Super User

Your description is unclear but I think you are saying that you have two CSV files with the same number of columns and the same number of lines and the same header row (variable names).  But in one of the files the values are the CODES and the other file the values are the DECODES.  And you want to use this to derive the formats that were used to convert the CDOES into the DECODES.

 

It might be simplest to ignore the variable names right now. 

But it does helps to know how many columns (variables) there are.

 

So let's first make a simple example pair of CSV files with 3 variables.

filename codes temp;
filename decodes temp;

options parmcards=codes;
parmcards4;
name,sex,age
Alfred,M,14
Alice,F,13
Barbara,F,13
;;;;

options parmcards=decodes;
parmcards4;
name,sex,age
Alfred,Male,Older
Alice,Female,Younger
Barbara,Female,Younger
;;;;

Now we just need to read in both files at once and make some VALUE/LABEL pairs and then summarize.

data code_decode ;
  row +1 ;
  do col=1 to 3;
     infile codes dsd firstobs=2 truncover ;
     input value :$100. @ ;
     infile decodes dsd firstobs=2 truncover ;
     input label :$100. @ ;
     output;
  end;
run;

proc freq data=code_decode ;
  tables col*value*label / missing list noprint out=formats ;
run;

Results

Obs    col    value      label      COUNT    PERCENT

 1      1     Alfred     Alfred       1      11.1111
 2      1     Alice      Alice        1      11.1111
 3      1     Barbara    Barbara      1      11.1111
 4      2     F          Female       2      22.2222
 5      2     M          Male         1      11.1111
 6      3     13         Younger      2      22.2222
 7      3     14         Older        1      11.1111

At this point you will want to make some decisions about which variables need to have formats defined.

You might also need to decide whether you need to make FORMATs so you can display the CODE values as the DECODE values.

Or if you want to create INFORMATs do you can read the CODE values and generate the DECODE values into the dataset. (Or perhaps the reverse to read the DECODE values and generate the CODE values into the dataset.)

Reeza
Super User
Can you assume all variables are numeric with a character label? If not, you may want to add a type column as well.

I would read all CSV in one data step and use the file name as the format name instead. Otherwise you need to go through and rename each data step, which can be automated but is annoying for sure.

Stack them all into one and then pass that to CNTLIN.
Reeza
Super User

Sticking with what you have this should work. 

 

libname projfmt "/home/fkhurshed/CLASS1";
options obs=max fmtsearch=(projfmt);

data data001;
    infile cards dsd truncover;
    input gain gainf $20.;
    cards;
1, Yes
2, No
3, I Don't Know
;



data data002;
    infile cards dsd truncover;
    input gender genderf $25.;
    cards;
1, Female
2, Male
3, Prefer not to answer
;



    %macro uploadfmt();
        %do i=1 %to 2;
            %let dsnin= data%sysfunc(putn(&i, z3));
            *get list of variable names;

        proc contents data=&dsnin out=varlist noprint;
        run;

        *rename variables;

        data _null_;
            set varlist end=eof;

            if _n_=1 then
                do;
                    call execute("data _&dsnin.; set &dsnin;");
                    call execute(cat("rename ", name, '=start;'));
                end;

            if _n_=2 then
                do;
                    call execute(cat("rename ", name, '=label;'));
                    call execute(catt("fmtname='", name, "';"));
                end;

            if eof then
                call execute('run;quit;');
        run;

        *create format;

        proc format cntlin=_&dsnin. lib=projfmt;
        run;

        proc sql;
            drop table varlist;
            drop table _&dsnin.;
        quit;

    %end;
%mend;

%uploadfmt();

proc format cntlout=want lib=projfmt;
run;
MelissaM
Obsidian | Level 7

Thank you @Tom and @Reeza for joining your superpowers together and not only providing the ideal solution (which I couldn't come up with even on my best day this efficiently), but also for both offering up some new skills. Perfect!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 602 views
  • 1 like
  • 3 in conversation