This question looks like it should be simple and my answer may seem a little long, but SAS and Excel do not play well when Excel has Merged cells. However the even that can be overcome. I am using SAS V9.4 (TS1M1) with the "SAS/ACCESS Interface to PC Files" software installed. I am also using Excel 2013. The problem is to get around the merged cells. I like to use the libname statement to read most Excel files. In this case I would also use a SAS option called Firstobs and set the value to two. Your real problem in this Question is how to I create a SAS dataset that I can submit to PROC FREQ that will have the correct number of rows columns and responses. I have created that dataset for you, but you need to either read on or skip to the bottom of this post to find it. William E Benjamin Jr SAS Press - William E. Benjamin, Jr. Author Page = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = libname test "c:\temp\respondent_classes.xlsx"; options firstobs=2; For example if you set firstobs = 1 (the default) then print the Excel file you get the following (The type face here is proportional, So I lined up the columns) : proc print data=test."Sheet1$"n; run; Obs Respondent Topics F3 F4 F5 F6 1 . Art Science History Spanish French 2 1 Art History French 3 2 Art 4 3 Art Spanish 5 4 Art Science History Spanish French 6 5 Spanish 7 6 Art Science Spanish French Notice OBS #1 it has a missing value for the value of the variable Respondent. This is because the Variable names and the first row of data (OBS 1 and 2) are from the merged header from Excel. Variables F3, F4, F5, and F6 are generated from the blank cells in the first row of the Excel worksheet. The "OBS 1" record above is really the data in the second row of the Excel worksheet (part 2 of the merged header - you would see them as the "Names" of the variables), but when you set the firstobs=2 the printout looks like this. Obs Respondent Topics F3 F4 F5 F6 2 1 Art History French 3 2 Art 4 3 Art Spanish 5 4 Art Science History Spanish French 6 5 Spanish 7 6 Art Science Spanish French So, why do this. Well look at the variable names. The first two variable names are from the top of the Excel set of merged cells, The rest are just sequentially numbered. That looks like a pattern to me. If you can find a pattern then you can write a program to reproduce it. Take the following code: %macro get_class(col,class); data &class (keep=respondent &class); set test.'Sheet1$'n (rename=(&col=&class) where=(upcase(&class) = "&class")) ; run; %mend get_class; %get_class(TOPICS ,ART ); %get_class(F3 ,SCIENCE ); %get_class(F4 ,HISTORY ); %get_class(F5 ,SPANISH ); %get_class(F6 ,FRENCH ); run; The above code will create 5 output SAS datasets with two variables each. Respondent and one other, see the SAS log excerpt below. I have removed some blank lines. Of course your first statement (as noted elsewhere in these responses) will be I have too many columns and do not want to code them by hand. That of course can be fixed by using either a Proc Contents to an output file and a data step to create the list of macro calls, or using PROC SQL to generate the list of "%get_class" commands. I will not write that code here. 35 %get_class(TOPICS ,ART ); NOTE: There were 5 observations read from the data set TEST.'Sheet1$'n. WHERE UPCASE(ART)='ART'; NOTE: The data set WORK.ART has 5 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.14 seconds cpu time 0.01 seconds 36 %get_class(F3 ,SCIENCE ); NOTE: There were 2 observations read from the data set TEST.'Sheet1$'n. WHERE UPCASE(SCIENCE)='SCIENCE'; NOTE: The data set WORK.SCIENCE has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.01 seconds 37 %get_class(F4 ,HISTORY ); NOTE: There were 2 observations read from the data set TEST.'Sheet1$'n. WHERE UPCASE(HISTORY)='HISTORY'; NOTE: The data set WORK.HISTORY has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.08 seconds cpu time 0.03 seconds 38 %get_class(F5 ,SPANISH ); NOTE: There were 4 observations read from the data set TEST.'Sheet1$'n. WHERE UPCASE(SPANISH)='SPANISH'; NOTE: The data set WORK.SPANISH has 4 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.01 seconds 39 %get_class(F6 ,FRENCH ); NOTE: There were 3 observations read from the data set TEST.'Sheet1$'n. WHERE UPCASE(FRENCH)='FRENCH'; NOTE: The data set WORK.FRENCH has 3 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds 40 41 run; The following code will merge the files with no problem, sorting is not required unless the values of respondent are not in order in the original file. data full_data; merge ART SCIENCE HISTORY SPANISH FRENCH ; by Respondent ; run; Proc freq can then do the counting and will produce the following: proc freq data=full_data; table respondent*ART / list; table respondent*SCIENCE / list; table respondent*HISTORY / list; table respondent*SPANISH / list; table respondent*FRENCH / list; run; Cumulative Cumulative Respondent ART Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 Art 1 20.00 1 20.00 2 Art 1 20.00 2 40.00 3 Art 1 20.00 3 60.00 4 Art 1 20.00 4 80.00 6 Art 1 20.00 5 100.00 Frequency Missing = 1 Cumulative Cumulative Respondent SCIENCE Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 4 Science 1 50.00 1 50.00 6 Science 1 50.00 2 100.00 Frequency Missing = 4 Cumulative Cumulative Respondent HISTORY Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 History 1 50.00 1 50.00 4 History 1 50.00 2 100.00 Frequency Missing = 4 Cumulative Cumulative Respondent SPANISH Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 3 Spanish 1 25.00 1 25.00 4 Spanish 1 25.00 2 50.00 5 Spanish 1 25.00 3 75.00 6 Spanish 1 25.00 4 100.00 Frequency Missing = 2 Cumulative Cumulative Respondent FRENCH Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1 French 1 33.33 1 33.33 4 French 1 33.33 2 66.67 6 French 1 33.33 3 100.00 Frequency Missing = 3 The final SAS table looks like the following: options firstobs=1; proc print data=full_data; run; libname test close; Once again I aligned the columns because of proportional spacing of the letters. Obs Respondent ART SCIENCE HISTORY SPANISH FRENCH 1 1 Art History French 2 2 Art 3 3 Art Spanish 4 4 Art Science History Spanish French 5 5 Spanish 6 6 Art Science Spanish French The SAS dataset generated here (as Full_data) and shown as the printed output above should be the answer to your question. I have also attached the code.
... View more