
inconsistent number of variables per subject - reading the data in?

Frequent Learner
Posts: 1


I've got data in Excel files that looks like this:

The sheet has two header rows: row 1 holds the variable names (_recordId, QID122_TEXT, QID124_TEXT) and row 2 holds the labels (Response ID, Number of Responses, Q122 - What is the name of this item?, Q124 - In the space below, list as many creative uses as you can think of for this..., Category). The data rows look like:

_recordId          NumResponses  QID122_TEXT   QID124_TEXT              Category
R_3imYisBaKj1DltD  1             bar           stretching               4
R_2q7PvEtAwpPZf9F  2             barre         ballet                   2
                                               clothes                  0
R_3QVKnlbMIE7CHZO  4             a bar         to hold while squatting  1
                                               to balance your leg      1
                                               to steady your balance   1
                                               to use as a prop         9
R_eRGndjYN0wEIFkR  3             barre         horse tie station        10
                                               clothes rack,            5
                                               barricade,               8
R_2zBKJc4IOSw1bhb  2             dance bar     stretching               4
                                               dancing                  2
R_1CaC6q2dVvHAtyN  4             balance beam  do stretches             4
                                               squats                   4
                                               help hold balance        1
                                               clothes rack             5

The number of responses to each stimulus varies.

Each subject has as many rows of data as responses.

Each Excel file has two sheets, with the same subjects responding to different stimuli.

What is the easiest way to get this data from Excel into SAS?

SAS 9.4


Accepted Solutions
‎01-20-2017 01:17 PM
Super User
Posts: 11,343

Re: inconsistent number of variables per subject - reading the data in?

What should the result look like? Does each Excel file represent a different topic? Does each have only the response ID, number of responses, two text fields, and a category, or does the layout change between files?

Are you planning to combine this data? Are the response IDs the same across the data sets?

 

If yes to all of those, then it should be possible to generate a similar program to read each of the files, but the first step is to save them as CSV files so you control how they are read. The number-of-responses and category variables may need a different name in each file to differentiate them when combined. You will also want to identify the longest value (most characters used) in each of the response ID and two text columns. The second header row provides LABELS for the variables.

The generic plan would be to read all of the files and then merge or join on the response ID field.

 

data QID122;
   /* FIRSTOBS=3 skips the two header rows; DSD honors quoted values
      that contain commas; TRUNCOVER handles short records */
   infile "path/qid122Page1.csv" dlm=',' dsd truncover lrecl=500 firstobs=3;
   informat _recordid   $25.
            QID122_Num  best5.
            QID122_text $50.   /* the 50 and 100 are "guesses" as to possible maximum lengths */
            QID124_text $100.  /* for these two values, set as needed */
            QID122_Cat  best5.
   ;
   input
      _recordid
      QID122_Num
      QID122_text
      QID124_text
      QID122_Cat
   ;
   label
      _recordid    ='Response Id'
      QID122_Num   ="Number of Q122 Responses"
      QID122_text  ="Q122 - What is the name of this item?"
      QID124_text  ="Q124 - In the space below, list as many creative uses as you can think of for this..."
      QID122_Cat   ="Q122 category"
   ;
run;

Replace the data set name and INFILE file name to reflect each saved CSV file, and change the various 122 and 124 pieces as needed to read each file. This is the part that is simplified by identifying the longest value across the files.
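If the real maximum lengths are unknown, a quick query can measure them after a first read with generous widths; a minimal sketch, assuming the QID122 data set and variable names from the step above:

```sas
/* Sketch, assuming the QID122 data set from the step above:
   report the longest value actually used in each character column,
   so the informat widths are measured rather than guessed */
proc sql;
   select max(length(_recordid))   as max_recordid,
          max(length(QID122_text)) as max_q122_text,
          max(length(QID124_text)) as max_q124_text
   from QID122;
quit;
```

Run it once per file and take the largest value per column across files before settling on the informats.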

 

One way to combine the data (if the response IDs match between the sheets):

proc sort data=QID122;
   by _recordid;
run;
/* repeat the sort for each data set */
data combined;
   merge
       QID122
       QID125
       /* place all of the data set names here */
   ;
   by _recordid;
run;
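One caution before merging: because each subject has one row per response, the same response ID repeats within each data set, and a match-merge with repeated BY values on more than one side aligns rows positionally within each BY group rather than forming all combinations (the log will warn about repeats of BY values). A quick check of how many rows each ID contributes, sketched against the QID122 data set above:

```sas
/* Sketch: count rows per response ID so you know how the BY groups
   will line up in the MERGE; IDs listed here repeat within the data set */
proc sql;
   select _recordid, count(*) as nrows
   from QID122
   group by _recordid
   having count(*) > 1
   order by _recordid;
quit;
```

If the row counts differ between data sets for the same ID, consider whether a positional merge gives the result you want, or whether each file should be combined some other way.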
