Learning SAS? Welcome to the exclusive online community for all SAS learners.

Importing data issue

Reply
New Contributor
Posts: 3

Importing data issue

I have data in which some of the variables have more than 1 answer. It is in an Excel spreadsheet. Here's an example: the subject was asked which social media he/she uses.  Several respondents replied 'facebook, twitter, instagram.'  How do I import these variables into SAS and have SAS recognize that each of the answers belong to the same respondent?  

Super User
Posts: 17,776

Re: Importing data issue

When you import it will become a single variable with the value of 'facebook, twitter, instragram'. 

 

You'll need to split it out manually afterwards, which is a relatively easy process using the SCAN() and COUNTW() functions. 

 

The majority of Survey Applications will export data with multiple choice questions separated already into individual fields. 

It would end up being something like variables

Q3_Facebook 

Q3_Twitter

Q3_Instagram

Super User
Posts: 10,476

Re: Importing data issue

My approach would be to save the Excel data as a CSV file and then use proc import or the data import wizard to import that CSV file. Use options to specify a guessingrows value equal to the number of rows in the data or 32767 which ever is smaller.

The result may be useable but likely wants a few more steps. The import will generate a datastep program to read the file that you can copy into the program editor and modify. Likely modifications would be setting informats to generate numeric or character variables that the "guesses" made by import don't match you needs. You can also add labels and do some initial data manipulation. One of the likely ways would be to split out that text into multiple variables that indicate whether that word or choice appeared in the response.

 

Something like:

data example;
   x= 'facebook, twitter, instagram.' ;
   /* see if the word "facebook" appears*/
   Facebook = (index(upcase(x),'FACEBOOK') > 0);
   /* upcase so you can search for a single value instead of Facebook, FaceBook, fACEBook or whatever*/
   /* if you have lots of responses like "face book" you might search for just "FACE" if that works*/
   /* Index returns the position in the searced string value (the variable X) where the target*/
   /* FACEBOOK occurs or 0 if not found. Putting () around the whole thing has SAS return the */
   /* logical value True, a 1, or False, a 0. Note that you can then SUM the value of the */
   /* variable Facebook to get a count of records that had the response include Facebook */
   /* and a Mean will get a percentage in decimal form:  .135 = 13.5% */
   /* create additional variables for other text responses*/
   Twitter = (index(upcase(x),'TWITTER') > 0);
   Instagram = (index(upcase(x),'INSTAGRAM') > 0);
   /* after you have all of the values you care for pulled out you can set an Other respons*/
   OtherResponse = (sum(Facebook,Twitter,Instagram) = 0 );
run;
Super User
Posts: 17,776

Re: Importing data issue

If the file is converted to CSV make sure the field is quoted so it's read properly. Since it will be variable length and comma delimited it's goimg to cause issues otherwise. 

New Contributor
Posts: 3

Re: Importing data issue

I'm fairly new to SAS.  I'm using the university edition to learn more about how to use it.  The dataset is actually a list of people on a medication.  The variable with multiple answers is 'reason for discontinuation'. The answers can be 0, 1, 2, and/or 3.  

 

Here's the code to import and generate a table of the data:

 

SQL;
CREATE TABLE WORK.query AS
SELECT Pt_No , 'Name'n , DOB , Gender , IBD , Disease_duration__years_ , Previous_Biologic , Reason_for_Discontinuation , Previous_Qx_IBD , Steroid_use , Adverse_effects_of_Entyvio , Remission , Provider FROM WORK.ENTYVIO;
RUN;
QUIT;

PROC DATASETS NOLIST NODETAILS;
CONTENTS DATA=WORK.query OUT=WORK.details;
RUN;

PROC PRINT DATA=WORK.details;

 

How can I modify it to recognize the multiple reasons for discontinuation?

Super User
Posts: 10,476

Re: Importing data issue

How can I modify it to recognize the multiple reasons for discontinuation?

 

Are you saying you get responses for a single person for the variable of something like "0,3"?

If so, I can't provide a good variable name but you could create something such as

(find(Reason_for_Discontinuation,'0')> 0) as Reason0, (find(Reason_for_Discontinuation,'1')> 0) as Reason1, <continue the obvious pattern for 2 and 3>

 

If "1" means "Sideeffects" then that would make more sense for a variable name then Reason1.

 

 

 

Why are you using 'Name'n . Name alone should be sufficient as a variable.

New Contributor
Posts: 3

Re: Importing data issue

I'm using the import wizard to import the spreadsheet.  It includes an option for SAS to create it's own variable names.  The one it chose for name is 'Name'n.

 

And yes, I get responses for a single person for the variable of something like "0,3".

 

So, if I wanted to make a bar chart of the number of people that discontinued the medication for each reason and then do a chi^2 analysis on this variable, would I be able to use '(find(Reason_for_Discontinuation,'0')> 0) as Reason0, (find(Reason_for_Discontinuation,'1')> 0) as Reason1, <continue the obvious pattern for 2 and 3>' to do it?

Super User
Posts: 10,476

Re: Importing data issue

You need to know what you want for analysis. If I want to to a barchart that counts every occurence of "3" including the ones that appear in a response like "0,3" then I have to separate them some how. For Charting I would actually tend to create 2 observations.

Something like:

data have;
   input id r $;
datalines;
1 0
2 1
3 0,3
4 0,1,2
;
run;

data plot (keep=Id Plotval);
   set have;
   do i= 0 to 3;
      PlotVal=i;
      if index(r,put(i,1.))>0 then output;
   end;
run;

ods graphics on;
proc sgplot data=plot;
   vbar plotval/stat=freq;
run;

 

To make more meaningful charts use custom formats and labels to indicate the meaning :

proc format library=work;
value plotval
0='Cost'
1='Side effects'
2='Doc orders'
3='Something else'
;
run;
proc sgplot data=plot;
   vbar plotval/stat=freq;
   format plotval plotval.;
   label Plotval='Discontinuation Reason';
run;

 

Something similar to plot data could also work for some uses of your hypothetical chi-square ir you were looking at total choices. Decisions as to whether numbers of people or responses are yours to make. If People are the critical element then you might create new category for your response that indicates "multiple choices were made". If the number of these is very small in relation to the overall number of responses the multiple (or specific combinations of multiple ) then you deal with it. If the question was NOT supposed to have multiple responses you might make another analysis variable with only the values of the single responses and the multiples set to missing. Compare the results of the analyis broken down both ways.

The percentage of respondents with multiple responses and the overall number of respondents would go into making a decision on recoding the data. The multiple index variables would be used in the analysis or summaries of "How are responses to question 5 related to response of 3 to the reason_for_dicontinuation".

Ask a Question
Discussion stats
  • 7 replies
  • 343 views
  • 1 like
  • 3 in conversation