Hi,
I have some messy raw data as follows:
Question
What is the Capital of Malawi?
A.
Paris
B.
London
C.
Lilongwe
Question
How many prime numbers are there between 1 - 100?
A.
Three
B.
Four
C.
five
D.
Can't be determined
How would I write a program which will extract a variable question by reading what comes after the observation "Question", and similarly variables A, B, C, D by reading what comes after each of those.
Pardon me if I have not illustrated my question clearly.
Result would look like what it is in the attachment:
Jijil Ramakrishnan
Message was edited by: Jijil Ramakrishnan
If it is already in SAS dataset, the following quick_n_dirty code may help:
data have;
infile cards4 truncover;
input var;
cards4;
Question
What is the Capital of Malawi?
A.
Paris
B.
London
C.
Lilongwe
Question
How many prime numbers are there between 1 - 100?
A.
Three
B.
Four
C.
five
D.
Can't be determined
;;;;
data want;
merge have have(firstobs=2 rename=var=_var) end=last;
length Question $ 100
A $ 100
B $ 100
C $ 100
D $ 100
;
retain A B C D question;
if _n_>1 and var='Question' or last then do;output;call missing (a, b, c, d);end;
select (var);
when("Question") do;question=_var;end;
when ("A.") do; A=_var;end;
when ("B.") do; B=_var;end;
when ("C.") do; C=_var;end;
when ("D.") do; D=_var;end;
otherwise;
end;
drop _var var;
run;
Haikuo
You've illustrated the situation very clearly. But you need to provide a little more guidance on your intended outcome. For example, you could want an outcome that includes one observation per question, and these variables:
question
answer_a
answer_b
answer_c
answer_d
In that case, you might want to provide the largest possible number of answers per question.
Or you might want an outcome that includes many observations per question, one for each possible answer. The variables might be:
question
answer_code (a, b, c, d ...)
answer_text
Or you might want to design something slightly different from either of these.
That sort of information would be helpful.
Ooooops ... looks like your diagram already maps that out. So assuming that you want one observation per question, and that four answers will be the maximum ...
data want;
infile rawdata truncover end=alldone;
length test $ 8;
input test;
length question answer_a answer_b answer_c answer_d $ 100;
informat question answer_a answer_b answer_c answer_d $char100.;
retain question answer_a answer_b answer_c answer_d;
keep question answer_a answer_b answer_c answer_d;
if test = 'Question' then do;
if _n_ > 1 then output;
answer_a = ' ';
answer_b = ' ';
answer_c = ' ';
answer_d = ' ';
input question;
end;
else if test='A.' then input question_a;
else if test='B.' then input question_b;
else if test='C.' then input question_c;
else if test='D.' then input question_d;
if alldone then output;
run;
It's untested code, but should be at least approximatley right.
Good luck.
Dear Astounding,
This is works. Hats off to your writing untested code with near perfection.
This, however, has a problem I can't fix. When it reads a string with more than one word (such as a question), it reads only the first word.
Please tell me how can I read the whole line..
Thanks in advance,
Jijil
For the original problem, reading just a single word, I thought the INFORMAT would take care of that. I guess SAS needs more explicit instructions. The INFORMAT statement isn't needed, but change 5 INPUT statements:
input question_a $char100.;
...
input question_d $char100.;
The issue about already having a SAS data set to begin is a different animal entirely. What is the variable in your SAS data set?
There is only one variable named 'record', all of the above are strings are observations in it.
Jijil
If no one has explained it, have a look at the documentation of the INPUT statement and give particular attention to "the trailing @ sign"
Ooops... I have one more doubt:
Suppose all of rawdata is in SAS format under work.have (sas7bdaat format)
Question
What is the Capital of Malawi?
A.
Paris
B.
London
C.
Lilongwe
Question
How many prime numbers are there between 1 - 100?
A.
Three
B.
Four
C.
five
D.
Can't be determined
How do I use a "set" (instead of "infile") to achieve the same?
Thanks in advance,
Jijil Ramakrishnan
If it is already in SAS dataset, the following quick_n_dirty code may help:
data have;
infile cards4 truncover;
input var;
cards4;
Question
What is the Capital of Malawi?
A.
Paris
B.
London
C.
Lilongwe
Question
How many prime numbers are there between 1 - 100?
A.
Three
B.
Four
C.
five
D.
Can't be determined
;;;;
data want;
merge have have(firstobs=2 rename=var=_var) end=last;
length Question $ 100
A $ 100
B $ 100
C $ 100
D $ 100
;
retain A B C D question;
if _n_>1 and var='Question' or last then do;output;call missing (a, b, c, d);end;
select (var);
when("Question") do;question=_var;end;
when ("A.") do; A=_var;end;
when ("B.") do; B=_var;end;
when ("C.") do; C=_var;end;
when ("D.") do; D=_var;end;
otherwise;
end;
drop _var var;
run;
Haikuo
Hi ,
if the raw file is the exactly in the form as you have mentioned
you could try below code :
date temp;
infile '/path/temp.txt' DLM='09'x DSD MISSOVER;
input mydata $ 50;
run;
data temp_new;
lenght question $ 50 a $ 25 b $ 25 c $ 25 d $ 25 ;
do i=1 to nobs/10;
m=2+m;
set temp point=m nobs=nobs;
question=mydata;
m=m+2;
set temp point=m nobs=nobs;
a=mydata;
m=m+2;
set temp point=m nobs=nobs;
b=mydata;
m=m+2;
set temp point=m nobs=nobs;
c=mydata;
m=m+2;
set temp point=m nobs=nobs;
d=mydata;
retain m;
output;
if nobs then;
stop
run;
i believe this will do
you can try the below which is quite simple
data temp;
infile '/path/file.txt' scanover;
input @'question' question $char100.;
input @'a' a $char20.;
input @'b' b $char20.;
input @'c' c $char20.;
input @'d' d $char20.;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.