Can someone please help me on how to read the below text file format to SAS?
John
Sex=M
Age=31
Country=US
Allergy=False
Mark
Sex=M
Age=32
Country=US
Allergy=True
Allergy1
Allergy2
Allergy3
Chase
Sex=F
Age=34
Country=UK
Allergy=False
I tried simple infile statement and using / but that works if data is consistent. In my case the pattern changes slightly if Allergy field is true. Appreciate your help.
Nope, text file does not have any blank lines.
You need a way to figure out how many allergies to read or to read until you find the next NAME. This assumes that each allergy begins with ALLERGY. Show data this is more realistic to your actual data.
filename FT15F001 temp;
data test;
retain;
infile FT15F001 length=l;
length Name $32 Sex $1 Age 8 Country $8 Allergy $5;
input name / sex= / age= / country= / allergy=;
if first(allergy) eq 'T' then do;
input atype:&$32. @;
do while(atype eq: 'Allergy');
output;
input / atype:&$32. @1 @@;
end;
call missing(atype);
end;
else output;
return;
parmcards;
John
Sex=M
Age=31
Country=US
Allergy=False
Mark
Sex=M
Age=32
Country=US
Allergy=True
Allergy1
Allergy2
Allergy3
Chase
Sex=F
Age=34
Country=UK
Allergy=False
;;;;
run;
Is there any rule when this file was built that determines that your (likely) name is separated from the last allergy?
Is there a limit on the number of allergies per person?
Please show what a SAS data set from this example should look like after it is read.
Assuming that you always have NAME/SEX/AGE/COUNTRY/ALLERGY for each subject here is one way.
Read the file once looking for lines that start with SEX= to find the number of subjects and where they start.
Then on reading the second time you can know how many lines to read for the allergans list.
filename text temp;
options parmcards=text;
parmcards;
John
Sex=M
Age=31
Country=US
Allergy=False
Mark
Sex=M
Age=32
Country=US
Allergy=True
Allergy1
Allergy2
Allergy3
Chase
Sex=F
Age=34
Country=UK
Allergy=False
;
data list ;
if eof then call symputx('nlines',_n_);
infile text end=eof;
retain id 0 start_row -1;
input;
start_row+1;
if _infile_=: 'Sex=' then do; id+1; output; end;
run;
data list;
set list end=eof;
if eof then last_row=&nlines;
else set list(firstobs=2 keep=start_row rename=start_row=last_row);
last_row=last_row-1;
run;
data want;
set list ;
infile text truncover ;
length name $30 sex $1 age 8 country $20 Allergy $5 Allergy_list $200 ;
input name / @5 sex / @5 age/ @9 country / @9 Allergy;
do _n_=start_row+5 to last_row;
input; allergy_list=catx('|',allergy_list,_infile_);
end;
run;
proc print;
run;
start_ Obs id row last_row name sex age country Allergy Allergy_list 1 1 1 5 John M 31 US False 2 2 6 13 Mark M 32 US True Allergy1|Allergy2|Allergy3 3 3 14 18 Chase F 34 UK False
data test;
infile cards truncover;
input temp $100.;
cards;
John
Sex=M
Age=31
Country=US
Allergy=False
Mark
Sex=M
Age=32
Country=US
Allergy=True
Allergy1
Allergy2
Allergy3
Chase
Sex=F
Age=34
Country=UK
Allergy=False
;
data temp;
merge test test(firstobs=2 rename=(temp=_temp));
if _temp =: 'Sex=' then group+1;
run;
data temp;
set temp;
by group;
length name $ 40;
if first.group then do;n=0;name='Name';value=temp;end;
else if findc(temp,'=') then do;name=scan(temp,1,'=');value=scan(temp,-1,'=');end;
else do;n+1;name=cats('Allergy',n);value=temp;end;
keep name value group;
run;
proc transpose data=temp out=want(drop=_name_);
by group;
id name;
var value;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.