Could someone please help me with the following situation:
I have a raw data file, which has the following content:
Name:Tom
Age:24
Gender:Male
Email:Tom@test.com
Name:Dick
Age:32
Gender:Male
Name: Harry
Age:30
Gender:Male
Email:Harry@test.com
/*note that the second observation does not have Email */
My code (below) reads this:
data Basic1;
length Record $100;
infile "D:\Analytics\Data\Address.txt" ;
input @'Name:' name : $200. @@;
if not missing(name);
input @'Age:' age : 3. @@;
if not missing(age);
input @'Gender:' gender : $1. @@;
if not missing(gender);
input @'Email:' email : $20. @@;
if not missing(email);
drop record;
run;
SAS reads only two observations, and the email for the second observation is taken from the third.
Obs name age gender email
1 Tom 24 M Tom@test.com
2 Dick 32 M Harry@test.com
Could someone help me to fix this. Ideally, the email variable for the second observation, as it is missing, should be "".
Thanking in advance,
Jijil Ramakrishnan
Curiosity killed the cat with me and I came up with the method I would use:
data Basic1;
length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;
drop rec;
infile "...your path...\Address.txt" end=last;
input ;
rec = _infile_;
If index(rec,'Name:')>0 then do;
Name = strip(scan(rec,2,":"));
Age = .;
Gender = "";
Email = "";
end;
else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);
else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));
else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));
retain Name Age Gender Email;
if rec = "" or last = 1 then output;
run;
I think this accomplishes what you need.
EJ
I believe your problem is your having sas looking for "Email:" (input @'Email:'), so for the second obs it jumps to the third observation which is the next occurrence of that string.
Im sure there is an elegant way in the input statement to handle this, I would probably just read the data into a single variable and parse it with data step logic to get what you want.
EJ
Dear EJ,
Thanks for the tip. Before I try to parse, I thought we we could fix it in the input statement itself.
If someone could help me to do the following, it will be great:
If the string "Name" appears before "@" character, email ="".
I searched on Google but failed. Hard to believe that it is a rare problem...:(
Regards,
Jijil
Some remarks:
1/ The named list approach would be easier. It is pity you have the : and no = as separation. Perhaps something smart to change : = roles? SAS(R) 9.3 Statements: Reference (list input)
2/ I do not understand your approach.
As you have all: if not missing(...); statements you are actually coding you do not one record where some variable is missing
You got exactly what you have coded. 2 records where nothing is missing.
The drop record ; statement is coding to drop the variable record. As not being used yet it creates it and immediate drop it.
I assume some basic concept of the data-step is failing. SAS data-steps are different to normal record processing of other languages that it does all "next record: processing automatic. You just code what should happen with a current record.
For a smart read process, reading the lines.
- is the blank line always there en indicating en of record? (name age gender mail)
- is the order Always the same?
- can you intrpreted line/for line the values?
3/ To remember when alle data would be delivered as one record.
SAS(R) 9.3 Statements: Reference (missover == flowover scanover stopover truncover)
Would suggest missover when records are at the end missing
Dear Mr. Karman,
Thank you.
Here is how I can answer your questions
2.if not missing(..);is not necessary. The actual code is
data Basic1;
infile "D:\Analytics\Data\Address.txt" ;
input @'Name:' name : $200. @@;
input @'Age:' age : 3. @@;
input @'Gender:' gender : $1. @@;
input @'Email:' email : $20. @@;
run;
(everything else was sort of my attempts to circumvent this problem. apologies)
- in the raw data, blank line is the end of a record
- order is always the same
- except email, other fields are all mandatory
(I am sorry, I do not know how to respond to other points)
Regards,
Jijil
try,
data Basic1;
infile "D:\Analytics\Data\Address.txt" ;
input @'Name:' name : $200. /
@'Age:' age : 3. /
@'Gender:' gender : $1. / tmpvar $1. @ ;
IF length(tmpvar) > 1 Then input @'Email:' email : $20. / ;
run;
(code not checked)
Thinking the approach:
The "/" first 3 records can be processed in one input statement multiple lines at once.
One @ is enough to hold the iteration within this record
It will read the blank/non-blank line.
When blank next iteration.... otherwhise process mail and proceed one line.
This is playing with the line-orders and input.
Exercise to learn inputprocessing.
probably Jaap's solution would be the quickest and easiest and quite straightforward
data Basic1;
infile 'C:\Users\Trajce\Contacts\Desktop\test.txt' missover;
input @'Name:' name : $200. /
@'Age:' age : 3. /
@'Gender:' gender : $1. /
@'Email:' email : $20.
#4;
proc print;run;
Curiosity killed the cat with me and I came up with the method I would use:
data Basic1;
length Rec $ 1000 Name $ 200 Age 8 Gender $ 1 Email $ 20;
drop rec;
infile "...your path...\Address.txt" end=last;
input ;
rec = _infile_;
If index(rec,'Name:')>0 then do;
Name = strip(scan(rec,2,":"));
Age = .;
Gender = "";
Email = "";
end;
else if index(rec, 'Age:')>0 then Age = put(scan(rec,2,":"),8.);
else if index(rec, 'Gender:')>0 then Gender = strip(scan(rec,2,":"));
else if index(rec, 'Email:')>0 then Email = strip(scan(rec,2,":"));
retain Name Age Gender Email;
if rec = "" or last = 1 then output;
run;
I think this accomplishes what you need.
EJ
Thank you EJ!
Jijil Ramakrishnan
EJ,
Running yr code for 'Missing Data', We are getting only last observation. Is that our ojective to get. Else, how to get all (3) observations using yr code.? could you please help us.
-Thanks
_Kamal
Im afraid I dont understand the problem. The code works with the supplied sample data.
If your data is different then yes you would have to edit the code to allow for the differences. My guess is that you dont have empty rows in your txt file between observations like the sample data does so the output line only occurs on the last line of the input file where end of file flag is set (end = last .... last = 1 then output).
EJ
You may want to start a new thread to solve any new issues.
EJ
EJ,
I'm exactly taking about the supplied sample data posted by JAR. I just want to know how to get all 3 observations (one observation with missing email) with your code.
Following is the desired output..
Obs name age gender email
1 Tom 24 M Tom@test.com
2 Dick 32 M
3 Harry 30 M Harry@test.com
=Kamal
Im not sure how to respond ... the posted code when used with the sample provides exactly what you show your desired output to be:
So Im not sure how to help you.
EJ
EJ,
You are absolutely right, I don't know what went wrong last time when I ran the code . Today , I ran again and is perfect.Thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.