- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assume I'm reading a .csv file to create one dataset. It has variables like ID,Sex and Age. Now I want to perform checks like ID cannot be missing, Sex should not be other than 'M' or 'F' and Age value should not be character. In case if the CSV file has data which doesn't comply with the above rules then I've to capture that record into another dataset.
Assume I've the CSV file like below.
ID | Sex | Age |
1 | M | 21 |
. | F | 23 |
23 | M | H |
In the above example, second and third record doesn't comply with the above said rules, so I want to capture those records in WANT dataset as follows. Data in faulty record should be seperated by spaces.
Err_Message | Faulty_Record |
ID variable cannot be missing | . F 23 |
Age variable cannot be character | 23 M H |
Can anyone provide me some guidance on this exercise?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are you looking for a data set or a report? I'm pretty sure that SAS would import the age column as a character variable if it detects an "H" in it, so your second error message wouldn't occur. I imported a .csv with the same data and that's what happened--SAS imported the field as character.
Other than that, you can do this?
data have;
input id $ sex $ age $;
datalines;
1 M 21
. F 23
23 M H
;
run;
data want;
set have;
if id = " " then do;
err_message = "ID Variable cannot be missing";
end;
run;
proc print data=want;
where err_message ^= " ";
run;
This obviously provides both a data set and a report.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What should happen if a record has more than one error? The following code writes one obs for each error:
filename bob "%sysfunc(pathname(work))\test.csv";
data _null_;
file bob;
infile datalines;
input;
put _infile_;
datalines;
ID,Sex,Age
1,M,21
.,F,23
23,M,H
;
run;
data work.cleandata(keep=Id Sex Age)
work.errors(keep=Err_Message Faulty_Record)
;
length
Id 8 Sex $ 1 Age 8
Err_Message $ 100
Faulty_Record $ 250
;
infile bob firstobs=2;
input;
Id = input(scan(_infile_, 1, ','), ?? best.);
Sex = scan(_infile_, 2, ',');
Age = input(scan(_infile_, 3, ','), ?? best.);
if missing(Id) then do;
Faulty_Record = _infile_;
Err_Message = 'ID variable cannot be missing';
output work.errors;
end;
if Sex not in ('M', 'F') then do;
end;
if missing(Age) then do;
Faulty_Record = _infile_;
Err_Message = 'Age is not numeric';
output work.errors;
end;
if missing(Faulty_Record) then do;
output work.cleandata;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I might suggest incorporating the line number of the file read into the error record data set as well. Usually the automatic variable _n_ helps provide that.