BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

Assume I'm reading a .csv file to create one dataset. It has variables like ID,Sex and Age. Now I want to perform checks like ID cannot be missing, Sex should not be other than 'M' or 'F' and Age value should not be character. In case if the CSV file has data which doesn't comply with the above rules then I've to capture that record into another dataset.

 

Assume I've the CSV file like below.

 

IDSexAge
1M21
.F         23
        23MH

 

In the above example, second and third record doesn't comply with the above said rules, so I want to capture those records in WANT dataset as follows. Data in faulty record should be seperated by spaces.

 

Err_MessageFaulty_Record
ID variable cannot be missing. F 23
Age variable cannot be character23 M H

 

Can anyone provide me some guidance on this exercise?

3 REPLIES 3
maguiremq
SAS Super FREQ

Are you looking for a data set or a report? I'm pretty sure that SAS would import the age column as a character variable if it detects an "H" in it, so your second error message wouldn't occur. I imported a .csv with the same data and that's what happened--SAS imported the field as character.

 

Other than that, you can do this?

 

data have;
input id $ sex $ age $;
datalines;
1 M 21
. F 23
23 M H
;
run;

data want;
set have;
if id = " " then do;
	err_message = "ID Variable cannot be missing";
end;
run;

proc print data=want;
where err_message ^= " ";
run;

 This obviously provides both a data set and a report.

andreas_lds
Jade | Level 19

What should happen if a record has more than one error? The following code writes one obs for each error:

 

filename bob "%sysfunc(pathname(work))\test.csv";

data _null_;
   file bob;
   infile datalines;
   input;
   put _infile_;
   datalines;
ID,Sex,Age
1,M,21
.,F,23
23,M,H
;
run;


data work.cleandata(keep=Id Sex Age)
   work.errors(keep=Err_Message Faulty_Record)
;
   length 
      Id 8 Sex $ 1 Age 8
      Err_Message $ 100
      Faulty_Record $ 250
   ;

   infile bob firstobs=2;
   input;

   Id = input(scan(_infile_, 1, ','), ?? best.);
   Sex = scan(_infile_, 2, ',');
   Age = input(scan(_infile_, 3, ','), ?? best.);

   if missing(Id) then do;
      Faulty_Record = _infile_;
      Err_Message = 'ID variable cannot be missing';
      output work.errors; 
   end;

   if Sex not in ('M', 'F') then do;
      
   end;

   if missing(Age) then do;
      Faulty_Record = _infile_;
      Err_Message = 'Age is not numeric';
      output work.errors;
   end;

   if missing(Faulty_Record) then do;
      output work.cleandata;
   end;
run;
ballardw
Super User

I might suggest incorporating the line number of the file read into the error record data set as well. Usually the automatic variable _n_ helps provide that.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 684 views
  • 2 likes
  • 4 in conversation