BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
klchtsk
Calcite | Level 5

Hi all!

I'm trying to build a macro that will read the variables in given dataset based on PROC CONTENTS output. It should check if the variable has invalid variable (that are replaced with ? symbol and interrupt data processing).

klchtsk_0-1649051873741.png

I have built following macro:

 

%macro cleanvar;
	proc contents data=mc1.customers out=varlist noprint;
	proc sql noprint;
		select name into :var1- from varlist where Type=2;
	quit;

	%do i=1 %to &sqlobs;

		data dataset;
			set mc1.customers;
			check&i=notalnum(trim(&&var&i));
		run;

		proc sql noprint;
			select min (check&i) into :flag from dataset;
		quit;

		%if &flag=0 %then
			%do;

				data output;
					set mc1.customers;
					Clean_&&var&i=compress(&&var&i, " ", "f k");
				run;
			%end;
	%end;
%mend cleanvar;

%cleanvar

But it's obvious that this version re-writes the last Clean_var column. I have done another option, which I won't publish (it moves the DO loop in the DATA step) which does COMPRESS to all character variables (columns). 

But in my opinion it should:

- pull the list of variables, where they are type 2 - character;

- check if at least one value in the variable is invalid with NOTALNUM (I think there should be SMALLEST in array, but I don't know how to do it);

- and do COMPRESS only for those variables which contain invalid values.

 

I'm studying SAS using SAS VLE, so the dataset in MC1.CUSTOMERS, which I attach to the post.

UPD: I have removed attached dataset to avoid misconceptions about the content of it. The file is taken from SAS VLE Macro 1 lesson at:

https://vle.sas.com/pluginfile.php/783487/mod_scorm/content/23/coursefiles.zip

folder data/customers SAS Data Set.

 

P.S. Strangely enough, in educational videos and in solutions, which are based on this dataset it doesn't contain invalid chars (those replaced with ?). Maybe it makes better sense to prevent these character to appear in first place, hence I will be glad to receive any kind of hint either how to change the macro or how to avoid the interruption in data processing. Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I suspect you out-thunk yourself.

First, you check a variable for the presence of X, then you run a step to remove X.

If you just used the COMPRESS function, the result would be the same anyway (it only removes something when it finds it).

 

So you can do everything in one step:

data want;
set have;
array vars {*} _character_;
do i = 1 to dim(vars);
  vars{i} = compress(vars{i}," ","f k");
end;
drop i;
run;

If you want to create new variables, you will need some preceding code (read from DICTIONARY.COLUMNS) to create the names for the new variables, and use a second array.

View solution in original post

6 REPLIES 6
andreas_lds
Jade | Level 19

Sorry, but i won't download any files.

First thing you should do is, getting it done without any macro code, because if you can't get the expected results without macro code, it is very likely that you won't succeed by adding macro complexity to the problem.

klchtsk
Calcite | Level 5
Thank you for your reply! I have removed the dataset and included link to SAS VLE, whence I took it, to avoid misconception about the content of it. I will try to find the easier solution!
Kurt_Bremser
Super User

The educational versions of SAS (like SAS On Demand for Academics) use UTF-8 encoding. It seems your SAS runs with the older default of WLATIN-1.

The simples solution to your issue is to start SAS with UTF encoding, so it can handle the multi-byte characters. How you do that depends on your type of SAS installation (client-server vs local).

klchtsk
Calcite | Level 5
Thank you for your reply! I will check how to change encoding and I'm sure it will work. But a curious part of me wants to make a macro solution to clean the data from invalid characters and keeping the cleaned variable from previous iteration in DO loop. May I ask you to take a closer look at the macro statement included in the post? Thank you!
Kurt_Bremser
Super User

I suspect you out-thunk yourself.

First, you check a variable for the presence of X, then you run a step to remove X.

If you just used the COMPRESS function, the result would be the same anyway (it only removes something when it finds it).

 

So you can do everything in one step:

data want;
set have;
array vars {*} _character_;
do i = 1 to dim(vars);
  vars{i} = compress(vars{i}," ","f k");
end;
drop i;
run;

If you want to create new variables, you will need some preceding code (read from DICTIONARY.COLUMNS) to create the names for the new variables, and use a second array.

klchtsk
Calcite | Level 5

Thank you so much! It worked! I suspected it should include array thing, but I didn't know how to use it! Now I know!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1181 views
  • 1 like
  • 3 in conversation