Hi,
I have a Xlsxfile which I would like to import and create a SAS data set. The excel has 5 columns out of which 3 are numeric. For the 3 numeric columns, I would like to remove the observations if the column has a non numeric value.
It will be great if suggest some way to achieve this?
Thanks in advance,
So the columns really arent numeric if they have other values in them. If they are no missing values to begin with, you can read those columns as numeric and weed out the observations by deleting those that come in as missing (SAS will set non numeric values to missing). This might require an infile statement instead of an import through the wizard or proc depending on how often the values show up in the column (simple explanation - proc import or the wizard sets the column value by scanning part of each column to see what the column needs to be, If it reads in as numeric through this scan the non numeric values will be set to missing through this process as well. IF it reads non numeric it will set the data type to character).
If the list of values is already known you can use a datastep to delete out the observations:
data restricted;
set raw;
if var1 in ("A" "B" C") or var2 in ("D" "C" "A") or .... then delete; /*alternative would be to output to a separate dataset*/
run;
You could combine the two tactics if the variables dont read in all the same.
data restricted;
set raw;
if var1 in ("A" "B" "C") or var2 = . or then delete;
run;
Hopefully this gives you a start at what you need!
EJ
If you know that the variable is SUPPOSED to be numeric and people are accidently creating character values AND you ALWAYS want to remove all of the character data, just do an input to manually convert the values to numeric.
Note you are changing data in doing this, and losing the record for all variables that are not numeric, but this is what you asked for (I still wouldn't recommend this without understanding why your variables aren't numeric if they're supposed to be).
Anyways just do something like this
proc import datafile='myfile' out=newdata replace;
guessingrows=10000;
run;
data newdata_fixed;
set newdata;
NewNumericfield=input(character_Field,8.);
run;
This will convert all of the characters to null values within your numeric fields.
Hi Anotherdream ,
Thanks a lot for the reply.
Yes, the variable in the xlsx is supposed to be numeric. But if people are putting some values by mistake, I would like to identify the records with invalid values and produce a report to the user.
Regards,
Sheeba S
Here's one way:
data invalid;
set have;
length var_name var_value $ 32;
array check {5} A B C D E;
do _n_=1 to 5;
      if check{_n_} > ' ' and input(check{_n_}, ??12.) = . then do;
         var_name = vname(check{_n_});
var_value = check{_n_};
output;
end;
end;
run;
It hard-codes the variable names, and assumes that they are already character. Automating that functionality might be your next step once this part is working.
Good luck.
Hi Astounding,
Thanks a lot for the reply.
I will definitely give it a try.
Regards,
Sheeba Swaminathan
Hi EJ,
Thanks a lot for the reply.
I am trying to create a reusable project to be used by n number of users. The idea is to allow the users to upload an xlsx file. Once the file is in the server, plan is to use proc import. The structure of xlsx is known ( 3 character columns and 2 numeric columns). Numeric columns are cost price and list price. My requirement is to check if the numeric columns ( cost and list price) has any non numeric value like character values or invalid values and produce a report with those observations. so the list of values is not known
Regards,
Sheeba S
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
