I want to range check a dataset and have an external lookup table containing variable names and links to where their valid values are stored.
I want to retrieve the valid values stored in those lookup files and compare those values with my data. I want to create an indicator which equals "yes" when there is a match and "no" when there is no match.
Can anybody help me find a solution? Thanks a million.
Depending on how often the values in the CSV files change, you may be able to use the features of PROC FORMAT to create a user-defined format from your CSV files...
1) read the CSV file into SAS
2) create a dataset tailored to PROC FORMAT CNTLIN
3) run a PROC FORMAT step using the dataset in #2
4) then the new format is available for doing the kind of recoding you want -- you can build the format on a permanent basis if the data doesn't change much or you can build the format each time you run the job that does the validation checking.
5) repeat #1, 2 and 3 for every one of your variables -- probably in a SAS Macro program -- as Chuck indicated.
There's an example at the end of this post that uses SASHELP.SHOES as an example...to create 1 format for Product and create a new variable with the format.
If your data changes hourly or daily, this might not be the preferred approach. but if your CSV files only change weekly or occasionally, then you might be able to work a CNTLIN/PROC FORMAT example into a SAS Macro program solution that would update the permanent formats when the CSV files changed. OR you could set up a periodic refresh.
** make some data to test;
** You would read your CSV file here;
** Only Products listed here will be coded with 'Yes';
length Product $14;
infile datalines dlm=',' dsd;
input Product $ ;
** make a user-defined format for the products;
set work.file1(rename=(product=start)) end=eof;
Retain FMTNAME "PRODF"
Length LABEL $60;
Label = "Yes";
if eof then do;
Proc Format CntlIn = fmt_from_data fmtlib;
** start ODS;
ods listing close;
ods html file='c:\temp\useFmt.html' style=sasweb;
newflag = put(product,$PRODF.);
proc print data=makeflag;
where region = 'Asia';
var region product newflag sales;
title 'Only Print One Region';
** PROC FREQ;
proc freq data=makeflag;
title 'Using Format to Double Check Values for Product';
tables product newflag;
format product $PRODF.;