superman1
Fluorite | Level 6

d

5 REPLIES
Kurt_Bremser
Super User

Don't clutter up your data step code with data.

I would create a format that catches all incoming values and translates them to correct ones. Create the format from a dataset in CNTLIN format:

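data myformat;
  /* read comma-separated pairs: incoming value (start), corrected value (label) */
  infile datalines dlm=',' dsd truncover end=done;
  input (start label) (:$20.);
  fmtname = "continent_corr";
  type = "C"; /* character format */
  output;
  /* after the last pair, add a catch-all "other" range that flags unmatched values */
  if done then do;
    start = "Other";
    label = "************";
    hlo = "O";
    output;
  end;
datalines;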
North America,North America
South America,South America
Europe,Europe
europe,Europe
New York,North America
;

proc format cntlin=myformat;
run;
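
As an optional check before applying the format, the stored ranges can be printed. This is only a sketch; it assumes the format was written to the default WORK library, which is what the PROC FORMAT step above does when no LIBRARY= option is given.

```sas
/* Optional check: list the ranges stored in the character format */
proc format library=work fmtlib;
  select $continent_corr;
run;
```

The listing should show each start value with its label, plus the **OTHER** range mapped to the asterisks.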

Then apply this to your data:

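data survey_check;
  set survey (keep=continent);
  length con_check $20;
  /* character formats are referenced with a leading $ */
  con_check = put(continent, $continent_corr.);
  /* keep only the values the format did not recognize */
  if con_check = "************";
run;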

This dataset will contain all values your format didn't catch yet, so you can include them in the DATALINES.
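
To see which uncaught values occur most often, a frequency count over that dataset may help. This is a minimal sketch that only assumes the survey_check dataset created above:

```sas
/* One row per distinct uncaught value, most frequent first */
proc freq data=survey_check order=freq;
  tables continent / missing nocum nopercent;
run;
```

Each value listed is a candidate for a new "incoming,corrected" line in the DATALINES block.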

PS: Since the OP was rude and disrespectful enough to delete the question, here is a rough outline:

A large dataset is received with a variable CONTINENT that should only have "North America", "South America" or "Europe" entered. But, as it goes, lots of other entries are encountered, like "europe" (simple spelling mistake) or "New York" (interesting geography). The method posted above is intended to provide a lasting and easily maintainable remedy.
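
Once the format covers all the values seen so far, the correction itself could look roughly like this. The names survey_clean and continent_clean are placeholders chosen for illustration, not something from the original question:

```sas
/* Sketch: write the corrected value to a new variable (hypothetical names) */
data survey_clean;
  set survey;
  length continent_clean $20;
  continent_clean = put(continent, $continent_corr.);
run;
```

Format lookups match the start value exactly, including case, which is why "europe" needs its own line. Any value still not covered shows up as asterisks in continent_clean.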

ballardw
Super User

@superman1 wrote:

d


Rude to delete the question.

 

Afraid someone else in your class will get the right answer?

superman1
Fluorite | Level 6
No, I figured it out a different way and didn't want others to spend time trying to figure it out.
SASKiwi
PROC Star

@superman1 - If you've found an answer, just mark your post as answered by you; then others won't spend any more time on it.

mkeintz
PROC Star

@superman1:

 

As @SASKiwi comments, just post your own solution and mark the problem as solved.

 

Next time, be @super and satisfy others' curiosity.
