Help using Base SAS procedures

Messy Raw Data

Reply
Occasional Contributor
Posts: 10

Messy Raw Data

Hi

I am new to SAS. Plz help me out in reading this raw data

John       $20,000 Microsoft

$30,000  George  Oracle

SAS       Daniel   $50,000

Sam       IBM      $60,000

Trusted Advisor
Posts: 1,615

Re: Messy Raw Data

Plz explain how we are supposed to read this data, what is the final result you'd like to get?

Occasional Contributor
Posts: 10

Re: Messy Raw Data

Thanks PaigeMiller

The final result should be

--------------------------------------------------------

Employee Name        Salary        Company

---------------------------------------------------------

John                          $20,000       Microsoft

George                      $30,000       Oracle

Daniel                        $50,000       SAS

Sam                           $60,000       IBM   

Super User
Posts: 17,829

Re: Messy Raw Data

Are they really out of order like that? How can tell what's a company name and what's a person?

Super Contributor
Posts: 543

Re: Messy Raw Data

Smiley Happy

Here is a piece of code that will get what you want, but it obviously needs to be more generalized:

data in;

input temp_name $ temp_salary $ temp_company $10.;

cards;

John       $20,000 Microsoft

$30,000  George  Oracle

SAS       Daniel   $50,000

Sam       IBM      $60,000

;

data want;

length company salary name $10.;

  set in; * your data set example;

  if substr(temp_name, 1, 1) = "$" then salary = temp_name;

  else if substr(temp_company, 1, 1) = "$" then salary = temp_company;

  else if substr(temp_salary, 1, 1) = "$" then salary = temp_salary;

  if scan(temp_name, 1, " ") in ("IBM", "SAS", "Oracle", "Microsoft") then company = temp_name;

  else if scan(temp_company, 1, " ") in ("IBM", "SAS", "Oracle", "Microsoft") then company = temp_company;

  else if scan(temp_salary, 1, " ") in ("IBM", "SAS", "Oracle", "Microsoft") then company = temp_salary;

  if substr(temp_name, 1, 1) ne "$" &

  scan(temp_name, 1, " ") not in ("IBM", "SAS", "Oracle", "Microsoft") then name = temp_name;

  else if substr(temp_company, 1, 1) ne "$" &

  scan(temp_company, 1, " ") not in ("IBM", "SAS", "Oracle", "Microsoft") then name = temp_company;

  else if substr(temp_salary, 1, 1) ne "$" &

  scan(temp_salary, 1, " ") not in ("IBM", "SAS", "Oracle", "Microsoft") then name = temp_salary;

run;

The experts will hopefully use this (poorly written) code and make it awesome!

Good luck!

Anca.

Super User
Posts: 10,500

Re: Messy Raw Data

Considering that I have had data with IBM spelled about 18 different ways, I'm wondering if we've actually seen the full scope of this question.

Ask a Question
Discussion stats
  • 5 replies
  • 215 views
  • 1 like
  • 5 in conversation