Help using Base SAS procedures

Reading text string (variable length)

Reply
New Contributor
Posts: 4

Reading text string (variable length)

I want to read this into two variables: make and model name. E.g. make=chevrolet, model_name=chevelle malibu

Not sure how to do this since the lengths vary.

I tried the code below. But I am left with quotes " in the text variable.

Filename carnames url 'http://www';

Data Carnames;

INFILE Carnames MISSOVER;

LENGTH n1 n2 n3 $40;

INPUT @2 n1 $ n2 $ n3 $;

Run;

"chevrolet chevelle malibu"

"buick skylark 320"

"plymouth satellite"

"amc rebel sst"

"ford torino"

"ford galaxie 500"

"chevrolet impala"

"plymouth fury iii"

"pontiac catalina"

"amc ambassador dpl"

"citroen ds-21 pallas"

"chevrolet chevelle concours (sw)"

"ford torino (sw)"

"plymouth satellite (sw)"

"amc rebel sst (sw)"

Super User
Posts: 9,681

Re: Reading text string (variable length)

Easy.

Data Carnames;
INFILE cards dsd pad MISSOVER length=len;
INPUT x $varying200. len;
make=scan(dequote(x),1);
model=substr(left(dequote(x)),find(dequote(x),' ')+1);
cards;
"chevrolet chevelle malibu"
"buick skylark 320"
"plymouth satellite"
"amc rebel sst"
"ford torino"
"ford galaxie 500"
"chevrolet impala"
"plymouth fury iii"
"pontiac catalina"
"amc ambassador dpl"
"citroen ds-21 pallas"
"chevrolet chevelle concours (sw)"
"ford torino (sw)"
"plymouth satellite (sw)"
"amc rebel sst (sw)"
;
Run;

Ksharp

Respected Advisor
Posts: 3,124

Re: Reading text string (variable length)

Slightly simplified version Smiley Happy;

Data Carnames;

INFILE cards ;

INPUT ;

make=scan(dequote(_infile_),1);

model=substr(dequote(_infile_),length(make)+1);

cards;

"chevrolet chevelle malibu"

"buick skylark 320"

"plymouth satellite"

"amc rebel sst"

"ford torino"

"ford galaxie 500"

"chevrolet impala"

"plymouth fury iii"

"pontiac catalina"

"amc ambassador dpl"

"citroen ds-21 pallas"

"chevrolet chevelle concours (sw)"

"ford torino (sw)"

"plymouth satellite (sw)"

"amc rebel sst (sw)"

;

Run;

Haikuo

Frequent Contributor
Posts: 106

Re: Reading text string (variable length)

Devil's advocate here: what about "Alfa Romeo Giulia" ?

Super User
Posts: 9,681

Re: Reading text string (variable length)

I can't understand what you mean . What output do you expect ?

Frequent Contributor
Posts: 106

Re: Reading text string (variable length)

Sorry Ksharp, seems I was too scarce with my comment. What I'm trying to convey:

  • ^'model' and 'make' are difficult to parse out when the only delimiter between them are blanks
    • examples: 'alfa romeo giulia', 'sunbeam talbot alpine', 'steyr puch haflinger' where 'model' consists of the
      first two words
    • examples with more than two words designating 'model' are imaginable though I can't name one
      off the top of my head
  • values requiring special treatment should not be hard coded in programs but stored in tables much rather
    • i.e. sequences of "if lowcase(scan(_infile_,1)) = 'alfa' then ....'  statements in the program source
      tend to become a maintenance nightmare

Still unclear?

Robert

Super User
Posts: 9,681

Re: Reading text string (variable length)

OK.

  • It is up to OP. Since OP only show us taking the first word as make, we will not be able to image so far situation.
  • But If I were you ,I will use a Hash table to maintain these key words and get what you want.
Frequent Contributor
Posts: 106

Re: Reading text string (variable length)

With regards to the hash table approach this begs the question whether (and I am talking about every day's reuse of the data here) the hash table can be hardened to a storage medium such that keyed access to a single item is possible without reloading the entire table first. Imagine a multimillion rows table with an index on one or more variables. Which one is more effective: accessing the desired row with a keyed access directly from the storage medium or loading the hash table into memory first and then doing a find on it? I guess it boils down to a "it depends".

Note that I have close to zero experience with hashes but from what I've read (code samples in SAS-L mainly) I'm under the impression that they aren't the swiss army knife for every data retrieval problem.

Super User
Super User
Posts: 6,500

Re: Reading text string (variable length)

Use the $QUOTE. informat to eliminate the quotes around the value in the input file.  Use the TRUNCOVER rather than the MISSOVER option to avoid problems when reading past the end of the physical line. With missover the value becomes missing when you read past the end, but with truncover you get the values that are there instead.

Read the line into a character variable and then parse out the model and make from there.  This will also let you code in a list of models that require two words like ALFA ROMEO.

I also added UPCASE() to make it easier to test for specific values.

data carnames;

  infile cards truncover ;

  length line $100 make model $50 ;

  input line $quote100. ;

  line = upcase(line);

  model = scan(line,1,' ');

  if model in ('ALFA') then model=catx(' ',model,scan(line,2,' '));

  make = substr(line,length(model)+2);

  put ( line model make ) (=/);

cards;

"chevrolet chevelle malibu"

"buick skylark 320"

"plymouth satellite"

"amc rebel sst"

"ford torino"

"ford galaxie 500"

"chevrolet impala"

"plymouth fury iii"

"pontiac catalina"

"amc ambassador dpl"

"citroen ds-21 pallas"

"chevrolet chevelle concours (sw)"

"ford torino (sw)"

"plymouth satellite (sw)"

"amc rebel sst (sw)"

"Alfa Romeo Giulia"

run;

Frequent Contributor
Posts: 106

Re: Reading text string (variable length)

Well, it's no doubt doable. That wasn't the goal of the question. Rather to raise the awareness that the problem as it was presented is difficult to solve without some kind of lookup table containing manufacturer's names. Although I can't name one off the top of my head, mergers may have created names consisting of three or more parts.

PROC Star
Posts: 1,093

Re: Reading text string (variable length)

I agree. I think your only option is to collapse the data by the first three or four strings, and from that derive a table of "make" values. After extracting that, the remainder becomes the "model". There are just too many odd possibilities to automate the detection.

However,one optimization might be to determine if there are only two words, and if there are assume that they are make and model. The makes extracted from that can then be used against the remainder, and any matches can be handled in an automated fashion. This should leave a very small residue for manual evaluation.

I've dealt with the same problem with municipality and province, and there's really no count-based solution.

Tom

Ask a Question
Discussion stats
  • 10 replies
  • 365 views
  • 0 likes
  • 6 in conversation