Help using Base SAS procedures

Getting frequencies of phrases separated by double spaces

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Getting frequencies of phrases separated by double spaces

Hi everyone,

I have attached a file of my data. I need some help on how to get frequencies of phrases that are separated by double spaces.

From the attached spreadsheet with diagnoses, the observations can consist of multiple phrases separated by double spaces.

Face: triangular  Palpebral fissures: short  Ears: low set both  Mouth: microstomia  Mouth: palate high-arched


I would like to find out how many people in the dataset have “Ears: low set both .” By default, SAS will return a frequency of  “Face: triangular  Palpebral fissures: short  Ears: low set both  Mouth: microstomia  Mouth: palate high-arched” rather than scan through and see how many people have just “Ears: low set both.”


Thanks!


Accepted Solutions
Solution
‎08-12-2014 05:04 PM
Super User
Super User
Posts: 7,039

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

I don't see a new file, but in general trying to read a file using two different delimiters is a pain.

You should be able to use read it using both comma and space as a delimiter, but the embedded commas will cause trouble with the & modifier as it will see two delimiters.  Also have to add something to deal with quotes.

data want ;

* infile 'diagnoses.csv' truncover firstobs=2;

  infile cards dlm=' ,' truncover ;

  line+1;

  length id $10 gender $1 line 8 loc 8 word next $200.;

  input id gender word @;

  word = compress(word,'"');

  do until(next=' ');

    input next & @;

    next = compress(next,'"');

    if not index(next,':') then word=catx(', ',word,next);

    else do;

      if word ne ' ' then do; loc=sum(loc,1); output; end;

      word=next;

    end;

  end;

  if word ne ' ' then do; loc=sum(loc,1); output; end;

  drop next;

cards;

12345,M,Back: lordosis  Face: micrognathia

9876,F,"Abdomen: asplenism, Intestinal malrotation  Face: micrognathia"

run;

proc print;

run;

You probably would get better results to read the words as one long string and use a DO loop to scan the string for double spaces.  The CALL SCAN() function should help with this.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

You may need to provide an example of what you want the output to look like. I can come up with several interpretations of "frequencies ... separated by double spaces".

I think you want to read the file as a single line of text and ignore the delimiter, use the _infile_= <some variable name> to create a variable with the line of text. Then use one of the string search functions such as index to look for the specific phrase. You may want to use upcase as well incase the capitalization isn't consistent.

Super User
Super User
Posts: 7,039

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

If your data is in a file then use the & modifier on the INPUT statement.

data want ;

  length line 8 loc 8 word $200.;

  infile 'diagnoses.csv' truncover firstobs=2;

  line+1;

  do loc=1 by 1 until(word=' ');

    input word & @;

    if word ne ' ' then output;

  end;

run;

proc freq;

tables word;

run;

Occasional Contributor
Posts: 15

Re: Getting frequencies of phrases separated by double spaces

Thank you so much Tom! This is amazing.

This is part of the frequency output.

"Back: lordosis10.50
"Face: micrognathia11.00
Abdomen: ascites11.49
Abdomen: asplenism, Intestinal malrotation"11.99

Do you know why there are the quotes in the beginning of those phrases, and quotes at the end of the 4th phrase? There are no such quotes in the dataset. There are more occurrences of these as well later on.

Valued Guide
Posts: 2,177

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

remove those quotes by adding just a little to Tom's code

data want ;

  length line 8 loc 8 word $200.;

  infile 'diagnoses.csv' truncover firstobs=2;

  line+1;

* remove quotes ;

INPUT @ ;

_INFILE_ = TRANSLATE(_INFILE_,' ', '"' ) ;

  do loc=1 by 1 until(word=' ');

    input word & @;

    if word ne ' ' then output;

  end;

run;

proc freq;

tables word;

run;

Super User
Super User
Posts: 7,039

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

Quotes are in the data, but when you read it before as a CSV file (or in SAS with the DSD option on the INFILE statement) the quotes were automatically stripped.

Look at the DEQUOTE() function and apply it to the whole input line.

Add these two lines before the DO loop.

  input @ ;

  _infile_= dequote(_infile_);

Occasional Contributor
Posts: 15

Re: Getting frequencies of phrases separated by double spaces

Thanks!

I attached a new csv file that includes some member and ID information for each diagnoses. I am really unsure how to add them in using the input statements, as this command is not entirely familiar with me.

Solution
‎08-12-2014 05:04 PM
Super User
Super User
Posts: 7,039

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

I don't see a new file, but in general trying to read a file using two different delimiters is a pain.

You should be able to use read it using both comma and space as a delimiter, but the embedded commas will cause trouble with the & modifier as it will see two delimiters.  Also have to add something to deal with quotes.

data want ;

* infile 'diagnoses.csv' truncover firstobs=2;

  infile cards dlm=' ,' truncover ;

  line+1;

  length id $10 gender $1 line 8 loc 8 word next $200.;

  input id gender word @;

  word = compress(word,'"');

  do until(next=' ');

    input next & @;

    next = compress(next,'"');

    if not index(next,':') then word=catx(', ',word,next);

    else do;

      if word ne ' ' then do; loc=sum(loc,1); output; end;

      word=next;

    end;

  end;

  if word ne ' ' then do; loc=sum(loc,1); output; end;

  drop next;

cards;

12345,M,Back: lordosis  Face: micrognathia

9876,F,"Abdomen: asplenism, Intestinal malrotation  Face: micrognathia"

run;

proc print;

run;

You probably would get better results to read the words as one long string and use a DO loop to scan the string for double spaces.  The CALL SCAN() function should help with this.

Occasional Contributor
Posts: 15

Re: Getting frequencies of phrases separated by double spaces

I actually just replaced the file with the same name "diagnoses.csv"

Super User
Posts: 10,023

Re: Getting frequencies of phrases separated by double spaces

Posted in reply to SASbeginner20

Another one is DLMSTR= double blanks .

data temp;
 infile 'c:\temp\diagnoses.csv' dlmstr='  ' ;
 input x : $100. @@;
run;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 370 views
  • 3 likes
  • 5 in conversation