BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASbeginner20
Calcite | Level 5

Hi everyone,

I have attached a file of my data. I need some help on how to get frequencies of phrases that are separated by double spaces.

From the attached spreadsheet with diagnoses, the observations can consist of multiple phrases separated by double spaces.

Face: triangular  Palpebral fissures: short  Ears: low set both  Mouth: microstomia  Mouth: palate high-arched


I would like to find out how many people in the dataset have “Ears: low set both .” By default, SAS will return a frequency of  “Face: triangular  Palpebral fissures: short  Ears: low set both  Mouth: microstomia  Mouth: palate high-arched” rather than scan through and see how many people have just “Ears: low set both.”


Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I don't see a new file, but in general trying to read a file using two different delimiters is a pain.

You should be able to use read it using both comma and space as a delimiter, but the embedded commas will cause trouble with the & modifier as it will see two delimiters.  Also have to add something to deal with quotes.

data want ;

* infile 'diagnoses.csv' truncover firstobs=2;

  infile cards dlm=' ,' truncover ;

  line+1;

  length id $10 gender $1 line 8 loc 8 word next $200.;

  input id gender word @;

  word = compress(word,'"');

  do until(next=' ');

    input next & @;

    next = compress(next,'"');

    if not index(next,':') then word=catx(', ',word,next);

    else do;

      if word ne ' ' then do; loc=sum(loc,1); output; end;

      word=next;

    end;

  end;

  if word ne ' ' then do; loc=sum(loc,1); output; end;

  drop next;

cards;

12345,M,Back: lordosis  Face: micrognathia

9876,F,"Abdomen: asplenism, Intestinal malrotation  Face: micrognathia"

run;

proc print;

run;

You probably would get better results to read the words as one long string and use a DO loop to scan the string for double spaces.  The CALL SCAN() function should help with this.

View solution in original post

9 REPLIES 9
ballardw
Super User

You may need to provide an example of what you want the output to look like. I can come up with several interpretations of "frequencies ... separated by double spaces".

I think you want to read the file as a single line of text and ignore the delimiter, use the _infile_= <some variable name> to create a variable with the line of text. Then use one of the string search functions such as index to look for the specific phrase. You may want to use upcase as well incase the capitalization isn't consistent.

Tom
Super User Tom
Super User

If your data is in a file then use the & modifier on the INPUT statement.

data want ;

  length line 8 loc 8 word $200.;

  infile 'diagnoses.csv' truncover firstobs=2;

  line+1;

  do loc=1 by 1 until(word=' ');

    input word & @;

    if word ne ' ' then output;

  end;

run;

proc freq;

tables word;

run;

SASbeginner20
Calcite | Level 5

Thank you so much Tom! This is amazing.

This is part of the frequency output.

"Back: lordosis10.50
"Face: micrognathia11.00
Abdomen: ascites11.49
Abdomen: asplenism, Intestinal malrotation"11.99

Do you know why there are the quotes in the beginning of those phrases, and quotes at the end of the 4th phrase? There are no such quotes in the dataset. There are more occurrences of these as well later on.

Peter_C
Rhodochrosite | Level 12

remove those quotes by adding just a little to Tom's code

data want ;

  length line 8 loc 8 word $200.;

  infile 'diagnoses.csv' truncover firstobs=2;

  line+1;

* remove quotes ;

INPUT @ ;

_INFILE_ = TRANSLATE(_INFILE_,' ', '"' ) ;

  do loc=1 by 1 until(word=' ');

    input word & @;

    if word ne ' ' then output;

  end;

run;

proc freq;

tables word;

run;

Tom
Super User Tom
Super User

Quotes are in the data, but when you read it before as a CSV file (or in SAS with the DSD option on the INFILE statement) the quotes were automatically stripped.

Look at the DEQUOTE() function and apply it to the whole input line.

Add these two lines before the DO loop.

  input @ ;

  _infile_= dequote(_infile_);

SASbeginner20
Calcite | Level 5

Thanks!

I attached a new csv file that includes some member and ID information for each diagnoses. I am really unsure how to add them in using the input statements, as this command is not entirely familiar with me.

Tom
Super User Tom
Super User

I don't see a new file, but in general trying to read a file using two different delimiters is a pain.

You should be able to use read it using both comma and space as a delimiter, but the embedded commas will cause trouble with the & modifier as it will see two delimiters.  Also have to add something to deal with quotes.

data want ;

* infile 'diagnoses.csv' truncover firstobs=2;

  infile cards dlm=' ,' truncover ;

  line+1;

  length id $10 gender $1 line 8 loc 8 word next $200.;

  input id gender word @;

  word = compress(word,'"');

  do until(next=' ');

    input next & @;

    next = compress(next,'"');

    if not index(next,':') then word=catx(', ',word,next);

    else do;

      if word ne ' ' then do; loc=sum(loc,1); output; end;

      word=next;

    end;

  end;

  if word ne ' ' then do; loc=sum(loc,1); output; end;

  drop next;

cards;

12345,M,Back: lordosis  Face: micrognathia

9876,F,"Abdomen: asplenism, Intestinal malrotation  Face: micrognathia"

run;

proc print;

run;

You probably would get better results to read the words as one long string and use a DO loop to scan the string for double spaces.  The CALL SCAN() function should help with this.

SASbeginner20
Calcite | Level 5

I actually just replaced the file with the same name "diagnoses.csv"

Ksharp
Super User

Another one is DLMSTR= double blanks .

data temp;
 infile 'c:\temp\diagnoses.csv' dlmstr='  ' ;
 input x : $100. @@;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1211 views
  • 3 likes
  • 5 in conversation