BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I am trying to import a text file containg product names where the delimiter is a comma and Text Qualifier is the double quotation mark. Whenever the product name has double quotation marks inside the the actual product name, the Text Qualifer remains around the product name in the dataset. Here is an example what I get:

 

"New Era Combination "A" Biochemic Tissue Salts"

 

and would like to have:

 

New Era Combination "A" Biochemic Tissue Salts

 

Is there an option that need to include to allows this? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
MikeZdeb
Rhodochrosite | Level 12

Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...

  

data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

data set X ...

Obs     x                        stuff                          y

 1     34    New Era Combination "A" Biochemic Tissue Salts    99
 2     35    Old Era "B" Biochemic Tissue Peppers              99

 

ps  In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes.  If you try this, DEQUOTE did nothing to the input record ... ...

 

data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B"  Biochemic Tissue Peppers",99
;

 

Obs                              z

 1     34,"New Era Combination "A" Biochemic Tissue Salts",99
 2     35,"Old Era "B"  Biochemic Tissue Peppers",99

 

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

In a SAS data step, use the DEQUOTE function

--
Paige Miller
twildone
Pyrite | Level 9

Hi PaigeMiller......I did give the DEQUOTE function a try.....it did remove the double quotation marks but it remove part of the product name....for the example I had submitted.....the product name ended up being New Era Combination

PaigeMiller
Diamond | Level 26

Please be specific. What is the exact code that you tried? What is the result that isn't what you want?

--
Paige Miller
MikeZdeb
Rhodochrosite | Level 12

Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...

  

data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

data set X ...

Obs     x                        stuff                          y

 1     34    New Era Combination "A" Biochemic Tissue Salts    99
 2     35    Old Era "B" Biochemic Tissue Peppers              99

 

ps  In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes.  If you try this, DEQUOTE did nothing to the input record ... ...

 

data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B"  Biochemic Tissue Peppers",99
;

 

Obs                              z

 1     34,"New Era Combination "A" Biochemic Tissue Salts",99
 2     35,"Old Era "B"  Biochemic Tissue Peppers",99

 

twildone
Pyrite | Level 9

Hi MikeZdeb.....Thanks for your help and suggestions. I did try it but it wouldn't work for me. But I did figure out a way to get what I wanted although maybe not the most efficeint way to achieve the end results. Here is what I ended up doing:

 

DATA PRODUCTS2;
 SET Products;
  NAME1 = SUBSTR(PRODUCT_NAME,1,1);
  NAME2 = SUBSTR(PRODUCT_NAME,2,LENGTH(PRODUCT_NAME)-1);
  NAME3 = SUBSTR(PRODUCT_NAME,1,LENGTH(PRODUCT_NAME)-1);
  NAME4 = SUBSTRN(PRODUCT_NAME,MAX(1,LENGTH(PRODUCT_NAME)-0),1);
  NAME5 = SUBSTR(NAME2,1,LENGTH(NAME2)-1);
RUN;

DATA PRODUCTS3;
 SET PRODUCTS2;

IF NAME1 ^= '"' AND NAME4 = '"' THEN
 NEW_NAME = STRIP(NAME3)||STRIP(NAME4);
ELSE IF NAME1 = '"' AND NAME4 ^= '"' THEN
 NEW_NAME = STRIP(NAME1)||STRIP(NAME2);
ELSE IF NAME1 = '"' AND NAME4 = '"' THEN
 NEW_NAME = STRIP(NAME5);
ELSE IF NAME1 ^= '"' AND NAME4 ^= '"' THEN
 NEW_NAME = PRODUCT_NAME;
ELSE NEW_NAME = ' ';
RUN;

MikeZdeb
Rhodochrosite | Level 12

Hi, curious, what did not work?  Here's a data step with your variable names and the output ...

 

data products;
infile datalines dsd;
input x product_name :$50. y;
product_name = substr(product_name,2,length(product_name)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

Obs product_name                                    

1   New Era Combination "A" Biochemic Tissue Salts 
2   Old Era "B" Biochemic Tissue Peppers           

 

Using that same data and your method I get ...

 

Obs               NEW_NAME

1   New Era Combination "A" Biochemic Tissue Salts
2   Old Era "B" Biochemic Tissue Peppers

 

That looks the same to me.  Can you give me some examples where my method did not work?

 

Also, I rewrote some of your code so you can see that you can use CAT functions ... and, these statements ...

 

NEW_NAME = STRIP(NAME3)||STRIP(NAME4);

NEW_NAME = STRIP(NAME5);

 

are the same as ...

 

NEW_NAME = STRIP(NAME3)||NAME4;

NEW_NAME = NAME5;

 

since the trailing blanks on NAME4 and on NAME5 are still there in the new variable NEW_NAME.  If NEW_NAME has a length of 50, then the missing portion of the value is always padded on the right with blanks.  Your statements are trying to give NEW_NAME a different length for different observations, not possible.

 

CAT functions (and two data steps combined into one) ...

 

data products2;

length new_name $50.;
set products;

name1 = substr(product_name,1,1);
name2 = substr(product_name,2,length(product_name)-1);
name3 = substr(product_name,1,length(product_name)-1);
name4 = substrn(product_name,max(1,length(product_name)-0),1);
name5 = substr(name2,1,length(name2)-1);

if name1 ^= '"' and name4 = '"'  then new_name = cats(name3,name4);
else
if name1 = '"' and name4 ^= '"'  then new_name = cats(name1,name2);
else
if name1 = '"' and name4 = '"'   then new_name = name5;
else
if name1 ^= '"' and name4 ^= '"' then new_name = product_name;
else call missing(new_name);

run;

twildone
Pyrite | Level 9

Hi MikeZdeb.....When I tried and ran your code, it actually did remove the double quoatation marks that were the first and last character but it also removed the first and last digit/letter to those product names that were not in double quotation marks. Here is the data step that I used and the output that I did get;

 

DATA PRODUCTS;
 SET Products;
new_name = substr(product_name,2,length(product_name)-2);
run;

 

PRODUCT_NAME new_name
Red Bull Energy Drink                 ed Bull Energy Drin
Vitamin D 400 Iu                          itamin D 400 I
Echinacea And Goldenseal        chinacea And Goldensea
Flax Seed Oil 1000 Mg               lax Seed Oil 1000 M
Vitamin D                                    itamin
Allicin Rich Garlic                        llicin Rich Garli
Glucosamine Sulfate                  lucosamine Sulfat
Vitamin C                                   itamin
Glucosamine Chondroitin Complex lucosamine Chondroitin Comple
Vitamin B6                                 itamin B
Vitamin B2                                 itamin B
Vitamin C Timed Release          itamin C Timed Releas
Vitamin C 1000 Mg Timed Release itamin C 1000 Mg Timed Releas
Milk Thistle                                ilk Thistl
Now Wheat Germ Oil Softgels    ow Wheat Germ Oil Softgel
"New Era Combination "S" Biochemic Tissue Salts" New Era Combination "S" Biochemic Tissue Salts
"New Era Combination "I" Biochemic Tissue Salts" New Era Combination "I" Biochemic Tissue Salts
"New Era Combination "J" Biochemic Tissue Salts" New Era Combination "J" Biochemic Tissue Salts
"New Era Combination "K" Biochemic Tissue Salts" New Era Combination "K" Biochemic Tissue Salts
"New Era Combination "L" Biochemic Tissue Salts" New Era Combination "L" Biochemic Tissue Salts

 

MikeZdeb
Rhodochrosite | Level 12

Hi, try this that first checks for the presence of the outside quotes before using the SUBSTR function. Worked fine with the data you posted.  Also, the comments I made in an earlier posting about use of STRIP not being of use in cewrtain situations only applies to its removal of trailing blanks, not leading blanks.  If it's just leading blanks that are the concern, the LEFT function could also be used.

 

data y;
set x;
if char(stuff,1) eq '"' and char(stuff,length(stuff)) eq '"' then
stuff = substr(stuff,2,length(stuff)-2);
run;  

 

data set Y ...

Obs stuff

1   Red Bull Energy Drink
2   Vitamin D 400 Iu
3   Echinacea And Goldenseal
4   Flax Seed Oil 1000 Mg
5   Vitamin D
6   Allicin Rich Garlic
7   Glucosamine Sulfate
8   Vitamin C
9   Glucosamine Chondroitin Complex
10  Vitamin B6
11  Vitamin B2
12  Vitamin C Timed Release
13  Vitamin C 1000 Mg Timed Release
14  Milk Thistle
15  Now Wheat Germ Oil Softgels
16  New Era Combination "S" Biochemic Tissue Salts
17  New Era Combination "I" Biochemic Tissue Salts
18  New Era Combination "J" Biochemic Tissue Salts
19  New Era Combination "K" Biochemic Tissue Salts
20  New Era Combination "L" Biochemic Tissue Salts

twildone
Pyrite | Level 9

Hi MikeZdeb...Thanks once again for your help. I ran both codes and got the exact same results but had to make a slight modification to use a IF/THEN/ELSE as I was only getting output for the new_name to those products that were in double quptation marks and had increase the length of the new_name variable. Thanks once again!!!

 

DATA NHPPRODUCTS2;
LENGTH new_name $180.;
SET Products;
 IF char(PRODUCT_NAME,1) eq '"' AND char(PRODUCT_NAME,length(PRODUCT_NAME)) eq '"' THEN
  new_name = substr(PRODUCT_NAME,2,length(PRODUCT_NAME)-2);
 ELSE new_name = PRODUCT_NAME;
RUN;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1073 views
  • 1 like
  • 3 in conversation