Hi....I am trying to import a text file containg product names where the delimiter is a comma and Text Qualifier is the double quotation mark. Whenever the product name has double quotation marks inside the the actual product name, the Text Qualifer remains around the product name in the dataset. Here is an example what I get:
"New Era Combination "A" Biochemic Tissue Salts"
and would like to have:
New Era Combination "A" Biochemic Tissue Salts
Is there an option that need to include to allows this? Thanks.
Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...
data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;
data set X ...
Obs x stuff y
1 34 New Era Combination "A" Biochemic Tissue Salts 99
2 35 Old Era "B" Biochemic Tissue Peppers 99
ps In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes. If you try this, DEQUOTE did nothing to the input record ... ...
data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;
Obs z
1 34,"New Era Combination "A" Biochemic Tissue Salts",99
2 35,"Old Era "B" Biochemic Tissue Peppers",99
In a SAS data step, use the DEQUOTE function
Hi PaigeMiller......I did give the DEQUOTE function a try.....it did remove the double quotation marks but it remove part of the product name....for the example I had submitted.....the product name ended up being New Era Combination
Please be specific. What is the exact code that you tried? What is the result that isn't what you want?
Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...
data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;
data set X ...
Obs x stuff y
1 34 New Era Combination "A" Biochemic Tissue Salts 99
2 35 Old Era "B" Biochemic Tissue Peppers 99
ps In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes. If you try this, DEQUOTE did nothing to the input record ... ...
data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;
Obs z
1 34,"New Era Combination "A" Biochemic Tissue Salts",99
2 35,"Old Era "B" Biochemic Tissue Peppers",99
Hi MikeZdeb.....Thanks for your help and suggestions. I did try it but it wouldn't work for me. But I did figure out a way to get what I wanted although maybe not the most efficeint way to achieve the end results. Here is what I ended up doing:
DATA PRODUCTS2;
SET Products;
NAME1 = SUBSTR(PRODUCT_NAME,1,1);
NAME2 = SUBSTR(PRODUCT_NAME,2,LENGTH(PRODUCT_NAME)-1);
NAME3 = SUBSTR(PRODUCT_NAME,1,LENGTH(PRODUCT_NAME)-1);
NAME4 = SUBSTRN(PRODUCT_NAME,MAX(1,LENGTH(PRODUCT_NAME)-0),1);
NAME5 = SUBSTR(NAME2,1,LENGTH(NAME2)-1);
RUN;
DATA PRODUCTS3;
SET PRODUCTS2;
IF NAME1 ^= '"' AND NAME4 = '"' THEN
NEW_NAME = STRIP(NAME3)||STRIP(NAME4);
ELSE IF NAME1 = '"' AND NAME4 ^= '"' THEN
NEW_NAME = STRIP(NAME1)||STRIP(NAME2);
ELSE IF NAME1 = '"' AND NAME4 = '"' THEN
NEW_NAME = STRIP(NAME5);
ELSE IF NAME1 ^= '"' AND NAME4 ^= '"' THEN
NEW_NAME = PRODUCT_NAME;
ELSE NEW_NAME = ' ';
RUN;
Hi, curious, what did not work? Here's a data step with your variable names and the output ...
data products;
infile datalines dsd;
input x product_name :$50. y;
product_name = substr(product_name,2,length(product_name)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;
Obs product_name
1 New Era Combination "A" Biochemic Tissue Salts
2 Old Era "B" Biochemic Tissue Peppers
Using that same data and your method I get ...
Obs NEW_NAME
1 New Era Combination "A" Biochemic Tissue Salts
2 Old Era "B" Biochemic Tissue Peppers
That looks the same to me. Can you give me some examples where my method did not work?
Also, I rewrote some of your code so you can see that you can use CAT functions ... and, these statements ...
NEW_NAME = STRIP(NAME3)||STRIP(NAME4);
NEW_NAME = STRIP(NAME5);
are the same as ...
NEW_NAME = STRIP(NAME3)||NAME4;
NEW_NAME = NAME5;
since the trailing blanks on NAME4 and on NAME5 are still there in the new variable NEW_NAME. If NEW_NAME has a length of 50, then the missing portion of the value is always padded on the right with blanks. Your statements are trying to give NEW_NAME a different length for different observations, not possible.
CAT functions (and two data steps combined into one) ...
data products2;
length new_name $50.;
set products;
name1 = substr(product_name,1,1);
name2 = substr(product_name,2,length(product_name)-1);
name3 = substr(product_name,1,length(product_name)-1);
name4 = substrn(product_name,max(1,length(product_name)-0),1);
name5 = substr(name2,1,length(name2)-1);
if name1 ^= '"' and name4 = '"' then new_name = cats(name3,name4);
else
if name1 = '"' and name4 ^= '"' then new_name = cats(name1,name2);
else
if name1 = '"' and name4 = '"' then new_name = name5;
else
if name1 ^= '"' and name4 ^= '"' then new_name = product_name;
else call missing(new_name);
run;
Hi MikeZdeb.....When I tried and ran your code, it actually did remove the double quoatation marks that were the first and last character but it also removed the first and last digit/letter to those product names that were not in double quotation marks. Here is the data step that I used and the output that I did get;
DATA PRODUCTS;
SET Products;
new_name = substr(product_name,2,length(product_name)-2);
run;
PRODUCT_NAME new_name
Red Bull Energy Drink ed Bull Energy Drin
Vitamin D 400 Iu itamin D 400 I
Echinacea And Goldenseal chinacea And Goldensea
Flax Seed Oil 1000 Mg lax Seed Oil 1000 M
Vitamin D itamin
Allicin Rich Garlic llicin Rich Garli
Glucosamine Sulfate lucosamine Sulfat
Vitamin C itamin
Glucosamine Chondroitin Complex lucosamine Chondroitin Comple
Vitamin B6 itamin B
Vitamin B2 itamin B
Vitamin C Timed Release itamin C Timed Releas
Vitamin C 1000 Mg Timed Release itamin C 1000 Mg Timed Releas
Milk Thistle ilk Thistl
Now Wheat Germ Oil Softgels ow Wheat Germ Oil Softgel
"New Era Combination "S" Biochemic Tissue Salts" New Era Combination "S" Biochemic Tissue Salts
"New Era Combination "I" Biochemic Tissue Salts" New Era Combination "I" Biochemic Tissue Salts
"New Era Combination "J" Biochemic Tissue Salts" New Era Combination "J" Biochemic Tissue Salts
"New Era Combination "K" Biochemic Tissue Salts" New Era Combination "K" Biochemic Tissue Salts
"New Era Combination "L" Biochemic Tissue Salts" New Era Combination "L" Biochemic Tissue Salts
Hi, try this that first checks for the presence of the outside quotes before using the SUBSTR function. Worked fine with the data you posted. Also, the comments I made in an earlier posting about use of STRIP not being of use in cewrtain situations only applies to its removal of trailing blanks, not leading blanks. If it's just leading blanks that are the concern, the LEFT function could also be used.
data y;
set x;
if char(stuff,1) eq '"' and char(stuff,length(stuff)) eq '"' then
stuff = substr(stuff,2,length(stuff)-2);
run;
data set Y ...
Obs stuff
1 Red Bull Energy Drink
2 Vitamin D 400 Iu
3 Echinacea And Goldenseal
4 Flax Seed Oil 1000 Mg
5 Vitamin D
6 Allicin Rich Garlic
7 Glucosamine Sulfate
8 Vitamin C
9 Glucosamine Chondroitin Complex
10 Vitamin B6
11 Vitamin B2
12 Vitamin C Timed Release
13 Vitamin C 1000 Mg Timed Release
14 Milk Thistle
15 Now Wheat Germ Oil Softgels
16 New Era Combination "S" Biochemic Tissue Salts
17 New Era Combination "I" Biochemic Tissue Salts
18 New Era Combination "J" Biochemic Tissue Salts
19 New Era Combination "K" Biochemic Tissue Salts
20 New Era Combination "L" Biochemic Tissue Salts
Hi MikeZdeb...Thanks once again for your help. I ran both codes and got the exact same results but had to make a slight modification to use a IF/THEN/ELSE as I was only getting output for the new_name to those products that were in double quptation marks and had increase the length of the new_name variable. Thanks once again!!!
DATA NHPPRODUCTS2;
LENGTH new_name $180.;
SET Products;
IF char(PRODUCT_NAME,1) eq '"' AND char(PRODUCT_NAME,length(PRODUCT_NAME)) eq '"' THEN
new_name = substr(PRODUCT_NAME,2,length(PRODUCT_NAME)-2);
ELSE new_name = PRODUCT_NAME;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.