DATA Step, Macro, Functions and more

re: Importing Text file

Accepted Solution Solved
Reply
Regular Contributor
Posts: 222
Accepted Solution

re: Importing Text file

Hi....I am trying to import a text file containg product names where the delimiter is a comma and Text Qualifier is the double quotation mark. Whenever the product name has double quotation marks inside the the actual product name, the Text Qualifer remains around the product name in the dataset. Here is an example what I get:

 

"New Era Combination "A" Biochemic Tissue Salts"

 

and would like to have:

 

New Era Combination "A" Biochemic Tissue Salts

 

Is there an option that need to include to allows this? Thanks.


Accepted Solutions
Solution
‎10-21-2015 09:45 AM
Valued Guide
Posts: 765

Re: re: Importing Text file

[ Edited ]

Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...

  

data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

data set X ...

Obs     x                        stuff                          y

 1     34    New Era Combination "A" Biochemic Tissue Salts    99
 2     35    Old Era "B" Biochemic Tissue Peppers              99

 

ps  In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes.  If you try this, DEQUOTE did nothing to the input record ... ...

 

data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B"  Biochemic Tissue Peppers",99
;

 

Obs                              z

 1     34,"New Era Combination "A" Biochemic Tissue Salts",99
 2     35,"Old Era "B"  Biochemic Tissue Peppers",99

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,610

Re: re: Importing Text file

In a SAS data step, use the DEQUOTE function

Regular Contributor
Posts: 222

Re: re: Importing Text file

Hi PaigeMiller......I did give the DEQUOTE function a try.....it did remove the double quotation marks but it remove part of the product name....for the example I had submitted.....the product name ended up being New Era Combination

Trusted Advisor
Posts: 1,610

Re: re: Importing Text file

Please be specific. What is the exact code that you tried? What is the result that isn't what you want?

Solution
‎10-21-2015 09:45 AM
Valued Guide
Posts: 765

Re: re: Importing Text file

[ Edited ]

Hi, in lieu if trying the QUOTE informat or DEQUOTE function, you could try ...

  

data x;
infile datalines dsd;
input x stuff :$50. y;
stuff = substr(stuff,2,length(stuff)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

data set X ...

Obs     x                        stuff                          y

 1     34    New Era Combination "A" Biochemic Tissue Salts    99
 2     35    Old Era "B" Biochemic Tissue Peppers              99

 

ps  In looking at DEQUOTE in online help, there are no examples of what happens with double quote within double quotes.  If you try this, DEQUOTE did nothing to the input record ... ...

 

data x;
input;
z = dequote(_infile_);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B"  Biochemic Tissue Peppers",99
;

 

Obs                              z

 1     34,"New Era Combination "A" Biochemic Tissue Salts",99
 2     35,"Old Era "B"  Biochemic Tissue Peppers",99

 

Regular Contributor
Posts: 222

Re: re: Importing Text file

[ Edited ]

Hi MikeZdeb.....Thanks for your help and suggestions. I did try it but it wouldn't work for me. But I did figure out a way to get what I wanted although maybe not the most efficeint way to achieve the end results. Here is what I ended up doing:

 

DATA PRODUCTS2;
 SET Products;
  NAME1 = SUBSTR(PRODUCT_NAME,1,1);
  NAME2 = SUBSTR(PRODUCT_NAME,2,LENGTH(PRODUCT_NAME)-1);
  NAME3 = SUBSTR(PRODUCT_NAME,1,LENGTH(PRODUCT_NAME)-1);
  NAME4 = SUBSTRN(PRODUCT_NAME,MAX(1,LENGTH(PRODUCT_NAME)-0),1);
  NAME5 = SUBSTR(NAME2,1,LENGTH(NAME2)-1);
RUN;

DATA PRODUCTS3;
 SET PRODUCTS2;

IF NAME1 ^= '"' AND NAME4 = '"' THEN
 NEW_NAME = STRIP(NAME3)||STRIP(NAME4);
ELSE IF NAME1 = '"' AND NAME4 ^= '"' THEN
 NEW_NAME = STRIP(NAME1)||STRIP(NAME2);
ELSE IF NAME1 = '"' AND NAME4 = '"' THEN
 NEW_NAME = STRIP(NAME5);
ELSE IF NAME1 ^= '"' AND NAME4 ^= '"' THEN
 NEW_NAME = PRODUCT_NAME;
ELSE NEW_NAME = ' ';
RUN;

Valued Guide
Posts: 765

Re: re: Importing Text file

Hi, curious, what did not work?  Here's a data step with your variable names and the output ...

 

data products;
infile datalines dsd;
input x product_name :$50. y;
product_name = substr(product_name,2,length(product_name)-2);
datalines;
34,"New Era Combination "A" Biochemic Tissue Salts",99
35,"Old Era "B" Biochemic Tissue Peppers",99
;

 

Obs product_name                                    

1   New Era Combination "A" Biochemic Tissue Salts 
2   Old Era "B" Biochemic Tissue Peppers           

 

Using that same data and your method I get ...

 

Obs               NEW_NAME

1   New Era Combination "A" Biochemic Tissue Salts
2   Old Era "B" Biochemic Tissue Peppers

 

That looks the same to me.  Can you give me some examples where my method did not work?

 

Also, I rewrote some of your code so you can see that you can use CAT functions ... and, these statements ...

 

NEW_NAME = STRIP(NAME3)||STRIP(NAME4);

NEW_NAME = STRIP(NAME5);

 

are the same as ...

 

NEW_NAME = STRIP(NAME3)||NAME4;

NEW_NAME = NAME5;

 

since the trailing blanks on NAME4 and on NAME5 are still there in the new variable NEW_NAME.  If NEW_NAME has a length of 50, then the missing portion of the value is always padded on the right with blanks.  Your statements are trying to give NEW_NAME a different length for different observations, not possible.

 

CAT functions (and two data steps combined into one) ...

 

data products2;

length new_name $50.;
set products;

name1 = substr(product_name,1,1);
name2 = substr(product_name,2,length(product_name)-1);
name3 = substr(product_name,1,length(product_name)-1);
name4 = substrn(product_name,max(1,length(product_name)-0),1);
name5 = substr(name2,1,length(name2)-1);

if name1 ^= '"' and name4 = '"'  then new_name = cats(name3,name4);
else
if name1 = '"' and name4 ^= '"'  then new_name = cats(name1,name2);
else
if name1 = '"' and name4 = '"'   then new_name = name5;
else
if name1 ^= '"' and name4 ^= '"' then new_name = product_name;
else call missing(new_name);

run;

Regular Contributor
Posts: 222

Re: re: Importing Text file

Hi MikeZdeb.....When I tried and ran your code, it actually did remove the double quoatation marks that were the first and last character but it also removed the first and last digit/letter to those product names that were not in double quotation marks. Here is the data step that I used and the output that I did get;

 

DATA PRODUCTS;
 SET Products;
new_name = substr(product_name,2,length(product_name)-2);
run;

 

PRODUCT_NAME new_name
Red Bull Energy Drink                 ed Bull Energy Drin
Vitamin D 400 Iu                          itamin D 400 I
Echinacea And Goldenseal        chinacea And Goldensea
Flax Seed Oil 1000 Mg               lax Seed Oil 1000 M
Vitamin D                                    itamin
Allicin Rich Garlic                        llicin Rich Garli
Glucosamine Sulfate                  lucosamine Sulfat
Vitamin C                                   itamin
Glucosamine Chondroitin Complex lucosamine Chondroitin Comple
Vitamin B6                                 itamin B
Vitamin B2                                 itamin B
Vitamin C Timed Release          itamin C Timed Releas
Vitamin C 1000 Mg Timed Release itamin C 1000 Mg Timed Releas
Milk Thistle                                ilk Thistl
Now Wheat Germ Oil Softgels    ow Wheat Germ Oil Softgel
"New Era Combination "S" Biochemic Tissue Salts" New Era Combination "S" Biochemic Tissue Salts
"New Era Combination "I" Biochemic Tissue Salts" New Era Combination "I" Biochemic Tissue Salts
"New Era Combination "J" Biochemic Tissue Salts" New Era Combination "J" Biochemic Tissue Salts
"New Era Combination "K" Biochemic Tissue Salts" New Era Combination "K" Biochemic Tissue Salts
"New Era Combination "L" Biochemic Tissue Salts" New Era Combination "L" Biochemic Tissue Salts

 

Valued Guide
Posts: 765

Re: re: Importing Text file

Hi, try this that first checks for the presence of the outside quotes before using the SUBSTR function. Worked fine with the data you posted.  Also, the comments I made in an earlier posting about use of STRIP not being of use in cewrtain situations only applies to its removal of trailing blanks, not leading blanks.  If it's just leading blanks that are the concern, the LEFT function could also be used.

 

data y;
set x;
if char(stuff,1) eq '"' and char(stuff,length(stuff)) eq '"' then
stuff = substr(stuff,2,length(stuff)-2);
run;  

 

data set Y ...

Obs stuff

1   Red Bull Energy Drink
2   Vitamin D 400 Iu
3   Echinacea And Goldenseal
4   Flax Seed Oil 1000 Mg
5   Vitamin D
6   Allicin Rich Garlic
7   Glucosamine Sulfate
8   Vitamin C
9   Glucosamine Chondroitin Complex
10  Vitamin B6
11  Vitamin B2
12  Vitamin C Timed Release
13  Vitamin C 1000 Mg Timed Release
14  Milk Thistle
15  Now Wheat Germ Oil Softgels
16  New Era Combination "S" Biochemic Tissue Salts
17  New Era Combination "I" Biochemic Tissue Salts
18  New Era Combination "J" Biochemic Tissue Salts
19  New Era Combination "K" Biochemic Tissue Salts
20  New Era Combination "L" Biochemic Tissue Salts

Regular Contributor
Posts: 222

Re: re: Importing Text file

Hi MikeZdeb...Thanks once again for your help. I ran both codes and got the exact same results but had to make a slight modification to use a IF/THEN/ELSE as I was only getting output for the new_name to those products that were in double quptation marks and had increase the length of the new_name variable. Thanks once again!!!

 

DATA NHPPRODUCTS2;
LENGTH new_name $180.;
SET Products;
 IF char(PRODUCT_NAME,1) eq '"' AND char(PRODUCT_NAME,length(PRODUCT_NAME)) eq '"' THEN
  new_name = substr(PRODUCT_NAME,2,length(PRODUCT_NAME)-2);
 ELSE new_name = PRODUCT_NAME;
RUN;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 302 views
  • 1 like
  • 3 in conversation