BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BranManderly34
Obsidian | Level 7

I have a text file with comma as delimiter. However, the field values I want to read are further wrapped in pipe characters for 24 out of 26 column variables.

For clarity, please see the attached photo.

 

BranManderly34_0-1657146617136.png

What I want eventually is a SAS dataset in which text fields are displayed without the pipe character. For example, the first row and the first column of my dataset should be just A170015889 without being surrounded by pipe characters. Without those characters, importing is relatively straightforward with the following code:

data all_trans;
infile "F:\UNL_academic\Insider Trading\Insider Trading_PIandEPU\
data_from_open_secrets\PFDtransactions_with_data_definitions.txt"
dlm = ',' dsd firstobs=2 missover;
input ID:best20. Chamber CID CalendarYear ReportType Asset4SJD Asset4Transacted Orgname
Ultorg RealCode Source Asset4Descrip Orgname2 Ultorg2 RealCode2 Source2
Asset4Purchased Asset4Sold Asset4Exchanged Asset4Date Asset4DateText
Asset4TransAmt Asset4ExactAmt CofD TransNotes Dupe;
run;

Because of those pipe characters, when I run the code, SAS returns null (or just .) in the dataset. Can someone please help me overcome this problem? I tried searching online for answers and I couldn't find a solution to my problem. Thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data all_trans;
infile "F:\UNL_academic\Insider Trading\Insider Trading_PIandEPU\
data_from_open_secrets\PFDtransactions_with_data_definitions.txt"
dlm = ',' dsd firstobs=2 missover;
input ID $ Chamber $ CID $ CalendarYear $ ReportType $ Asset4SJD  $ Asset4Transacted $ Orgname $
Ultorg $ RealCode $ Source  $ Asset4Descrip $ Orgname2 $ Ultorg2 $ RealCode2 $ Source2 $
Asset4Purchased $ Asset4Sold $ Asset4Exchanged $ Asset4Date  $ Asset4DateText $
Asset4TransAmt $ Asset4ExactAmt $ CofD  $TransNotes $ Dupe $;
run;

data all_trans_nopipes;
set all_trans;
array _char(*) $ ID --Dupe;
do i=1 to dim(char);
_char(i) = compress(_char(i), '|');
end;

*do conversions required here;

YearCalender = input(CalenderYear, 8.);
....
run;

I know that won't work, the lengths on variables aren't correct for starters but it may help you get started.

View solution in original post

13 REPLIES 13
Reeza
Super User
I suspect this is done because your file has embedded comma's and/or quotes in the variables....
Can you provide a sample as text? I would also look for a few quotes or , in those text fields and include those in the sample. If you can't post real data, feel free to include fake data.

You can read everything in as text, remove the pipes with COMPRESS() and then convert the desired fields to numeric.
Reeza
Super User
data all_trans;
infile "F:\UNL_academic\Insider Trading\Insider Trading_PIandEPU\
data_from_open_secrets\PFDtransactions_with_data_definitions.txt"
dlm = ',' dsd firstobs=2 missover;
input ID $ Chamber $ CID $ CalendarYear $ ReportType $ Asset4SJD  $ Asset4Transacted $ Orgname $
Ultorg $ RealCode $ Source  $ Asset4Descrip $ Orgname2 $ Ultorg2 $ RealCode2 $ Source2 $
Asset4Purchased $ Asset4Sold $ Asset4Exchanged $ Asset4Date  $ Asset4DateText $
Asset4TransAmt $ Asset4ExactAmt $ CofD  $TransNotes $ Dupe $;
run;

data all_trans_nopipes;
set all_trans;
array _char(*) $ ID --Dupe;
do i=1 to dim(char);
_char(i) = compress(_char(i), '|');
end;

*do conversions required here;

YearCalender = input(CalenderYear, 8.);
....
run;

I know that won't work, the lengths on variables aren't correct for starters but it may help you get started.

BranManderly34
Obsidian | Level 7

Hi,

 

BranManderly34_1-1657220032505.png

Some values of variables contain ',' or ';' and what makes the matter slightly more complicated is that not all fields are wrapped in || characters. 2 out of 2 columns will not have field values wrapped in || characters.

Tom
Super User Tom
Super User

Please post text as text and not as pictures.

It looks like the missing values do not have the pipes added.

Just convert the file to a new file and read that.

data _null_;
   infile 'myfile.txt' dsd truncover ;
   file 'myfile_fixed.txt' dsd ;
   do i = 1 to 20 ;
      input string :$1000. @;
      if string=: '|' then string=substrn(string,2,length(string)-2);
      put string @;
   end;
   put;
run;

If the number of fields is not 20 then just change that.  If the longest field needs more than 1000 bytes then change the informat width.

If the number varies it is possible with some extra INFILE options and logic to copy all of the fields without knowing in advance how many fields will be on a line.

Reeza
Super User

That shouldn't affect the solution I proposed, did you test it?


@BranManderly34 wrote:

Hi,

 

BranManderly34_1-1657220032505.png

Some values of variables contain ',' or ';' and what makes the matter slightly more complicated is that not all fields are wrapped in || characters. 2 out of 2 columns will not have field values wrapped in || characters.


 

BranManderly34
Obsidian | Level 7
data all_trans;
infile "F:\UNL_academic\Insider Trading\Insider Trading_PIandEPU\
data_from_open_secrets\PFDtransactions_with_data_definitions.txt"
dlm = ',' dsd firstobs=2 missover;
input ID: $18. Chamber: $3. CID:$12. CalendarYear:$6. ReportType: $3. Asset4SJD: $3. 
Asset4Transacted: $102. Orgname: $42. Ultorg: $42. RealCode: $7. Source: $7. 
Asset4Descrip: $102. Orgname2: $42. Ultorg2: $42. RealCode2: $7. Source2: $7.
Asset4Purchased: $3. Asset4Sold: $3. Asset4Exchanged: $3. Asset4Date: Best12. 
Asset4DateText: $52. Asset4TransAmt: $6. Asset4ExactAmt: Best24. CofD:$3.  
TransNotes: $102. Dupe: $3.;
run;

data all_trans_nopipes;
set all_trans;
array _char(*) $ ID --Dupe;
do i=1 to dim(char);
_char(i) = compress(_char(i), '|');
end;

@Reeza 

The first block of code can read the data as text, but when I execute the second block, I am getting two errors as follows:

All variables in array list must be the same type, i.e., all numeric or character

The DIM, LBOUND, and HBOUND functions require an array name for the first argument.

Tom
Super User Tom
Super User

Because you did not read them all as character.

Asset4Date: Best12. 

The BEST informat does not really exist but it will be treated as an alias for the normal numeric informat.

So you are defining ASSET4DATE as numeric.

If that variable does not ever contain the pipe characters then the INPUT could work, but in that case you need to exclude that variable from the list of variable in the ARRAY in the second step.

BranManderly34
Obsidian | Level 7

I see. I just started on the SAS and now I kind of understand why the second block doesn't work. Let me work it out.

Reeza
Super User

You read in Asset4Date and Asset4ExactAMT as numeric not character.

 

Try using _character_ to refer to all character variables instead.

 

data all_trans_nopipes;
set all_trans;
array _char(*) $ _character_;
do i=1 to dim(char);
_char(i) = compress(_char(i), '|');
end;
run;

 

 

BranManderly34
Obsidian | Level 7
data all_trans;
infile "mypath\PFDtransactions_with_data_definitions.txt"
dlm = ',' dsd firstobs=2 missover;
input ID: $18. Chamber: $3. CID:$12. CalendarYear:$6. ReportType: $3. Asset4SJD: $3.
Asset4Transacted: $102. Orgname: $42. Ultorg: $42. RealCode: $7. Source: $7.
Asset4Descrip: $102. Orgname2: $42. Ultorg2: $42. RealCode2: $7. Source2: $7.
Asset4Purchased: $3. Asset4Sold: $3. Asset4Exchanged: $3. Asset4Date: $10.
Asset4DateText: $52. Asset4TransAmt: $6. Asset4ExactAmt: $12. CofD:$3.
TransNotes: $102. Dupe: $3.;
run;

data all_trans_nopipes;
set all_trans;
array myarray(*) $ ID --Dupe;
do i=1 to dim(myarray);
myarray(i) = compress(myarray(i), '|');
end;
Asset4Date_dt = input(Asset4Date,MMDDYY10.);
format Asset4Date_dt MMDDYY10.;
run;

This is what I have for now, and it seems to work. Thanks a million!

Tom
Super User Tom
Super User

Why not just fix the lines to look more like a normal CSV file?

If the lines are short enough you can even do it on the fly by modifying the _INFILE_ buffer.  So change the pipes to quotes.  If there are any actual quotes change them to two quotes.  (Make sure to double the quotes before converting the pipes to quotes.)

data all_trans;
  infile "your long filename goes here"
    dsd firstobs=2 truncover 
  ;
  input @;
  _infile_=translate(tranwrd(_infile_,'"','""'),'"','|');
  input ID :$20. Chamber CID CalendarYear ReportType Asset4SJD Asset4Transacted
     Orgname Ultorg RealCode Source Asset4Descrip Orgname2 Ultorg2 RealCode2 Source2
     Asset4Purchased Asset4Sold Asset4Exchanged Asset4Date Asset4DateText
     Asset4TransAmt Asset4ExactAmt CofD TransNotes Dupe
  ;
run;

Make sure to either DEFINE the variables to have the right type (and for character variables the right length) before the INPUT statement by using a LENGTH or ATTRIB statement before the variables are using in other statements, like INPUT.

 

Or add in-line informats in the INPUT statement for the variables that need them.  SAS does NOT need be told to use special informats for most variables.  Although if you have not defined in advance as character you can give SAS the information it needs to guess how long you want character variables defined by adding the $ informat to the input statement.  Then it will guess to make the variable length match the width on the informat.  Like the $20. informat I added for the ID variable in the example above.

 

Make sure to include the colon modifier before the informat specification so that the input continues to use LIST MODE instead of FORMATTED MODE input.

Tom
Super User Tom
Super User

You could also just try using DLMSTR='|,|' instead of DSD and DLM= options.

Then you just need to deal with the leading | on the first variable and the trailing | on the last variable.

The leading pipe you could skip by starting your INFILE statement with @ 2 so it just skips the first byte on the line.  Note this will not work if the first field is ever empty.

BranManderly34
Obsidian | Level 7

I am afraid this won't work because not all variables are wrapped in pipe characters. the dataset has 26 columns, and 2 out of 26 have values without pipe characters.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1107 views
  • 4 likes
  • 3 in conversation