BookmarkSubscribeRSS Feed
tsai1108_gmail_com
Calcite | Level 5

Dear SAS users, 

this task has been killing me.... Please help me if you can. 

I am trying to develop a customer base by creating a xls out of thousands of vcf. 

Is there any way I can automate this process instead of doing thousands times of copying and pasting?

 

basically a vcf card looks like this in a text format, and I want to write a program that reading N, FN, ORG, TITLE, TEL, ADR

Even if I can't automate the entire process, I will be very appreciative if anyone can share any shortcuts!


BEGIN:VCARD
VERSION:2.1
N:Doe;John;;;
FN:John Doe
ORG:Doe Company, The;
TITLE: President
NOTE;ENCODING=QUOTED-PRINTABLE: This is a note associated with this
contact=0D=0A
TEL;WORK;VOICE:(987) 123-4567
TEL;HOME;VOICE:(987) 765-4321
TEL;CELL;VOICE:(987) 135-8642
TEL;WORK;FAX:(987) 246-1357
ADR;WORK:;;1234 North Street;Anytown;TX 751234;;United States of America
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:1234 North Street=0D=0AAnytown, TX
751234 =0D=0AUnited States of America
URL:
URL:<WWLINK TYPE="GENERIC"
VALUE="http://www.doeweb.com">http://www.doeweb.com</WWLINK>
EMAIL;PREF;INTERNET:jdoe@nowhere.com
REV:19980114T170559Z
END:VCARD 

 Thank you!

3 REPLIES 3
ballardw
Super User

By "thousands of vcf" do you mean you have one or more folders with multiple vcf files or one file with thousands of contacts?

 

Your example shows 4 values for TEL, how do you want those to appear in our data set? Could there be more than those 4 types? Fewer?

Similarly the ADR implies that you might have more than one address. If so how many and what types?

 

What do you want with the rest of the stuff shown? discarded?

 

Something else that may have an impact: is that NOTE on a single line or may it be on more than one line?

 

It will help a bit if you can show how you expect your SAS file to look and then the Excel.

Though I do wonder what you can do with this in Excel you can't in SAS.

tsai1108_gmail_com
Calcite | Level 5

Hi ballardw;

so currently, I have downloaded all vcf manually, as I don't know how to code to download them directly from the web.... 

I put all the downloaded vcfs (thousand of them in separate files) into one single folder categorized by company. 

This is how i'd like to see my output. 

I have to put it in excel for now as I don't know how to make this happen using SAS. I am only familiar with structured data..... no clues of how to manipulate texts. I will be happy with two phone numbers and one address if there are multiple. 

Thanks for your help!

 

 

ballardw
Super User

Here is something that should help you get started.

data example;
   informat n fn $25. org $40. title $25. phonework phonehome phonecell phonefax $20.
        adrwork $100.
   ;
   input @;
   /* see if we have found the beginning of a record*/
   if _infile_=:"BEGIN:VCARD" then do;
      input /    
            /@'N:' n
            /@'FN:' fn
            /@'ORG:' org
            /@'TITLE:' title 
      ;
      /* no into likely conditional reads*/
      do until (_infile_=:'TEL');
         input ;
      end;
      do while (_infile_=:'TEL');
         if index(_infile_,'WORK')>0 then do;
            if index(_infile_,'VOICE')>0 THEN phonework= substr(_infile_,index(_infile_,'('));
            ELSE IF index(_infile_,'FAX')>0 THEN phonefax= substr(_infile_,index(_infile_,'('));
         end;
         if index(_infile_,'HOME')>0 then phonehome= substr(_infile_,index(_infile_,'('));
         if index(_infile_,'CELL')>0 then phonecell= substr(_infile_,index(_infile_,'('));
         input;
      end;
      /* at this point you do something similar to reading the phones to 
         read address(es) */

      /* only output after reading the fields needed*/
      output;
   end;
   else input;
datalines4;
BEGIN:VCARD
VERSION:2.1
N:Doe;John;;;
FN:John Doe
ORG:Doe Company, The;
TITLE: President
NOTE;ENCODING=QUOTED-PRINTABLE: This is a note associated with this
contact=0D=0A
TEL;WORK;VOICE:(987) 123-4567
TEL;HOME;VOICE:(987) 765-4321
TEL;CELL;VOICE:(987) 135-8642
TEL;WORK;FAX:(987) 246-1357
ADR;WORK:;;1234 North Street;Anytown;TX 751234;;United States of America
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:1234 North Street=0D=0AAnytown, TX
751234 =0D=0AUnited States of America
URL:
URL:<WWLINK TYPE="GENERIC"
VALUE="http://www.doeweb.com">http://www.doeweb.com</WWLINK>
EMAIL;PREF;INTERNET:jdoe@nowhere.com
REV:19980114T170559Z
END:VCARD
;;;;

Key elements:

INput is executeable, you can request an input any time you want while going through a file.

The @ holds a current line.

The /@ on an input says advance a line in the file (the /) and at the position the text is found, @'N:' for example start reading the variable name that follows.

 

_INFILE_ is an automatic variable that holds the value of the current line form the file being read. So you can parse it using any text search functions.

I did not do anything to parse out the address because I am afraid that you are likely to found to may addresses that have different things missing or poorly formatted. I suspect your example has an error of a Zip code in the state field. Each of those ; in the ADR probably indicate a different value but I found reference to at least 3 different VCARD formats and the addresses are one of the things that changes. So you may need to actually read that VERSION information to know how to parse a specific card.

 

The serious logic goes into identifying which things are SKIPPED. I do not know what one of these might look like without an EMAIL for instance. I suspect you may not even have a line that starts with EMAIL.

 

The forum has multiple examples of reading many similar files in a single data step. Search for one that uses the FILENAME option on an INFILE statement. That is used to restart reading from the top of a new file.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1933 views
  • 0 likes
  • 2 in conversation