SAS Perl prx how to parse data more than once

Reply
Occasional Contributor
Posts: 10

SAS Perl prx how to parse data more than once

Hi,

Please find attached the csv file with the data I am trying to clean. My data contains the brand of a camera, its model #, its price (price1 or price2-- whichever column is populated; both will never be populated at the same time and refers to the same camera) and discamt: how much the camera is discounted. Then I use an If statement to set the value of the variable to 0 if it is missing so that I can add them together later. Since I am new to SAS, if there is a more efficient way to do this pls let me know. For some reason, when I run this program, it generates "invalid data for discamt" in the log but the resulting dataset is fine so I have ignored this.

data bbnov21zoom check_bbnov21zoom;

        length brand $ 80 model $ 25;

        infile "G:\irobot2\irobot-eval\data\nov21data\bb_nov21\zoombbnv21.csv" firstobs=2 dlm="|" dsd;

        input brand model price1 price2 discamt;

        if price1="" then price1=0;

        if price2="" then price2=0;

        if discamt="" then discamt=0;

        netprice= price1+price2;

        regprice=netprice+discamt;

        if index(model, 'B0000') then output check_bbnov21zoom;

                else output bbnov21zoom;

run;

Now my question is: how do I parse this dataset using the PRX Perl regex function within SAS two times:

1) once to extract the brandname of the camera which is a substring from the variable "brand"? For eg, I just need the manufacturer's name which is Sony or Canon or Panasonic, etc. I have written this partial code below but I don't know where to insert it within the datasteps or how to make it work

data manufbbnov21zoom;

        retain manufid;

        if _n_=1 then

        manufid=prxparse('/[[:alpha:]]+/o');

/*not sure what code goes in here*/

        call prxsubstr(manufid,brand,pos,len);

        if pos>0 then

        manuf=substr(brand,pos,len);

        keep brand regprice model manuf;

        run;


2) From the "model" variable I need to extract the model# only which is the combination of letters followed by digits, for eg, A4000 (in A4000ISBLU) or VR-310 (in VR-310 BLACK). The code below which is copied from another program does this correctly but I don't know where to insert this code and how to make it work within this program.

prxid=prxparse('/[[:alpha:]]+[-\s]?\d+/o');

Your help is so greatly appreciated. Thanks and hope to hear from you soon.

Kelly

Attachment
Super User
Posts: 11,343

Re: SAS Perl prx how to parse data more than once

Posted in reply to kellykishan

If your data continues to have the brand as the first item in the name and is always one word you don't need to parse;

BrandName = scan(Brand,1);

Occasional Contributor
Posts: 10

Re: SAS Perl prx how to parse data more than once

Thank you, I did not know that, it is so much easier! I still need help with parsing for the model# though. Anyone out there who can help me please?

Contributor
Posts: 29

Re: SAS Perl prx how to parse data more than once

Posted in reply to kellykishan

Hello. I think this is getting close to what you want...but looking at the model variable, I'm not sure how you parse that properly.

data bbnov21zoomT check_bbnov21zoomT;
  infile "G:\irobot2\irobot-eval\data\nov21data\bb_nov21\zoombbnv21.csv" firstobs=2 dlm="|" dsd TRUNCOVER;
  length netprice regprice discamt 8. newbrand $50. newmodel $50.;
 
  input brand    :$100.
        model    :$50.
        price1   :8.2
        price2   :8.2
        _discamt :$10.
  ;
 
  if price1  = . then price1=0;
  if price2  = . then price2=0;
  if _discamt in ('','"') then discamt=0;
  else discamt = input(_discamt,8.);
 
  netprice= price1+price2;
  regprice=netprice+discamt;

  newbrand = scan(brand,1);
  newmodel = scan(model,1,' ');
 
  drop _discamt;

  if index(model, 'B0000') then output check_bbnov21zoomT;
  else output bbnov21zoomT;
run;


By the way: the invalid discamt error was because there was a double-quote character at the end of one of the lines, being read into a numeric field.

Occasional Contributor
Posts: 10

Re: SAS Perl prx how to parse data more than once

Thank you very much TimArm. I now understand why that invalid discamt was appearing. Thanks also for the code, I will try it and also learn from it.

Kelly

Ask a Question
Discussion stats
  • 4 replies
  • 261 views
  • 6 likes
  • 3 in conversation