Purushottam
Fluorite | Level 6

Hello everyone!

I want to read this data, but I am not able to.

Kindly advise me.

Problem 1

P9988 HR Finance Analytics S3498 HR IT Finance
R4634 Finance Analytics Sale

Vocab: EMPID Department

Output Desired
EMPID Department
P9988 HR
P9988 Finance
P9988 Analytics

Purushottam Sharma
A student from a management college

Mob: 918130352772
5 REPLIES
art297
Opal | Level 21

Not exactly sure if I understand what you're trying to do, but the following should at least give you some idea of how you can parse such data:

data have;
  length string $80;
  input;
  string=_infile_;
  cards;
P9988 HR Finance Analytics S3498 HR IT Finance 
R4634 Finance Analytics Sale 
;

data want (keep=empid department);
  set have;
  length substring $80
         empid $5
         department $50;
  retain pattern;
  if _n_ eq 1 then pattern=PRXPARSE("/[a-zA-Z]\d\d\d\d/");
  substring=string;
  do until (start eq 0);
    CALL PRXSUBSTR(pattern, substring, start, length);
    if start gt 0 then do;
      EMPID=substr(substring,start,length);
      substring=substrn(substring,start+length);
      CALL PRXSUBSTR(pattern, substring, start, length);
      if start gt 0 then do;
        Department=substr(substring,1,start-1);
        substring=substrn(substring,start);
      end;
      else Department=substring;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

Purushottam
Fluorite | Level 6
Thanks for your reply, but this code is not working.
The desired output must be a table:
EMPID Department
P9988 HR
P9988 Finance
P9988 Analytics
S3498 HR
S3498 IT
S3498 Finance
R4634 Finance
R4634 Analytics
R4634 Sale
art297
Opal | Level 21

The code I suggested did create a table, but apparently not containing what you want. I think the following does match what you want:

data have;
  length string $80;
  input;
  string=_infile_;
  cards;
P9988 HR Finance Analytics S3498 HR IT Finance 
R4634 Finance Analytics Sale 
;

data want (keep=empid department);
  set have;
  length substring $80
         empid $5
         department full_department $50;
  retain pattern;
  /* Compile the EMPID pattern once: one letter followed by four digits */
  if _n_ eq 1 then pattern=PRXPARSE("/[a-zA-Z]\d\d\d\d/");
  substring=string;
  do until (start eq 0);
    /* Locate the next EMPID in what remains of the line */
    CALL PRXSUBSTR(pattern, substring, start, length);
    if start gt 0 then do;
      EMPID=substr(substring,start,length);
      substring=substrn(substring,start+length);
      /* Text up to the following EMPID (or the end of the line) is this EMPID's department list */
      CALL PRXSUBSTR(pattern, substring, start, length);
      if start gt 0 then do;
        Full_Department=substr(substring,1,start-1);
        substring=substrn(substring,start);
      end;
      else Full_Department=substring;
      /* Write one row per department word */
      counter=1;
      do while (scan(Full_Department,counter) ne '');
        department=scan(Full_Department,counter);
        counter+1;
        output;
      end;
    end;
  end;
run;


 

Ksharp
Super User


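data have;
  /* Read one blank-delimited word at a time; @@ holds the line across iterations */
  input x : $100. @@;
  length id $ 100;
  retain id;                      /* carry the current EMPID forward */
  pid=prxparse('/[a-z]\d+/i');    /* a letter followed by digits marks an EMPID */
  if prxmatch(pid,strip(x)) then id=x;
  else do;                        /* any other word is a department for that EMPID */
    department=x;
    output;
  end;
  drop pid x;
  cards;
P9988 HR Finance Analytics S3498 HR IT Finance 
R4634 Finance Analytics Sale 
;
run;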

Tom
Super User

Your post looks garbled. Please post the same data using the Insert Code icon on the toolbar in the editor. This will pop up a new window where you can paste the data and/or code, and it will preserve the spacing and line breaks.

 

If your data is in lines then something as simple as this will combine the first word with all of the following words on the line. 

data want ;
  length empid $10 department $20 ;
  infile datalines truncover ;
  input empid department @ ;
  do until (missing(department ));
    output;
    input department @;
  end;
datalines;
P9988 HR Finance Analytics 
S3498 HR IT Finance 
R4634 Finance Analytics Sale 
;

If there are multiple EMPIDs on the same line, then you need some logic to tell an EMPID from a DEPARTMENT name. In your example it looks like an EMPID is a letter followed by 4 digits, so something like this should work.

data want ;
  length empid $10 department $20 ;
  infile datalines flowover ;
  retain empid ;
  input department @@ ;
  if prxmatch('/^[a-z][0-9]{4}$/i',trim(department)) then empid=department;
  else output;
datalines;
P9988 HR Finance Analytics S3498 HR IT Finance 
R4634 Finance Analytics Sale 
;
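
If you would rather avoid regular expressions, the same "tell an EMPID from a DEPARTMENT" logic can be sketched with SCAN and ANYDIGIT. This is only a rough alternative, and it rests on an assumption that is not stated in the original post: that any word containing a digit is an EMPID and that department names never contain digits. The variables token and i are helper names introduced here for illustration.

```sas
data want (keep=empid department);
  length empid $10 department $20 token $20;
  retain empid;                       /* carry the current EMPID forward */
  infile datalines truncover;
  input;                              /* load the whole line into _infile_ */
  do i = 1 to countw(_infile_, ' ');  /* walk the blank-delimited words */
    token = scan(_infile_, i, ' ');
    /* Assumption: a word containing a digit is an EMPID */
    if anydigit(token) then empid = token;
    else do;                          /* otherwise it is a department */
      department = token;
      output;
    end;
  end;
datalines;
P9988 HR Finance Analytics S3498 HR IT Finance
R4634 Finance Analytics Sale
;
```

With the two sample lines this should write one row per EMPID and department pair. If your real department names can contain digits, the PRXMATCH test above is the safer choice.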
