Solved: How to Scan with Delimiters, without getting blanks or duplicates in t...

J111 · Posted 09-04-2022 02:34 AM

Hello,

What is the best solution for scanning a formula without getting in the output - c(I) columns (I = 1,2,3..)

a. Null values (for example in the available data we have C3 and C4 with null values)

b. Duplicate values (for example X1 appears twice in the second line of have data )
c. Digits (the digit 2 was dropped succesfully from available data )

Thanks.

Data Have ;
infile datalines dsd ;
input formula : $30. ;
datalines ;
X1 / X2 + X4
(X1 + X2) / (X3 - X1) * 2
;
Run ;

data Available ;
set Have ;
delims = '+-/*()' ;
Array ccc_ [10] $32 c1-c10 ;
do i = 1 to 10 ;
if countc(scan(formula, i, delims),'X') > 0
ccc_[i] = scan(formula, i, delims) ;
end ;
drop delims i ;
run ;

Data Want ;
infile datalines dsd ;
input C1 $3. C2 $3. C3 $3. ;
datalines ;
X1 X2 X4
X1 X2 X3
;
Run ;

ErikLund_Jensen · Posted 09-04-2022 04:32 AM

Hi @J111

The following should work. I had some trouble getting the sorting right until I noticed that some of the C1-C10 variables had leading blanks.

/*** Your code ***/

Data Have ;
  infile datalines dsd ;
  input formula : $30. ;
  datalines ;
X1 / X2 + X4
(X1 + X2) / (X3 - X1) * 2
;
Run ;

data Available ;
  set Have ;
  delims = '+-/*()' ;
  Array ccc_ [10] $32 c1-c10 ;
  do i = 1 to 10 ;
    if countc(scan(formula, i, delims),'X') > 0 then ccc_[i] = scan(formula, i, delims) ;
  end ;
  drop delims i ;
run ;

Data Want;
  infile datalines dsd ;
  input C1 $3. C2 $3. C3 $3. ;
  datalines ;
X1 X2 X4
X1 X2 X3
;
Run ;

/**** My solution ***/

* Set number of array elements;
data _null_;
  set available;
  array elements $ c:;
  call symputx('arrdim',dim(elements));
run;
%put &=arrdim;

data result (drop=c:);
  set available;
  array elements $ c:;
  array out $ O1-O%eval(&arrdim);

  * left align elements - to solve bug in input creation;
  do i = 1 to dim(elements);
    elements{i} = left(elements{i});
  end;

  call sortc (of elements{*});
  j = 0;
  do i = 1 to dim(elements);
    if elements{i} ne '' then do;
      if i = 1 then do;
        j = j + 1;
        out{j} = elements{i};
      end;
      else if elements{i} ne elements{i-1} then do;
        j = j + 1;
        out{j} = elements{i};
      end;
    end;
  end;
run;

View solution in original post

J111 · Posted 09-04-2022 02:42 AM

ops ... added "then" below

data Available ; set Have ; delims = '+-/*()' ; Array ccc_ [10] $32 c1-c10 ; do i = 1 to 10 ; if countc(scan(formula, i, delims),'X') > 0 then ccc_[i] = scan(formula, i, delims) ; end ; drop delims i ; run ;

ErikLund_Jensen · Posted 09-04-2022 04:32 AM

Hi @J111

The following should work. I had some trouble getting the sorting right until I noticed that some of the C1-C10 variables had leading blanks.

/*** Your code ***/

Data Have ;
  infile datalines dsd ;
  input formula : $30. ;
  datalines ;
X1 / X2 + X4
(X1 + X2) / (X3 - X1) * 2
;
Run ;

data Available ;
  set Have ;
  delims = '+-/*()' ;
  Array ccc_ [10] $32 c1-c10 ;
  do i = 1 to 10 ;
    if countc(scan(formula, i, delims),'X') > 0 then ccc_[i] = scan(formula, i, delims) ;
  end ;
  drop delims i ;
run ;

Data Want;
  infile datalines dsd ;
  input C1 $3. C2 $3. C3 $3. ;
  datalines ;
X1 X2 X4
X1 X2 X3
;
Run ;

/**** My solution ***/

* Set number of array elements;
data _null_;
  set available;
  array elements $ c:;
  call symputx('arrdim',dim(elements));
run;
%put &=arrdim;

data result (drop=c:);
  set available;
  array elements $ c:;
  array out $ O1-O%eval(&arrdim);

  * left align elements - to solve bug in input creation;
  do i = 1 to dim(elements);
    elements{i} = left(elements{i});
  end;

  call sortc (of elements{*});
  j = 0;
  do i = 1 to dim(elements);
    if elements{i} ne '' then do;
      if i = 1 then do;
        j = j + 1;
        out{j} = elements{i};
      end;
      else if elements{i} ne elements{i-1} then do;
        j = j + 1;
        out{j} = elements{i};
      end;
    end;
  end;
run;

Ksharp · Posted 09-04-2022 06:35 AM

If you have Y1 or Z2 ...........

Data Have ;
  infile datalines dsd ;
  input formula : $30. ;
  datalines ;
X1 / X2 + X4
(X1 + X2) / (X3 - X1) * 2
;
Run ;

data temp;
 set have;
 id+1;
 pid=prxparse('/[a-z]\d+/i');
 start=1;end=length(formula);
 call prxnext(pid,start,end,formula,p,l);
 do while(p>0);
  temp=substr(formula,p,l);
  output;
  call prxnext(pid,start,end,formula,p,l);
 end;
keep id temp;
run;
proc sort data=temp nodupkey;by id temp;run;
proc transpose data=temp out=want(drop=_:) prefix=C;
by id;
var temp;
run;

How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Re: How to Scan with Delimiters, without getting blanks or duplicates in the output

Click image to register for webinar

Classroom Training Available!