BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jumboshrimps
Obsidian | Level 7

Have a couple million records with a string like
"00 00 01 00 00 01 00 01 00 00 00 00 01 01 00 01 00 00 00 00 01"

"01 00 01 00 00 00 01 00 00 00 00 00 00 01 00 01 00 00 00 00 00"
String has a length of 56. Positions 2/4/6/8/10, etc are filled with either a 1 or most likely, 0.
Positions 1/3/5/7/9, etc. are all zeros and don't matter.
.
My job is parse the string of each record every two positions
(there are no spaces, that is just for clarification).

If there is a 1 in position two that means increment var1 +1
If there is ALSO a 1 in position four, (don't care about leading "0"'s
in position 1/3/5/9, etc.) increment var2 + 1. Each two positions
represent a different var, 28 total vars. String length is ALWAYS 56.

The entire 56 len string must be parsed every two characters. Potentially
there could be 28 variables that have to be incremented in each pass of
the string, (but not realistic, most likely there is only five or six)
which could be found in any section of the string, beginning to end
(as long as they are in position 2/4/6/8 up to 56, etc.)

Using the first example string, results would be

Var1  0

Var2  0

Var3  1

Var4  0

Var5 0
var6  1

var7  0
Var8  1

..
Var13  1
Var14 1

var15 0
Var16 1
Var21 1  (skipping a bunch of vars with 0's)

Now we process the second string.

Results are now:

Var1 1 

Var2 0 (still zero, nothing in that position so far).

Var3 1+1 = 2 

Var4 0 (still zero)

Var5 0 (still zero)

Var6 1 (from previous row)

Var7 1 

Var8 0

..

Var14 1

Var15 0

Var 16 1

Var 21 1

 

This is what my boss gave me:
**if substr(BigString,2,1)='1' then var1+1;**

OK. Fine.
A) There are 27 more places to evaluate in the string.
B) there are a couple million records.

28 nested if then do loops doesn't sound like an answer (all I could think of). At least not to me.

Cut-and-pasted a lot of looping code as below, which check every two positions ,

but I get a finished product with twice as many records  

as I started with in a table I can query, but I'm looking to store variables var1-28 to access them later.

(I don't have any choice about this).
Thanx.

 


data have;
set sources;
Obs = _n_;
len = length(sourcerow);

do i=1 to len by 2;
value = substr(sourcerow, i, 2);
if value gt '00' then
output; 
end;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

If I understood correctly, like this?

data have;
 input SOURCEROW $56.;
cards;
11111111111111110000000000000011111111111111111111111111
00000000000000111100000000000011111111111111111111111111
00000000000000001100000000000000000000000000000000000000
run;

data WANT; 
  set HAVE;  
  array VAR[28];
  do I=2 to 56 by 2;
    VAR[I/2]+ input(char(SOURCEROW,I),1.);
  end;
run;

proc print noobs;
  var VAR: ;
run;
2

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20 VAR21 VAR22 VAR23 VAR24 VAR25 VAR26 VAR27 VAR28
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2 1 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2
1 1 1 1 1 1 1 2 2 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2

 

View solution in original post

11 REPLIES 11
smantha
Lapis Lazuli | Level 10
data have;
set sources;
array a{*} var1 -- var28;
Obs = _n_;
len = length(sourcerow);
do i=2 to len by 2;
a[i/2]= input(substr(sourcerow, i, 1));
end;
run;
Reeza
Super User

OP, you can use PROC FREQ on the resulting table to get your counts but the solution from @smantha is quite succinct and does what you need efficiently. 

 

 

Jumboshrimps
Obsidian | Level 7

Smantha,

 

Thanx for your quick reply.

 

Below is the error log from running the posted code:

 

56 data have;
57 set sources;
58 array a{*} var1 -- var28;
ERROR: Variable var1 cannot be found on the list of previously defined variables.
WARNING: Defining an array with zero elements.
59 Obs = _n_;
60 len = length(sourcerow);
61 do i=2 to len by 2;
62 a[i/2]= input(substr(sourcerow, i, 1));
-
22
76
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, /, <,
<=, <>, =, >, ><, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL, NOT,
NOTIN, OR, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

 

Not sure how to continue.

ballardw
Super User

Your shown example string has 62 characters not counting the " characters and only represents 21 "tuples". Intentional, error or what?Or are the spaces not supposed to be in the string? Please provide actual representative data that matches your description. If the string has not spaces then make sure your example has no spaces, if it is 56 characters provide 56 characters.

Possibly better is provide an actual data set in the form of data step code we can use. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

I believe you may be skipping things a bit Are there any other variables involved that have to be carried along?

Your comment of "Potentially there could be 28 variables that have to be incremented in each pass of
 the string, (but not realistic, most likely there is only five or six)" makes it sound like you have blank and not values is that actually case?

 

Then show what the final desired output data set looks like.

 

 

AhmedAl_Attar
Rhodochrosite | Level 12

@Jumboshrimps 

Borrowing code from @smantha , here is a solution that gives you what you looking for

data sources;
	sourcerow="00000100000100010000000001010001000000000101010001010001";
	output;
	sourcerow="01000100000001000000000000010001000000000000010001010100";
	output;
run;

data want(KEEP=Var:);
	array a{28} 4 var1-var28;

	do _n_=1 by 1 until(eof);
		set sources end=eof;
		len=length(sourcerow);

		do i=2 to len by 2;
			a[i/2]=sum(a[i/2],input(substr(sourcerow, i, 1),1.));
		end;
	end;
	output;
run;

Hope this helps,

Ahmed

ChrisNZ
Tourmaline | Level 20

If I understood correctly, like this?

data have;
 input SOURCEROW $56.;
cards;
11111111111111110000000000000011111111111111111111111111
00000000000000111100000000000011111111111111111111111111
00000000000000001100000000000000000000000000000000000000
run;

data WANT; 
  set HAVE;  
  array VAR[28];
  do I=2 to 56 by 2;
    VAR[I/2]+ input(char(SOURCEROW,I),1.);
  end;
run;

proc print noobs;
  var VAR: ;
run;
2

VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20 VAR21 VAR22 VAR23 VAR24 VAR25 VAR26 VAR27 VAR28
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 2 1 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2
1 1 1 1 1 1 1 2 2 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2

 

Jumboshrimps
Obsidian | Level 7
That's what I'm talking about!!!!!
Thanx.
Jumboshrimps
Obsidian | Level 7

Excellent!!  Accepted this reply most gratefully.

 

Now I look at a couple thousand records, and I see someone(not me, of course!) has entered a number other than "01"

somewhere in the string.  We are only counting each non -"00" entry as  a 1 for this application.

Is there a similar process I could run before processing that will loop through every 2 characters and, if not "00", change to "01".  

Looking through the data, 99% of the values are indeed "01", but I see some "05","09", even some "12" & "14" buried in there. 

 

Thanks so much.

ballardw
Super User

@Jumboshrimps wrote:

Excellent!!  Accepted this reply most gratefully.

 

Now I look at a couple thousand records, and I see someone(not me, of course!) has entered a number other than "01"

somewhere in the string.  We are only counting each non -"00" entry as  a 1 for this application.

Is there a similar process I could run before processing that will loop through every 2 characters and, if not "00", change to "01".  

Looking through the data, 99% of the values are indeed "01", but I see some "05","09", even some "12" & "14" buried in there. 

 

Thanks so much.


Try

    VAR[I/2]+ ( input(char(SOURCEROW,I),1.) > 0);

SAS returns 1 for a true comparison and 0 for false. So as long as that "other" character is actually a digit this should work. If you are getting things like A, B, C then the whole approach needs some modification.

Jumboshrimps
Obsidian | Level 7

You are a genius.  I checked each variable row after row, and no row increments more than one from the previous row.

 

😁

ballardw
Super User

@Jumboshrimps wrote:

You are a genius.  I checked each variable row after row, and no row increments more than one from the previous row.

 

😁


Not really, just the recipient of revealed wisdom from a prior boss when I was learning SAS back in 1987. Computers were slower and this sort of comparison to numeric ran faster than IF/Then/Else. I had asked why we had some code that looked something like:

newvar = (a=10)*(a*b*c) + (a=5)*(a*a*b) + (a=1)*(a*b*b);

Which is one way to pick a calculation instead of If a=10 then ...  ; else if a=5 then ... ; else if a=1 then ... ;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1670 views
  • 3 likes
  • 6 in conversation