DATA Step, Macro, Functions and more

I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Reply
Occasional Contributor
Posts: 16

I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

dataset k;

input a b c d;

datalines;

1 2 4 4

1 3 3 4

2 3 4 4

so on ;

run;

data k;

set k;

if( a=xyz and b=lmn and c=hjk and d=cde) then;

flag1=x1;

flag2=x2;

flag3=x3;

else;

flag1=y1;

flag2=y2;

flag3=y3;

run;

I am worried about the execution time of the above. Could anybody help me optimizing the above problem. One solution I have is to index the four columns. Please provide some useful solution to optimize it in best possible manner.

Thanks in advance.

Valued Guide
Posts: 3,208

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

1M records  4 numeric of length 8 bytes that is 16Mbyte having 16 numeric 16Mbyte something so small  it will be cached in memory.  Nothing to bother about. Do you have specs that is more big?

---->-- ja karman --<-----
Occasional Contributor
Posts: 16

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Thank you for the reply. But the dataset is much bigger more 4 Millions observations and we have limited resource( Executing Capacity). So I am thinking to optimize in some way. If there is any possible way to reduce the execution time please let me know.

Occasional Contributor
Posts: 16

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

if dataset has 4 millions row  then there will 128Mbyte data to cached and I don't know how well will SAS be able to tackle it with its cached memory.

Valued Guide
Posts: 3,208

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Check your hardware settings, even UE is coming with the option set on about 2-3Gb. Your desktop is coming with at least 4Gb and can be easily 12G. Mostly server sessions even far higher.

With those low values (128 Mb) the IO OS systemcache will do. Optimizing buffersize at eg 128kb  and there will a 1000 buffers needs. Unless you are needing split second response time looking not a real issue.
Your selection looks simple (not complicated) your data looks simple (not complicated) unless you are having some challenges there I do not see a lot to optimize with all sequential processing on all data (fastest was)

---->-- ja karman --<-----
PROC Star
Posts: 7,363

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Like Jaap said, 4 million is NOT a large dataset for SAS.

I'd be more concerned about whether your code is doing what you want. You have:

if( a=xyz and b=lmn and c=hjk and d=cde) then;

flag1=x1;

flag2=x2;

flag3=x3;

else;

flag1=y1;

flag2=y2;

flag3=y3;

What do xya, lmn, hjk and cde represent? If they're not variables, the above code will fail.

Also, the code doesn't make sense as written. Don't you really want something like?:

if( a=xyz and b=lmn and c=hjk and d=cde) then do;

  flag1=x1;

  flag2=x2;

  flag3=x3;

end;

else do;

  flag1=y1;

  flag2=y2;

  flag3=y3;

end;

Super User
Posts: 6,948

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Even 4 million records with 4 numeric variables is EXTREMELY SMALL for SAS. I'd say that at this time, "big" starts at files well above 10 GB in size.

If you have trouble, show the log of your data step with fullstimer enabled.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor ven
Contributor
Posts: 37

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Forgetting the fact that your dataset is too small for a significant performance improvement in SAS , fix your code by including where clause instead of if statements, which will give you a better performance gain theoretically.

Super Contributor
Posts: 255

Re: I have a big Dataset (1+ Million records) that have four columns and I want to compare all coloumns with particular value and then need to set some Flags if value matches and also if not matches?

Do all records receive the same set of flags when match occurs and similarly other set of flags when match fails? Do the flags are CONSTANT?

Ask a Question
Discussion stats
  • 8 replies
  • 306 views
  • 0 likes
  • 6 in conversation