BookmarkSubscribeRSS Feed
shukuchi
Calcite | Level 5

sasHi I have a question on SAS that I hope you may be able to help. Suppose that I have two SAS files called thismonth and lastmonth. They have the same fields except one of the fields.

Here is a snippet of the dataset

Thismonth:

Contract NumberStatusPriceCode
43211100300CA
19180907500IF
431219991200CA
54331762700IF
54331762700IF

Lastmonth:

Contract NumberStatusPriceCode
17190444N600CA
43211000I300CA
19180907I500IF
54331762R700IF
54331762R700IF

So what I need to do is to index the data by both contract number and code first. Then I want to do a lookup on lastmonth. For example. if last month's status is "N" then thismonth's status will be "I". And if last month's status is "R" then this month's status is "N". I'm not sure how indexing works in SAS and would appreciate any help. The real datasets have over 400000 rows in each of the dataset, so I think I have to use indexing in order for this lookup to go faster.

Thanks.

4 REPLIES 4
LinusH
Tourmaline | Level 20

This is a good starting point:

SAS(R) 9.4 Language Reference: Concepts

Data never sleeps
Tom
Super User Tom
Super User

Doesn't look like a lookup problem to me. It looks like a straight forward merge.

data want ;

merge thismonth lastmonth(keep = contract code status rename=(status=oldstatus)) ;

by contact code ;

if oldstatus = 'N' then status='I' ;

else if oldstatus = 'R' then status = 'N';

run;

Patrick
Opal | Level 21

Creating an index is only worth it if you're going to use this index multiple times.

For what you need to do you can either use a data step merge as Tom shows, a hash lookup (loading "Lastmonth" into the hash table), or a PROC SQL left join.

Question:

What is the primary key in your data? One of the data lines you've posted occurs twice. Is this just a "typo" or is this how your real data looks like?

jakarman
Barite | Level 11

All depend on size when performance is getting more attention. All things like hasing and Proc SQL have been mentioned many times.

Using  Set / Key

There are however still an other solutions:

Using set wiht table lookup as example 8. SAS(R) 9.3 Statements: Reference

The number of records 400.000 is not that big. If you have some variable like 10 of lengt 10 bytes (recordlength 100 bytes)  is not bigger as 40Mb.

That is small in our era using of Gb's en Tb's.  All data could be easily fit into memory.

Using point/set is more work but could be interesting on big-data approaches

This is based on the approach of using a SAS datastet having a big SAS-dataset needed to get updated with lookups in a smaller dataset knowing the obs number. http://www2.sas.com/proceedings/sugi30/061-30.pdf  Yes, at that era proc sql, SAS/access  was not evolved that far. We were start working with hard disks of 10Mb and 640Kb as internal memory. The big blue machines 16Mb internal with 1 Gb drives.  

That is humor seeing those figures.

With your small data size of 400k records my advice would be go for the most simple to understand solution.  When it performs sufficient you are ready

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 955 views
  • 0 likes
  • 5 in conversation