BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
malena
Calcite | Level 5

Hello,

 

I am trying to assign a random id to mothers in a dataset for 4 years. Of course there are many mothers who give birth more than once in this dataset so I will like to use the same random id if that is the case. I have close to 200,000 records and a key using last name, first name and DOB of the mothers.

 

thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Or, if you just want large non repeating numbers as ids :

 

proc sort data=mothers; by lastName firstName dob; run;

data mothersId;
call streaminit(7566);
retain id;
set mothers; by lastName firstName dob;
if first.dob then id = int(1e12*rand("UNIFORM"));
format id z12.0;
run;
PG

View solution in original post

9 REPLIES 9
malena
Calcite | Level 5
Hello,

 

I am trying to assign a random id to mothers in a dataset for 4 years. Of course there are many mothers who give birth more than once in this dataset so I will like to use the same random id if that is the case. I have close to 200,000 records and a key using last name, first name and DOB of the mothers.

 

thanks!

LinusH
Tourmaline | Level 20
What will be the use of the random id?
Data never sleeps
TomKari
Onyx | Level 15

People frequently use UUIDs for this. Check out the UUIDGEN function.

 

Tom

PGStats
Opal | Level 21

To get random ids going from 1 to the number of distinct mothers, do:

 

proc sort data=mothers; by lastName firstName dob; run;

data mothers2;
call streaminit(7568);
retain id;
set mothers; by lastName firstName dob;
if first.dob then id = rand("UNIFORM");
run;

proc rank data=mothers2 out=mothersId ties=dense;
var id;
run;

(untested)

PG
PGStats
Opal | Level 21

Or, if you just want large non repeating numbers as ids :

 

proc sort data=mothers; by lastName firstName dob; run;

data mothersId;
call streaminit(7566);
retain id;
set mothers; by lastName firstName dob;
if first.dob then id = int(1e12*rand("UNIFORM"));
format id z12.0;
run;
PG
malena
Calcite | Level 5

thank you! this works perfectly

Ksharp
Super User

Hash Table

 

data have;
input first $ last $ dob : date9.;
format dob date9.;
cards;
a b 03jun1991
b c 09jan1980
b c 09jan1980
a b 03jun1991
;
run;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h();
  h.definekey('first','last','dob');
  h.definedata('id');
  h.definedone();
 end;
 set have;
 rc=h.find();
 if rc ne 0 then do;n+1;id=n;h.add();end;
 drop rc n;
run;
LinusH
Tourmaline | Level 20
Not knowing the use case, but perhaps hashing the concatenated business key values could be one solution.
The id values will distributed, it's quite safe if privacy is required and it's simple to use: given the same input always the same hash is produced. Check out the md5() function.
Data never sleeps
Reeza
Super User

Hashing isn't a good idea...

 

New York Taxi Open Data - Reversed Engineered

http://www.theguardian.com/technology/2014/jun/27/new-york-taxi-details-anonymised-data-researchers-...

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1278 views
  • 0 likes
  • 6 in conversation