Re: how do I populate a new variable with the first non-null value from a group

macs · Posted 05-16-2016 05:15 PM

I would like to create a new dataset with only one row for each ID. The first non-null value of the "Field" variable for each ID should be placed into a new variable called "Animal". I have included an example of my original dataset and my desired dataset below. What would be the best approach to create the desired dataset? Thanks in advance for any help you can provide.

Original Dataset

ID	Field
1
2	dog
3
2
1	cat
3	dog
1	cat
1
4	hamster
2	dog

Desired Dataset

ID	Animal
1	cat
2	dog
3	dog
4	hamster

PGStats · Posted 05-16-2016 05:56 PM

Assuming you want to drop IDs that have no animal:

proc sort data=original(where=(field is not missing)) 
	out=animals equals /* maintain original order within by-groups */;
by ID;
run;

data desired;
set animals; by ID;
if first.ID;
rename field=Animal;
run;

PG
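If IDs with no animal should be kept instead (with a missing Animal), one possible sketch carries the first non-missing value forward with RETAIN. It assumes the input is named original as above and that 8 characters is long enough for Animal; the names sorted and desired_all are hypothetical.

```sas
/* Sort all rows, including those with a missing field */
proc sort data=original out=sorted equals;
  by ID;
run;

data desired_all;
  length Animal $ 8;                       /* assumed length; match it to field */
  retain Animal;
  set sorted;
  by ID;
  if first.ID then Animal = ' ';           /* reset at the start of each ID */
  if missing(Animal) then Animal = field;  /* keep the first non-missing value */
  if last.ID;                              /* one row per ID */
  keep ID Animal;
run;
```

An ID whose rows are all missing ends up with a blank Animal rather than being dropped.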

ballardw · Posted 05-16-2016 06:00 PM

If you have another "order" related variable, such as a date, then sort by ID and that other variable and use First. processing. If your original data order is important but you don't have such a variable, then add one:

Data temp;
set have (where= (field ne ''));
order=_n_; /* record the original row order */
animal=field;
run;

The sort:

Proc sort data=temp;
by Id order;
run;

And the First. processing:

data want;
set temp;
by id order;
if first.id; /* keep only the first row of each id */
keep id animal;
run;
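For the case where a date-like variable already exists, the same pattern might look like the sketch below. The variable name visit_date is hypothetical and is not part of the original data.

```sas
/* Assumes HAVE contains a hypothetical ordering variable visit_date */
proc sort data=have(where=(field ne '')) out=temp;
  by id visit_date;
run;

data want;
  set temp(rename=(field=animal));
  by id;
  if first.id;      /* earliest non-missing value per id */
  keep id animal;
run;
```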

carlosmirandad · Posted 05-16-2016 06:10 PM

Updated my answer for brevity (as in PG's and Ksharp's posts) and included a second option which would be helpful for very large datasets:

*Sample data;
data original;
infile datalines dlm="|" dsd;
input ID field $ @@;
datalines;
1||2|dog|3||2||1|cat|3|dog|1|cat|1||4|hamster|2|dog
;
run;

*Approach one: sorting and deduping by ID preserving the relative order;
proc sort 
	data = original (where=(field is not missing)) 
	out = result (rename=(field=animal)) 
	EQUALS NODUPKEY; 
	by ID; 
run;

*Approach two: Using a hash table to pass once through the data without having to sort beforehand;
data result2;
	set original;
	where ~missing(field);

	if _N_ = 1 then do;
		declare hash hAnimals();
		_rc = hAnimals.DefineKey('ID');
		_rc = hAnimals.DefineData('field');
		_rc = hAnimals.DefineDone();
		drop _rc;
	end;

	if hAnimals.find() then do;
		_rc = hAnimals.add();
		output;
	end;

	rename field=animal;
run;
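A possible variant of approach two, offered only as a sketch: since the hash table is used just to remember which IDs have already been written, check() can replace find(), which avoids overwriting field on a hit. The names seen and result3 are hypothetical.

```sas
data result3;
  set original(where=(not missing(field)));

  if _N_ = 1 then do;
    declare hash seen();            /* table of IDs already output */
    _rc = seen.defineKey('ID');
    _rc = seen.defineData('ID');
    _rc = seen.defineDone();
  end;

  if seen.check() ne 0 then do;     /* nonzero return: ID not seen yet */
    _rc = seen.add();
    output;
  end;

  drop _rc;
  rename field=animal;
run;
```

As with the original hash version, rows come out in order of first appearance (IDs 2, 1, 3, 4 for the sample data), not sorted by ID.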

Ksharp · Posted 05-16-2016 09:28 PM

Compress PG's code into one proc:

data original;
infile datalines dlm="|" dsd;
input ID field $ @@;
datalines;
1||2|dog|3||2||1|cat|3|dog|1|cat|1||4|hamster|2|dog
;
run;

proc sort data=original(where=(field is not missing)) out=want nodupkey; 
by ID;
run;

carlosmirandad · Posted 05-17-2016 12:07 PM

Short and sweet. That's great. I would just add the EQUALS option to ensure that order is preserved.

PGStats · Posted 05-17-2016 12:40 PM

Good idea. I add the EQUALS option when it's important, even if it is the default.

PG

carlosmirandad · Posted 05-17-2016 03:02 PM

Me too. I like to explicitly request the options that I need, even if they are the default, just out of precaution. If the NOSORTEQUALS system option was turned on for any reason, that would change the default of the sort procedure. It also makes your intent more clear in the code. Agree 100%
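To illustrate the point about the system option, a short sketch assuming the original dataset from earlier in the thread: PROC OPTIONS reports the current setting, and the EQUALS option on PROC SORT requests the order-preserving behavior regardless of it.

```sas
/* Print the current SORTEQUALS / NOSORTEQUALS setting to the log */
proc options option=sortequals;
run;

/* Request order preservation within BY groups explicitly */
proc sort data=original(where=(field is not missing))
          out=want(rename=(field=Animal))
          equals nodupkey;
  by ID;
run;
```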
