Multiple records

SASPhile · Posted 10-30-2019 12:30 PM

ID	Visit	Test	res
1001	Screen	Height	170
1001	Screen	Weight	82
1001	Screen	Height
1001	Screen	Weight	82
1001	Day 1	Height	170
1001	Day 1	Weight	89
1001	Day 1	Height	170
1001	Day 1	Weight
1002	Screen	Height	180
1002	Screen	Weight	95
1002	Day 1	Height	180
1002	Day 1	Weight	95

How to get Height and Weight per visit and get them multiple times if they are different as below:

ID 1001 taking the non missing value for height. for weight taking non missng value but the weights are different on screen and day1 visits so we get 3 records.

ID 1002 ,height and weight are same during screen and day 1 so we get two records:

ID	Visit	Test	res
1001	Screen	Height	170
1001	Screen	Weight	82
1001	Day 1	Weight	89
1002	Screen	Height	180
1002	Screen	Weight	95

Tom · Posted 10-30-2019 12:33 PM

I am not sure I understand what you are asking for. But it looks like a simple PROC SORT with NODUPKEY . Looks like you want to eliminate the missing values.

proc sort data=have out=want nodupkey;
  where not missing(res);
  by _all_;
run;

Kurt_Bremser · Posted 10-30-2019 12:46 PM

See this:

data have;
infile datalines dlm='09'x dsd truncover;
input ID $ Visit $ Test$ res;
datalines;
1001	Screen	Height	170
1001	Screen	Weight	82
1001	Screen	Height	 
1001	Screen	Weight	82
1001	Day 1	Height	170
1001	Day 1	Weight	89
1001	Day 1	Height	170
1001	Day 1	Weight	 
1002	Screen	Height	180
1002	Screen	Weight	95
1002	Day 1	Height	180
1002	Day 1	Weight	95
;

proc sort
  data=have (where=(res ne .))
  out=int
;
by id test res;
run;

data want;
set int;
by id test res;
if first.res;
run;

proc sort data=want;
by id descending visit test;
run;

proc print data=want noobs;
run;

Result:

 ID     Visit      Test     res

1001    Screen    Height    170
1001    Screen    Weight     82
1001    Day 1     Weight     89
1002    Screen    Height    180
1002    Screen    Weight     95

PS by now (almost 800 posts), you should know where to put questions; you're not a "New User" anymore. It should also be an easy exercise for you to post example data in a data step with datalines, so we don't have to do that extra work. It's not rocket science, and it won't make your head explode or cause you to go blind. Promised.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 10-30-2019 12:47 PM

PS I moved the question to the correct community.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

novinosrin · Posted 10-30-2019 12:55 PM


data have;
infile cards missover;
input ID	(Visit	Test) (& $10.)	res;
cards;
1001	Screen	Height	170
1001	Screen	Weight	82
1001	Screen	Height	 
1001	Screen	Weight	82
1001	Day 1	Height	170
1001	Day 1	Weight	89
1001	Day 1	Height	170
1001	Day 1	Weight	 
1002	Screen	Height	180
1002	Screen	Weight	95
1002	Day 1	Height	180
1002	Day 1	Weight	95
;


data want ;
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","test","res") ;
   h.definedata ("id","test","res") ;
   h.definedone () ;
end;
set have;
where res>.;
if h.check() ne 0;
 rc=h.add();
drop rc;
run;

Hi @SASPhile A best time to recommend a solution beautifully demonstrated by geniuses @hashman and @DonH in their book.

Data Management Solutions Using SAS^® Hash Table Operations: A Business Intelligence Case Study

By Paul Dorfman and Don Henderson

I plagiarized their solution but why not.

novinosrin · Posted 10-30-2019 01:14 PM

The previous would needlessly consume memory to accomodate the entire distinct values of the entire table, and if that's a concern , by group processing helps memory footprint to equivalent only to the largest by group. So improving the previous to-->


data have;
infile cards missover;
input ID	(Visit	Test) (& $10.)	res;
cards;
1001	Screen	Height	170
1001	Screen	Weight	82
1001	Screen	Height	 
1001	Screen	Weight	82
1001	Day 1	Height	170
1001	Day 1	Weight	89
1001	Day 1	Height	170
1001	Day 1	Weight	 
1002	Screen	Height	180
1002	Screen	Weight	95
1002	Day 1	Height	180
1002	Day 1	Weight	95
;

data want;
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","test","res") ;
   h.definedata ("id","test","res") ;
   h.definedone () ;
end;
 do until(last.id);
  set have;
  by id;
  where res>.;
  if h.check()=0 then continue;
  rc=h.add();
  output;
 end;
 h.clear();
 drop rc;
run;

novinosrin · Posted 10-30-2019 01:32 PM

And the traditional SORT and NODUPKEY as suggested by Tom

proc sort data =have out=want nodupkey;
by id test res;
where res>.;
run;

hashman · Posted 10-30-2019 03:24 PM

@novinosrin:

Please feel free to keep plagiarizing ;). Many thanks for the plug!

A little note, if I may: Here you can call the ADD method unassigned (i.e. without coding rc=) because due to the subsetting IF logic, it is always successful - the method is called only when the key is not in the table, and so there's no risk of getting a method call error.

Kind regards

Paul D.

DonH · Posted 10-30-2019 02:27 PM

Part of the point of the book was to provide source material that could be plagiarized.