SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JMagenta
Obsidian | Level 7

Hello,

Please help.

I have this dataset:

data have;
input ID DCDate Direction $ ICode1 $ ICode2 $ ;
cards;
1 2 South Q22.x Q22.y
1 2 South Q22.x .
1 2 South Q22.y .
1 2 South Q23.x .
2 2 South Q22.x Q22.y
2 2 South Q22.x .
2 2 South Q22.y .
2 2 South Q23.x .
;

I want this output:

ID Date ICode1 Icode2
1 2 Q22.x Q22.y
1 2 Q23.x   
2 2 Q22.x Q22.y
2 2 Q23.x   
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Transpose and sort to eliminate dups

data tall;
  set have;
  array icode[10] $8 ;
  do i=1 to dim(icode);
    if not missing(icode[i]) then do;
      code=icode[i];
      output;
    end;
  end;
  keep id dcdate code ;
run;

proc sort data=tall nodupkey;
  by id dcdate code;
run;

then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.

data want ;
  do i=1 to 2 until(last.dcdate);
    set tall;
    by id dcdate ;
    array icode[2] $8 ;
    icode[i] = code;
  end;
  drop i code;
run;

 

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

Transpose to long, then sort:

data long (rename=(dcdate=date));
set have;
array ic {*} icode:;
do i = 1 to dim(ic);
  if not missing (ic{i}
  then do;
    icode = ic{i};
    output;
  enđ;
end;
keep id dcdate icode;
run;

proc sort data=long nodupkey;
by id date icode;
run;
JMagenta
Obsidian | Level 7

Sorry,

 

I would like the following output

ID  DCDate Direction  ICode1 Icode2
1 2 South Q22.x Q22.y
1 2 South Q23.x  
2 2 South Q22.x Q22.y
2 2 South Q23.x  

 

Keeping only the unique ICODEs for each ID group.

 

 

 

 

 

 

 

 

JMagenta
Obsidian | Level 7

ICode1 and ICode2, represent ICode to the nth in the original dataset.

I could drop the new ICODE in the new dataset I guess.

In reality, I have up to ICode10 with not all of them being filled.

I really just want for the group of IDs to capture only the unique ICode values.

 

Tom
Super User Tom
Super User

Transpose and sort to eliminate dups

data tall;
  set have;
  array icode[10] $8 ;
  do i=1 to dim(icode);
    if not missing(icode[i]) then do;
      code=icode[i];
      output;
    end;
  end;
  keep id dcdate code ;
run;

proc sort data=tall nodupkey;
  by id dcdate code;
run;

then transpose back if you are worried you will have more observations for ID than the number of ICODE variables you want to create (perhaps you want to make a dataset just for printing) then here is a method you can use to roll the tall database back into wide, but with limit on how wide.

data want ;
  do i=1 to 2 until(last.dcdate);
    set tall;
    by id dcdate ;
    array icode[2] $8 ;
    icode[i] = code;
  end;
  drop i code;
run;

 

Tom
Super User Tom
Super User

You could also do it with hash objects in a single data step (assuming data already sorted by ID).

data want;
  set have;
  by id dcdate;
  array icode[2] $8 ;
  if _n_=1 then do;
    declare hash h();
    h.definekey('code');
    h.definedata('code');
    h.definedone();
  end;
  do i=1 to dim(icode);
    code = icode[i];
    if not missing(code) then rc=h.add();
  end;
  if last.id then do;
    call missing(of code icode[*]);
    declare hiter iter('h');
    rc = iter.first();
    do until (rc);
      do i=1 to dim(icode) until (rc);
        icode[i]=code;
        rc=iter.next();
      end;
      output;
      call missing(of icode[*]);
    end;
    rc = iter.delete();
    rc = h.clear();
  end;
  drop rc i code;
run;

 

Tom
Super User Tom
Super User

You need to explain more about what you want. Especially since the examples are so simple.

What is the role of ID and DCDATE?  Do you want unique codes per ID?  Or per ID and DCDATE combination?  In your example there is no difference since DCDATE is a constant.

Why was DIRECTION included in the example data? It does not appear to be used at all? 

What is the meaning of ICODE1 and ICODE2?  What do you want to do if the same code appears in both variables for the same group?  Which column should keep the value?  Why are you keeping Q22.y in ICODE2 and not in ICODE1?  What is the rule for deciding where it belongs?

 

JMagenta
Obsidian | Level 7

Sorry,

 

ID is a unique Identifier.

DC date is the same because all codes happened on the same date

Think of Direction as a person's last name so it would be the same but unique to that person, as ID is unique to that person. What I really want to make sure of is that even if someone has the same last name they won't have the same ID since ID is unique and not names.

If in the group of the person's ID's there is a unique ICode then I want to capture it. Otherwise I don't wan it.

 

 

A_Kh
Barite | Level 11
data want; 
	set have; 
	by ID DCDate Direction ICode1 descending icode2;
	retain icode;
	if first.id then icode=icode2;
	if not first.icode1 or icode1=icode then delete; 
	drop icode; 
run;

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 770 views
  • 0 likes
  • 4 in conversation