DATA Step, Macro, Functions and more

Weird data manipulation. Need help.

Hi,

 

I received a data table that looks like this:

 

Code     VAR2        VAR3        VAR4        VAR5        VAR6
C000501  C000873:3   C000501:3
C000873  C003330:1   C000873:39  C003402:1   C000501:3   C001758:6
C001758  C001758:12  C003330:4   C000873:6
C003330  C001758:4   C000873:1   C003330:12
C003402  C000873:1   C003402:4

 

This is how I wish to transform it:

 

Code     C000501  C000873  C001758  C003330  C003402
C000501  3        3
C000873  3        39       6        1        1
C001758           6        12       4
C003330           1        4        12
C003402           1                          4

 

 

This is just a sample. The original dataset has about 5,000 rows and a few thousand columns.

 

Thanks for the help. Let me know if anything is needed.

 

 


Accepted Solutions
Solution
‎10-17-2015 08:15 PM
Respected Advisor
Posts: 4,173

Re: Weird data manipulation. Need help.

[ Edited ]
Posted in reply to bhupesh102

Just to "spell out" in code the approach @Reeza suggested.


data have;
  infile cards truncover;
  input (code var2 var3 var4 var5 var6) (:$15.);
  cards;
C000501 C000873:3 C000501:3 
C000873 C003330:1 C000873:39 C003402:1 C000501:3 C001758:6
C001758 C001758:12 C003330:4 C000873:6 
C003330 C001758:4 C000873:1 C003330:12 
C003402 C000873:1 C003402:4 
;
run;

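/* Reshape to long form: one row per code/key/value triple. */
data long(keep=code key value);
  set have;
  array vars {*} var2 - var6;
  length key $32 value 8;
  do _i=1 to dim(vars);
    if not(missing(vars[_i])) then
      do;
        /* Text before the colon becomes the column name, text after it the count. */
        key=scan(vars[_i],1,':');
        value=input(scan(vars[_i],2,':'),best32.);
        output;
      end;
  end;
run;

/* Pivot back to wide form: one column per distinct key. */
proc transpose data=long out=wide(drop=_:);
  by code notsorted;
  id key;
  var value;
run;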
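
A side note on column order: PROC TRANSPOSE creates the new columns in the order in which the ID values are first encountered, so `wide` will not necessarily list the codes alphabetically as in the desired output. The following is only a sketch of one way to reorder them, assuming `wide` sits in the WORK library and the list of column names fits into a single macro variable. The names `ordered_cols` and `wide_ordered` are made up for illustration.

```sas
/* Collect the transposed column names in sorted order.        */
/* ordered_cols is a hypothetical macro variable name.         */
proc sql noprint;
  select name into :ordered_cols separated by ' '
  from dictionary.columns
  where libname='WORK' and memname='WIDE' and upcase(name) ne 'CODE'
  order by name;
quit;

/* Listing the variables in RETAIN before SET fixes their      */
/* order in the output dataset; the values come from WIDE.     */
data wide_ordered;
  retain code &ordered_cols;
  set wide;
run;
```

With a few thousand columns the name list may approach the macro variable length limit, so this should be checked against the real data before relying on it.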


All Replies
Super User
Posts: 19,772

Re: Weird data manipulation. Need help.

Posted in reply to bhupesh102
Step 1 - transform your data to the following form using the SCAN function and the OUTPUT statement.

Code1 Code2 Num
C000501 C000873 3
C000501 C000501 3
...

etc.

Step 2 - then try a PROC TRANSPOSE.
Frequent Contributor
Posts: 95

Re: Weird data manipulation. Need help.

[ Edited ]
Posted in reply to bhupesh102

Read the data and create two datasets. The first holds the column headings pulled from the variables, one name per observation; sort it and eliminate duplicates. In the second, each observation has the code, the column, and the count (the value after the colon). Then use FILE PRINT: read the cols dataset and print the codes across the top line, then read the code dataset and move across the page, putting the counts under the code column names.

 

I don't know if you want to use FILE PRINT, but here is a start of some code. The code has errors but may give you some ideas.

 

/* Read each row and split every CODE:COUNT pair into COL and COUNT. */
data code(keep=code col count) cols(keep=col);
  infile cards truncover;
  input (code var2 var3 var4 var5 var6) (:$15.);
  array vars {*} var2-var6;
  length col $15 count $8;
  do i=1 to dim(vars);
    if vars[i] ne " " then do;
      col=scan(vars[i],1,':');
      count=scan(vars[i],2,':');
      output cols;
      output code;
    end;
  end;
cards;
C000501 C000873:3 C000501:3
C000873 C003330:1 C000873:39 C003402:1 C000501:3 C001758:6
C001758 C001758:12 C003330:4 C000873:6
C003330 C001758:4 C000873:1 C003330:12
C003402 C000873:1 C003402:4
;
run;
proc print data=code; run;

/* One observation per distinct column name, plus its print position. */
proc sort data=cols nodupkey; by col; run;
data cols; set cols; pos=10*_n_+5; run;
proc print data=cols; run;

/* Attach the print position to every code/col/count triple. */
proc sort data=code; by col; run;
data code; merge code(in=a) cols; by col; if a and count gt " "; run;
proc sort data=code; by code pos; run;

/* Header line: the column names across the top. */
data _null_; set cols end=last; file print;
  put @pos col @;
  if last then put;
run;

/* Body: one line per code, each count under its column name. */
data _null_; set code; by code; file print;
  if first.code then put @5 code @;
  put @pos count @;
  if last.code then put;
run;

Occasional Contributor
Posts: 6

Re: Weird data manipulation. Need help.

Thank you Reeza and especially patrick for making it very clear. It Worked!!!
Trusted Advisor
Posts: 1,137

Re: Weird data manipulation. Need help.

Posted in reply to bhupesh102

Yes, indeed a very good solution using arrays. Alternatively, with PROC TRANSPOSE steps we could get the same result, as below:

 

proc sort data=have;
by code;
run;

proc transpose data=have out=new;
by code;
var var2-var6;
run;

data new2;
set new;
/* Convert the count to numeric so the final columns are numeric, not character. */
col2=input(scan(col1,2,':'),best32.);
col1=scan(col1,1,':');
if col1 ne '';
drop _name_;
run;

proc transpose data=new2 out=trans(drop=_name_);
by code;
var col2;
id col1;
run; 
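
One possible way to run that check is PROC COMPARE. This is a sketch only, assuming the `wide` dataset from the accepted solution and the `trans` dataset above both exist in WORK and are sorted by `code`. If the values in `trans` are character rather than numeric, the procedure will report a type mismatch for those columns instead of comparing their values.

```sas
/* Match observations on CODE and compare the like-named columns. */
proc compare base=wide compare=trans;
  id code;
run;
```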

Please try and check.

 

Thanks,

Jag
