Desktop productivity for business analysts and programmers

Data cleaning

Reply
Contributor
Posts: 29

Data cleaning

Hi everyone,

 

I had a data issue the other day, I resolved it using proc transpose however the results had some limitations.

 

I've recreated the data set as a basic dummy one below (I cannot share the real one):

 

application_ID status total income Type codeDesc
746281$45,7873main income
746282$4,6892other income
746283$57,8121one off income
2135141$12,7893main income
2135142$54,7132other income
2135143$13,5741one off income

 

I want to make the income types columns. I ran this code:

 

data test (drop = status type_code desc);
set dummy;
if Type_code = 1 then one_off_income = total_Income;
else if Type_code = 2 then other_income = total_Income;
else if type_code = 3 then main_income = total_income;
run;

 

and got:

 

application_ID total income one off incomemain incomeother income
74628$45,787.$45,787.
74628$4,689..$4,689
74628$57,812$57,812..
213514$12,789.$12,789.
213514$54,713..$54,713
213514$13,574$13,574..

 

How do i get it to turn out like this:

 

application_ID total income one off incomemain incomeother income
74628$45,787$57,812$45,787$4,689
213514$12,789$13,574$12,789$54,713

 

I don't want to use the transpose function becuase I have heaps of other variables and it errors if I included them all in the var step.

 

I hope the above makes sense. I appreciate any help.

 

Regards

Scott

PROC Star
Posts: 398

Re: Data cleaning

Why not use Proc summary ? Smiley Happy

 

Proc summary data=test nway missing;

 class application_id;

 var total income one off income main income other income; /**** Need change here, just guessing Smiley Happy ***/

 output out=testsum sum=;

run;

 

//Fredrik

Super User
Posts: 6,931

Re: Data cleaning

The suggestion from @FredrikE is a good fix, to take what you produced with your attempt and convert it to what you want.  However, if you want to take your original data set and patch your original program, here are the changes it would take:

 

data test (drop = status type_code desc);
set dummy;

by application_ID;

retain one_off_income other_income main_income;

if first.application_ID then call missing(one_off_income other_income main_income);
if Type_code = 1 then one_off_income = total_Income;
else if Type_code = 2 then other_income = total_Income;
else if type_code = 3 then main_income = total_income;

if last.application_id;
run;

 

To be permitted to use the BY statement, your data set has to be in sorted order before the DATA step begins.

Respected Advisor
Posts: 3,185

Re: Data cleaning

You didn't show your code using Proc Transpose, but I believe with some dataset options,  Proc transpose ALONE can produce the outcome you listed:

 

 

data have;
infile cards truncover ;
input application_ID  status  'total income'n :dollar10.  Type 'code Desc'n & $18.;
cards; 
74628 1 $45,787 3 main income 
74628 2 $4,689 2 other income 
74628 3 $57,812 1 one off income 
213514 1 $12,789 3 main income 
213514 2 $54,713 2 other income 
213514 3 $13,574 1 one off income 
;
run;
PROC TRANSPOSE DATA=WORK.have 
	OUT=WORK.want(drop=_name_ where=('main income'n ne .))
	LET
;
	BY application_ID;
	ID "code Desc"n;
	VAR "total income"n;
	COPY "total income"n;

run;
Ask a Question
Discussion stats
  • 3 replies
  • 226 views
  • 3 likes
  • 4 in conversation