DATA Step, Macro, Functions and more

Removing Missing Data/Reorganizing Missing Data

Accepted Solution Solved
Reply
New Contributor Stu
New Contributor
Posts: 3
Accepted Solution

Removing Missing Data/Reorganizing Missing Data

I have an extremely large csv dataset (in the millions) that is    formatted in an odd way.  There is only one column for variable    values, and another column with a code for what each value is    measuring.  So, for example, if I had a small dataset with two    variables A and B that was formatted the normal way:
   
    Year     A       B
    1994   10     30
    1994   20     40

    In the scheme I was given, it would be formatted in this way:
   
    Year   Value     Code     CodeDescription
    1994   10         2844                  "A"
    1994   20         2844                  "A"
    1994   30         2855                  "B"
    1994   40         2855                  "B"

    Using some simple "if" statements, I can read in the data in this    way to separate it all out:
   
    if(Code = 2844) then A = Value
    if(Code = 2855) then B = Value
   
    When I read in the data and sort it by year, it will come out in    this way:
   
    Year        A    B
    1994      10    .
    1994      20    .
    1994        .    30
    1994        .    40

    Even if I sort it by date, it will come out this way.  I would like    it to be formatted as such:
   
    Year       A     B
    1994     10    30
    1994     20    40

I've tried a bunch of different options but cannot seem to get it to sort properly.  Some of the variables have more measurements than others, and my goal is to have it aligned such that any missing variables will be at the end of the date sets, and the data can be aligned as best as it can be.  I have found that, due to the way the data is originally formatted, SAS will treat each read as an individual observation, and set any other variables to missing.
   

Does anyone know of a way I can do this?  I have tried everything from exporting it and re-reading it in to transposing twice to see if that would make any sort of difference.


Accepted Solutions
Solution
‎12-22-2011 08:55 PM
Super User
Super User
Posts: 7,060

Re: Removing Missing Data/Reorganizing Missing Data

It is not clear to me that it is valid to do what you want.  Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?

If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.

data have ;

  length name $34 ;

  input year value code name $ ;

  name=dequote(name);

cards;

1994 10 2844 "A"

1994 20 2844 "A"

1994 30 2855 "B"

1994 40 2855 "B"

run;

data vertical ;

  set have ;

  by year code ;

  if first.code then row=1;

  else row+1;

run;

proc sort ;

  by year row code ;

run;

proc transpose data=vertical out=want ;

  by year row ;

  id name ;

  var value ;

run;

Obs    year    row    _NAME_     A     B

1     1994     1     value     10    30

2     1994     2     value     20    40

View solution in original post


All Replies
Solution
‎12-22-2011 08:55 PM
Super User
Super User
Posts: 7,060

Re: Removing Missing Data/Reorganizing Missing Data

It is not clear to me that it is valid to do what you want.  Why did you put A=10 and B=30 in the same row instead of A=20 and B=30?

If we assume that the values of the same variable for the same year can be matched in the order that they appear in the source file then we just need to introduce a new variable to let PROC TRANSPOSE do the work for us.

data have ;

  length name $34 ;

  input year value code name $ ;

  name=dequote(name);

cards;

1994 10 2844 "A"

1994 20 2844 "A"

1994 30 2855 "B"

1994 40 2855 "B"

run;

data vertical ;

  set have ;

  by year code ;

  if first.code then row=1;

  else row+1;

run;

proc sort ;

  by year row code ;

run;

proc transpose data=vertical out=want ;

  by year row ;

  id name ;

  var value ;

run;

Obs    year    row    _NAME_     A     B

1     1994     1     value     10    30

2     1994     2     value     20    40

New Contributor Stu
New Contributor
Posts: 3

Re: Removing Missing Data/Reorganizing Missing Data

Thank you so much. This did exactly what I needed it to do.  I am an undergraduate in stats and am still grasping the more advanced portions of SAS, and this has been an extraordinary help.  By having the data in this way, I am able to perform the quantile regression analysis I have been asked to do.

Super Contributor
Posts: 1,636

Re: Removing Missing Data/Reorganizing Missing Data

data have;

input Year   Value     Code     CodeDescription $;

cards;

1994   10         2844                  "A"

1994   20         2844                  "A"

1994   30         2855                  "B"

1994   40         2855                  "B"

;

data want;

  set have;

  cd=translate(CodeDescription,'','"');

proc transpose data=want out=want11;

by cd;

var value;

run;

proc transpose data=want11 out=want12;

id cd;

var col:;

run;

proc transpose data=want out=want21;

by cd;

var year ;

run;

proc transpose data=want21 out=want22 let;

id _name_;

run;

data final(drop=_name_);

merge want22 want12;

year=ifn(year eq . ,lag1(year),year);

run;

proc print;run;

Linlin

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 214 views
  • 4 likes
  • 3 in conversation