Solved: Re: Creating a data set of a smaller group of fields based on grouping...

ADouglas · Posted 01-21-2019 12:58 PM

Hi All,

I have a contrived data set that has 9 fields of responses to examination questions. The questions have unique ids of 1, 2, 3, 4, 5, 6, 7, 8 & 9.

The 9 questions are grouped into blocks that are smaller modules of 3 questions. There are 3 blocks/modules of questions. The block/module ids are A, B, & C.

The first record of the data set includes the question ids, and the second record of the data set includes the block ids.

The information based on the first two records shows the specific questions grouped into each block/module.

The remaining records include examinees' responses to the questions, with 1 being answered correctly and 0 being answered incorrectly.

This is how the data set looks (Data Have):

1 2 3 4 5 6 7 8 9

A A A B B B C C C

1 0 1 0 1 0 1 0 1

0 1 1 1 0 0 1 1 0

1 1 1 1 0 0 0 0 1

Etc.

What I want to do is to create three separate data sets.

I want a data set that has only the questions in block/module A.

I want a data set that has only the questions in block/module B.

I want a data set that has only the questions in block/module C.

This is what I want (Data Want):

Output data set 1.

1 2 3

A A A

1 0 1

0 1 1

1 1 1

Etc.

Output data set 2.

4 5 6

B B B

0 1 0

1 0 0

Etc.

Output data set 3.

7 8 9

C C C

1 0 1

1 1 0

0 0 1

Etc.

I think I could transpose the original data set, and work with it to select records for each block/module to create three output files, and then transpose those output files back so that the block/module id is in the first record for each of the three files. But, that would require a number of transpose steps.

I am wondering if there is code that would be more concise & elegant that I can use in one DATA step that can be used together with a SAS macro. I need the code to generalize to a more realistic situation of starting with a data set that has 3000 fields of responses to exam questions grouped into 90 blocks/modules.

All I am hoping is that someone from the community would have a pretty decent idea of how to program the code for the single DATA step. I can then finish the code for the macro so that different block/module ids are specified and the DATA step can be executed for each block/module.

Any ideas?

Thanks!

novinosrin · Posted 01-21-2019 03:54 PM

Hi @ADouglas Have fun with hashes

data have;
input A1 A2 A3 B1 B2 B3 C1 C2 C3;
cards;
1  2 3  4 5 6  7 8  9
1  0 1  0 1 0  1 0  1 
0  1 1  1 0 0  1 1  0
1  1 1  1 0 0  0 0  1
;
dm log 'clear';

data _null_ ;
  if _n_ = 1 then do ;
    declare hash h ;   
    declare hash hoh() ;  
    hoh.defineKey ("j") ;
    hoh.defineData ("h", "j") ;
	   hoh.defineDone () ;
	end ;
set have end = lr ;  
array t(*) a1--c3;
do i= 1 to dim(t);
k=mod(i,3);
if k=1 then 
	do;
		j=first(vname(t(i)));
		if hoh.find() ne 0 then 
		do ;  
   			 h = _new_ hash (multidata:"Y",ordered:'y') ;  
			 h.defineKey ('_n_') ; 
		     do  k1=i to i+2;
	    		h.defineData (vname(t(k1))) ;
			end;
			h.defineDone () ;
		    hoh.add() ; 
	   end; 
	  h.add(key:_n_, data:t(i),data:t(i+1),data:t(i+2)) ;  
   end ;
end;
if lr ;
  declare hiter ihoh ("hoh") ;  
  do while (ihoh.next() = 0) ;
     h.output (dataset:j) ;
  end ;
run ;

NOTE: If memory is an issue, i can't help. A college student's help can go thus far only with the limited resources I have.

View solution in original post

PeterClemmensen · Posted 01-21-2019 01:27 PM

How is the data stored? A SAS data set can not have 3 variables with the name A.

The DATA to DATA Step Macro
Blog: SASnrd

ADouglas · Posted 01-21-2019 01:33 PM

Thanks for asking the question, and clarifying a detail I overlooked.

Let's say record 1 has the question ids, and record 2 has the block/module ids.

Data I have:

1 2 3 4 5 6 7 8 9

A A A B B B C C C

1 0 1 0 1 0 1 0 1

0 1 1 1 0 0 1 1 0

1 1 1 1 0 0 0 0 1

Etc.

Data I want.

Output data set 1.

1 2 3

A A A

1 0 1

0 1 1

1 1 1

Etc.

Output data set 2.

4 5 6

B B B

0 1 0

1 0 0

Etc.

Output data set 3.

7 8 9

C C C

1 0 1

1 1 0

0 0 1

Etc.

Reeza · Posted 01-21-2019 01:38 PM

Is your data in SAS data sets already or in text files to start with?

ADouglas · Posted 01-21-2019 01:42 PM

The data will come to me as a .csv file.

As per Draycut's question, I can have the IT professional store the block/module ids as the 2nd record, and the item ids as the first record, if need be?

Thanks

Reeza · Posted 01-21-2019 01:48 PM

Will you always have that many columns or is that an example? Is this a more than one time process or one time?

I'd probably build a dynamic process, first read the first two records, determine how many you need and then read the rest. But it depends on how generic it needs to be. Not really sure why you need to split it either.

@ADouglas wrote:

The data will come to me as a .csv file.

As per Draycut's question, I can have the IT professional store the block/module ids as the 2nd record, and the item ids as the first record, if need be?

Thanks

ADouglas · Posted 01-21-2019 01:52 PM

I receive semi-annually a data set with over 3000 variables. The code needs to be executed each time.

The blocks/modules of questions need to be put into separate output files, and I have to conduct analysis for each file.

Thanks,

Aaron

ADouglas · Posted 01-21-2019 02:01 PM

Thanks for clarifying. I forgot.

novinosrin · Posted 01-21-2019 03:54 PM

Hi @ADouglas Have fun with hashes

data have;
input A1 A2 A3 B1 B2 B3 C1 C2 C3;
cards;
1  2 3  4 5 6  7 8  9
1  0 1  0 1 0  1 0  1 
0  1 1  1 0 0  1 1  0
1  1 1  1 0 0  0 0  1
;
dm log 'clear';

data _null_ ;
  if _n_ = 1 then do ;
    declare hash h ;   
    declare hash hoh() ;  
    hoh.defineKey ("j") ;
    hoh.defineData ("h", "j") ;
	   hoh.defineDone () ;
	end ;
set have end = lr ;  
array t(*) a1--c3;
do i= 1 to dim(t);
k=mod(i,3);
if k=1 then 
	do;
		j=first(vname(t(i)));
		if hoh.find() ne 0 then 
		do ;  
   			 h = _new_ hash (multidata:"Y",ordered:'y') ;  
			 h.defineKey ('_n_') ; 
		     do  k1=i to i+2;
	    		h.defineData (vname(t(k1))) ;
			end;
			h.defineDone () ;
		    hoh.add() ; 
	   end; 
	  h.add(key:_n_, data:t(i),data:t(i+1),data:t(i+2)) ;  
   end ;
end;
if lr ;
  declare hiter ihoh ("hoh") ;  
  do while (ihoh.next() = 0) ;
     h.output (dataset:j) ;
  end ;
run ;

NOTE: If memory is an issue, i can't help. A college student's help can go thus far only with the limited resources I have.

ADouglas · Posted 01-21-2019 04:02 PM

This works, bravo bravo.

However, I will need to study the code to understand it, and I'm not sure how fast it will be when there are 3000 plus variables and 90 plus blocks/modules.

But, nonetheless, I'm sure it will come in handy.

The solution does require a change for inputting the data, but nonetheless, it get's the job done.

Thanks!!

novinosrin · Posted 01-21-2019 04:07 PM

Hello @ADouglas If you could bear with me until tomorrow. I can possibly effiiciently test at my college lab. Today it's a holiday here in Chicago and I would only go back to Depaul lab tomorrow morning.

So, if it's not urgent, I can efficiently test with rich samples and let you know or even make changes. Just a one caveat, I do have a presentation on a case study tomorrow that I hope to pass as I haven't even started yet lol, so the timing of my response may not be consistent.

Nonetheless, We the community(you included) are a SAS family. Feel free to reach out to us.

PS I will explain how the code works tomorrow. Today I am little too busy preparing for the case study. Bear with me sir!

ADouglas · Posted 01-21-2019 04:16 PM

Please finish your case study; and only if you have time, followup with me here.

There is only one issue with your code, I think.

The number of modules can very year to year; and for a given year, the number of questions between modules varies.

My example didn't show these complexities. But, what you've done will work for a set of modules that have a fixed number of questions. I could very well be presented with such a scenario for another project.

I applaud your interest and motivation to respond while being a college student.

Take Care

novinosrin · Posted 01-21-2019 04:19 PM

Pleasure will be mine. In the mean time, can you please post a "comprehensive sample" explaining your need, so that we can avoid going back and forth. I could modify the code in one shot.

ADouglas · Posted 01-21-2019 04:28 PM

Hi Novinosrin,

As you requested, below is a modification to the example that shows how the number of questions between modules vary.

Data I have:

1 2 3 4 5 6 7 8 9

A A A A B B C C C

1 0 1 0 1 0 1 0 1

0 1 1 1 0 0 1 1 0

1 1 1 1 0 0 0 0 1

Etc.

Data I want.

Output data set 1.

1 2 3 4

A A A A

1 0 1 0

0 1 1 1

1 1 1 1

Etc.

Output data set 2.

1 2

B B

1 0

0 0

Etc.

Output data set 3.

7 8 9

C C C

1 0 1

1 1 0

0 0 1

Etc.

But do your homework first !!

Thanks

Reeza · Posted 01-21-2019 04:31 PM

Do you actually have the 123456789 in your data set or are you adding that for clarity?

Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Re: Creating a data set of a smaller group of fields based on grouping labels stored in first record

Classroom Training Available!