Re: How to write 150 values in one column to be 150 data set

superbug · Posted 05-09-2020 09:53 PM

As in the attach "item.csv", I have a file contains two variables, "itemid" and "bpar". There are 150 values of bpar. I want to write each bpar value as a data set. The following is my trial, but there is error in the macro part. Any help would be appreciated. Thanks!

proc import out=item
datafile = "...\item.csv"
dbms = csv;
getnames = yes;
run; quit;

%macro dat_output(dat);
data %do i = 1 %to 150; bpar&i.
if count=i;
%end;
set &dat;
%do j = 1 %to 150;
output b&j.;
%end;
run;

%mend dat_output;

%dat_output(item);

superbug · Posted 05-09-2020 10:14 PM

I added a "count" variable in the "item_new.csv" file to refer to the "count" indicator in the macro. Please refer this "item_new.csv" if you could help. thanks!

Tom · Posted 05-09-2020 10:16 PM

Please show the SAS code you are trying to create, without ANY macro logic or macro variables. Just do it for 2 or 3 values.

Once you get that to work then you can begin to think about how you might use the macro language to generate that code for 3 or 150 or 10,000 values.

Is there a good reason to create 150 datasets? Or is the actual need to create 150 reports or 150 text files or 150 excel sheets?

superbug · Posted 05-09-2020 10:32 PM

@Tom

Thanks a lot for your reply!

The reason I need to create 150 data is because I need to use these values to calculate 150 probabilities in later steps.

I understood I should try 1 or 2 values before writing macro.

I tried the following, but I have been stucked by writing macro. Could you please help write them in a macro? Thanks much!

data bpar1;

set item;

if count=1;

run;

data bpar2;

set item;

if count=3;

run;

superbug · Posted 05-09-2020 10:46 PM

@Tom

I tried the following macro, but there are error messages. I don't know how to correct the code. could you please help?

%macro f1;
%do i=1 %to 2;

data par&i;
set item;
if count=&i;
end;
run;
%end;
%mend;
%f1;

error message: 161: LINE and COLUMN cannot be determined.
NOTE: NOSPOOL is on. Rerunning with OPTION SPOOL might allow recovery of the LINE and COLUMN
where the error has occurred.
ERROR 161-185: No matching DO/SELECT statement.

Tom · Posted 05-09-2020 11:11 PM

How are you calculating the probabilities? Why not just add BY COUNT to that step?

The error message is pretty clear. You have an END statement without a DO statement.

Note you can make all of the datasets in one pass (if you really need them) which can save a lot of time if the file is large.

You could use IF/THEN/ELSE pattern or SELECT /WHEN instead.

data par1 par2 par3;
  set item;
  if count=1 then output par1;
  else if count=2 then output par2;
  else if count=3 then output par3;
run;

Shmuel · Posted 05-09-2020 11:28 PM

Try next code after importing the csv file:

data _null_;
 set item;
     line = cats("data bpar",trim(_N_),"; bpar=",bpar,";run;");
     call execeute(line);
run;

or replace the import step into next code:

data _null_'
  infile ".../item.csv" dlm=",";
  input itemid $ bpar;
     line = cats("data bpar",trim(_N_),"; bpar=",bpar,";run;");
     call execeute(line);
run;

If you need the itemid too in the files then just add the

"; itemid=", itemid,

inside the line to be concatenated;

Kurt_Bremser · Posted 05-10-2020 02:44 AM

Question #1: why do you want to split the dataset in the first place. In 90% of cases, using by or where on the dataset is more appropriate.

So please tell us what you want to achieve.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 05-10-2020 06:07 AM

Hello, all, I think the problem that @superbug is trying to solve is the same one as his other thread at https://communities.sas.com/t5/SAS-Programming/how-to-write-code-for-1-parameter-logistic-model/m-p/...

In that thread, as this one, there is not yet (in my opinion) a complete explanation of all the steps. Further, he wants a macro to do this even though that is necessary. Apparently he also thinks that 150 data sets would work, when one data set would work.

Superbug, you don't need a macro, you don't need many data sets, you need one data set. A single SAS data set can do everything you want, without macros, without do loops (a data set is an implied do-loop, did you know that?) To ask for a macro in this case, or to ask for many data sets, only confuses the matter further. So again, I ask you to give a clear explanation of your problem (as I asked in your other thread), in sufficient detail such that someone can write a program to solve the problem.

--
Paige Miller

Ksharp · Posted 05-10-2020 07:23 AM

data _null_;
  set sashelp.class ;
  call execute(cat('data ',name,';set sashelp.class(where=(name="',name,'"));run;'));
run;

superbug · Posted 05-10-2020 08:58 AM

@Ksharp @PaigeMiller @Kurt_Bremser @Shmuel @Tom

Thank you all so much for your reply. I very much appreciate your time!

My condition is like this, I have a test contains 150 items, each of the item will be dichotomously coded as 0, 1, that is, if examinee get one item correct, then the examinee get 1 point, otherwise, the examinee get 0 point, so the raw score range is [0, 150] for this whole test. Before this test is being administered, I need to calculate the examinee estimated ability corresponding to each raw score. Higher raw score means examinee has higher ability, lower raw score means examinee has lower ability.

To get what I wanted, I need to calculate the probability of examinee correctly answering an item given a certain ability level. Corresponding to 150 items, there will be 150 probabilities. The summation of all those 150 probabilities is called “TCC”. I have a criteria delta=0.000001, when the difference between TCC and raw score is close to this criteria delta, the iteration will stop.

All the information I have is item difficulty parameter (please see the attached "ITRDATA.csv") and the following SPSS syntax. I need to get a raw to theta conversion table (please see the “raw to theta conversion.csv”) using SAS. In case my explanation is confusing, I think it’s better to post the whole SPSS syntax here. I am sorry for this long syntax. Since there are some common grounds between codes, I am sure your SAS experts would have a rough idea of what do these codes mean. Again very much appreciate your help!

/*the following is the SPSS syntax*/

Define !theta_raw2019 (infilenam=!TOKENS(1)/outfilenam=!TOKENS(1)/outexcel=!TOKENS(1)/ntot=!TOKENS(1)/formname=!TOKENS(1)/thetacut=!TOKENS(1))

new file.

SET MXLOOP=100000.

MATRIX.

GET IRTDATA

/FILE=!infilenam

/VARIABLES=iname bpar

/NAMES=VARNAMES /MISSING=ACCEPT /SYSMIS=0.

COMPUTE NITEM=NROW(IRTDATA).

COMPUTE CONVERT=MAKE(!ntot,4,0.00000).

LOOP K = 1 TO !ntot.

COMPUTE theta_lo={-20}. /* initial low value of theta before iteration*/

COMPUTE theta_hi={20}. /* initial high value of theta before iteration*/

COMPUTE delta={.000001}. /*criteria of terminating iteration, It is the difference between TCC and the corresponding raw score. */

COMPUTE cut={K}.

COMPUTE theta={0}.

COMPUTE low={theta_lo}.

COMPUTE hi={theta_hi}.

COMPUTE IPROB=MAKE(NITEM,1,0).

COMPUTE IINFO=MAKE(NITEM,1,0).

LOOP.

LOOP J=1 TO NITEM.

COMPUTE IPROB(J)=(exp(theta-IRTDATA(J,2))/(1+exp(theta-IRTDATA(J,2))))).

COMPUTE IINFO(J)=IPROB(J)*(1-IPROB(J)).

END LOOP.

COMPUTE TCC=CSUM(IPROB).

DO IF (tcc-cut>delta).

COMPUTE hi = theta.

COMPUTE theta=theta-ABS((theta-low))/2.

END IF.

DO IF (tcc-cut< (-1*delta)).

COMPUTE low=theta.

COMPUTE theta=theta+ABS((hi-theta))/2.

END IF.

END LOOP IF (abs(tcc-cut)<=delta).

COMPUTE INFO=CSUM(IINFO).

COMPUTE SE=sqrt(1/INFO).

COMPUTE CONVERT(K,:)={CUT,THETA,INFO,SE}.

END LOOP.

SAVE CONVERT

/OUTFILE=!outfilenam.

END MATRIX.

get file=!outfilenam.

string form (A6).

compute form=!formname.

rename variables (col1=raw) (col2=theta) (col3=info) (col4=stderror).

compute probpass=(1-cdf.normal(!thetacut,theta,stderror)).

execute.

save outfile=!outfilenam

/compressed

SAVE TRANSLATE OUTFILE=!outexcel

/KEEP=raw theta probpass info stderror form

/TYPE=XLS /VERSION=2 /MAP /REPLACE /FIELDNAMES .

!enddefine.

!theta_raw2019 infilenam='….. \ABC59.sav'

outfilenam='…. \form ABC059 raw to theta.sav'

outexcel='….\form ABC059 raw to theta.xls'

ntot=149

formname='ABC59'

thetacut=0.8554.

superbug · Posted 05-10-2020 07:06 PM

@Reeza

Thanks a lot for instructing me to get probability when theta=0, that is the initial theta starting value of iteration, I think. By the SPSS syntax above, could you please direct me how to incorporate theta_lo= -20 and theta_hi=20 this kind of information into the SAS code below to continue the iteration until reached the specified criteria?

data want;
set IRTDATA;
theta=0;
prob = exp(theta-bpar)/(1+exp(theta-bpar));
run;

Reeza · Posted 05-11-2020 11:25 AM

data want;
set IRTDATA;

prev_prob=0;

do theta=-20 to 20 by 0.5 while(condition to keep looping);
prev_prob = prob;
prob = exp(theta-bpar)/(1+exp(theta-bpar));
dif = prob - prev_prob;
end;


run;

Something like that maybe should get you started. I suspect PROC NLIN may be more suitable and efficient here but I'm not very familiar with that procedure.

superbug · Posted 05-11-2020 11:40 AM

@Reeza

Thanks much for the suggestion!

If theta interval is fixed, it will make my life easier. The challenge from me is the interval is unfixed.

I very much appreciate your time!

Registration is open

SAS Training: Just a Click Away