I have an undefined number of observations sorted by var1, and I'm looking to find the median of var1 for these observations in a data step (without using proc means). I know I can add a variable for _n_, which will give me the observation number. But afterwards, I'm wondering how I can get the middle observation (if the number of observations is odd) or average the two middle observations (if the number of observations is even).
Any help on this would be appreciated.
Why would you want to do this, since a number of procs have the ability to do all of the work?
Anyhow, since you asked, how about (using your last example as the data)?:
data have;
input var1-var10 col1-col11;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
;
data want (keep=variable);
set have;
array vars _all_;
do over vars;
variable=vars;
output;
end;
run;
data median (keep=median);
set want end=lastrec nobs=numobs;
retain low high;
if numobs/2-int(numobs/2) and _n_ eq ceil(numobs/2) then do;
low=variable;
high=variable;
end;
else if _n_ eq int(numobs/2) then low=variable;
else if _n_ eq int(numobs/2)+1 then high=variable;
if lastrec then do;
median=sum(low,high)/2;
output;
end;
run;
Another way :
data have;
input var1 @@;
datalines;
4 7 2 6 32 4 5 8 3 7
;
data _null_;
set have nobs=n;
call symput ("firstobs", ceil(n/2));
call symput ("obs", ceil(n/2)+(mod(n,2)=0));
stop;
run;
proc sort data=have; by var1; run;
data want(keep=median);
set have (firstobs=&firstobs. obs=&firstobs.);
varx = var1;
set have (firstobs=&obs. obs=&obs.);
median = mean(var1,varx);
run;
PG
PGStats,
I know you know this, but it looks like you're burning a little too much midnight oil. Switch the CALL SYMPUTs to:
call symputx("firstobs", floor(n/2));
call symputx("obs", ceil(n/2));
The tools are good, but the details are tricky. Also, can the formula for medians get complex if there can be ties?
Ties are irrelevant. The definition can be found at: Median - Wikipedia, the free encyclopedia
Art,
I'm focusing on the word "usually" at the end of the first paragraph of your link.
See, also, the PCTLDEF option within PROC UNIVARIATE.
You might be right about this, but I'm just not sure yet.
If N=5, I want firstobs=3 and obs=3. If N=4, I want firstobs=2 and obs=3. Hence the expressions I proposed.
Ties are not a problem. Empty datasets are, however.
PG
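The two pairs of expressions under discussion can be compared directly. This is a minimal sketch that uses only the formulas quoted in the thread; the variable names first_pg, last_pg, first_alt and last_alt are made up for illustration. It writes the candidate firstobs/obs values to the log for n = 1 to 6:

data _null_;
  do n = 1 to 6;
    /* expressions from the original CALL SYMPUT version */
    first_pg = ceil(n/2);
    last_pg  = ceil(n/2) + (mod(n,2) = 0);
    /* expressions from the suggested CALL SYMPUTX replacement */
    first_alt = floor(n/2);
    last_alt  = ceil(n/2);
    put n= first_pg= last_pg= first_alt= last_alt=;
  end;
run;

For n=5 the floor/ceil pair gives 2 and 3 instead of 3 and 3, and for n=4 it gives 2 and 2 instead of 2 and 3, so the original expressions are the ones that pick the middle observation(s).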
PGStats,
You're right, my bad. Where's that java?
If the data are sorted you can use direct access to find the one or two obs needed for the median.
Nicely done DN. It could be further simplified as :
proc sort data=sashelp.class(obs=16) out=class;
by age;
run;
data want;
if mod(nobs, 2) then do point=(1+nobs)/2;
set class point=point;
median = age;
end;
else do point = nobs/2, 1+nobs/2;
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;
PG
Or if you like compactness :
proc sort data=sashelp.class(obs=16) out=class;
by age;
run;
data want;
do point=(mod(nobs, 2)+nobs)/2, (2-mod(nobs, 2)+nobs)/2 ;
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;
PG
I am so smart that I have deleted my stupid post this morning:smileylaugh:.
Maybe I should have done the same. :smileyshocked:
To atone for my earlier fog, here's my less foggy version of the loop on this one:
do point = ceil(nobs/2), ceil( (nobs+1)/2 );
Agreed! It makes PG's code more compact:
data want;
do point = ceil(nobs/2), ceil((nobs+1)/2);
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;
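Since empty datasets were flagged as the real problem earlier in the thread, here is a hedged sketch of the same POINT= step with a guard for nobs=0, followed by a PROC MEANS cross-check. It assumes CLASS is the sorted dataset produced by the PROC SORT step above. The output name want_guarded and the choice to return a missing value for an empty input are my own assumptions, not something anyone in the thread specified.

data want_guarded;
  /* nobs is filled in at compile time, so it can be tested before SET runs */
  if nobs = 0 then do;
    age = .;          /* assumption: report a missing median for empty input */
    output;
    stop;
  end;
  do point = ceil(nobs/2), ceil((nobs+1)/2);
    set class nobs=nobs point=point;
    median + age/2;   /* sum statement: median starts at 0 and accumulates */
  end;
  age = median;
  output; stop;
  keep age;
  format age 8.2;
run;

/* cross-check: with default settings this should agree with the data step */
proc means data=class median;
  var age;
run;

Without the guard, an empty CLASS would send point=0 to the SET statement, which is not a valid observation number.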