
Modified Default Parameter Settings for DECTREE and REGTREE
Parameter   New Value   Old Value   Description
minsplit    50          2           The minimum number of instances in a node
                                    required for a split. If the number of
                                    instances in a node is less than minsplit,
                                    no further split is applied and the node
                                    becomes a leaf.
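The effect of the new default can be illustrated with a minimal Python sketch. It is not the DECTREE or REGTREE implementation; the function name can_split is hypothetical and only encodes the rule stated in the description above.

```python
def can_split(n_instances: int, minsplit: int = 50) -> bool:
    """Return True if a node is large enough to be considered for a split.

    Encodes only the documented rule: a node with fewer than `minsplit`
    instances is not split further and becomes a leaf.
    """
    return n_instances >= minsplit


# A node holding 49 instances becomes a leaf under the new default (50),
# but would still have been eligible for a split under the old default (2).
leaf_under_new_default = not can_split(49)
eligible_under_old_default = can_split(49, minsplit=2)
```

Models built without an explicit minsplit value may therefore be shallower in this release than in earlier ones, because small nodes now stop splitting sooner.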
Time Series Forecasting
Support for Time Series is introduced in this release. A time series is a sequence of
numerical data values, measured at successive, but not necessarily equidistant, points in
time. Examples are daily stock prices, monthly unemployment counts, or annual changes in
global temperature. The two main goals of time series analysis are to understand the
underlying patterns that are represented by the observed data and to make forecasts. Time
Series support is implemented with the following new algorithm:
TIMESERIES (NVARCHAR(ANY) paramString)
Detailed information about this algorithm can be found in the “Time Series Forecasting”
section of the IBM SPSS In-Database Analytics Developer's Guide.
Missing Value Support in Analytics Functions
In prior releases of Netezza Analytics, analytic algorithms were unable to work with tables
that were missing values in the columns being used in the algorithm's calculation. Because
many real-world databases have missing values in their tables, preprocessing was required
in these cases to remove rows or columns with missing values, to replace missing values
with some special value, or to impute the values by using the IMPUTE_DATA procedure
supplied with Netezza Analytics. New to this release is an internal solution, built into
various algorithms, for dealing with missing values. This provides:
A more convenient solution
Possibly better model quality
Possibly better predictions
The following selected algorithms are capable of building or applying models using tables
with missing values, internally handling missing values in an appropriate manner (instead
of just ignoring instances with missing values):
Decision Trees
Regression Trees
Naïve Bayes classifier
For other algorithms, rows that contain missing values are ignored, but the rest of the
table is still used. Preprocessing with the IMPUTE_DATA procedure supplied with Netezza
Analytics is still possible, but is not required. Note that preprocessing is not “automated.”
Detailed information about how missing values are handled can be found in the IBM SPSS
In-Database Analytics Developer's Guide.
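The difference between ignoring incomplete rows and imputing values beforehand can be sketched in Python. This is an illustration only: the toy table, the function names, and the choice of column-mean imputation are assumptions, and the sketch does not reproduce the behavior of IMPUTE_DATA or of the internal handling in any Netezza Analytics algorithm.

```python
from statistics import mean

# Hypothetical toy table: each row is (age, income); None marks a missing value.
rows = [(34, 52000.0), (None, 61000.0), (45, None), (29, 48000.0)]


def drop_incomplete(rows):
    """Keep only rows without missing values (incomplete rows are ignored)."""
    return [r for r in rows if all(v is not None for v in r)]


def impute_mean(rows):
    """Replace each missing value with the mean of its column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = [
        mean(r[i] for r in rows if r[i] is not None) for i in range(n_cols)
    ]
    return [
        tuple(col_means[i] if v is None else v for i, v in enumerate(r))
        for r in rows
    ]


complete = drop_incomplete(rows)  # two of the four rows survive
imputed = impute_mean(rows)       # all four rows are kept, gaps filled
```

Ignoring rows discards the observed values those rows do contain, which is one reason handling missing values inside the algorithm or imputing them first can give a model more data to work with.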
Changes to the KMEANS Algorithm
The following new features were added to the existing KMEANS algorithm: