EDOBE XDOM PMML User Manual (Page 11)

Modified Default Parameter Settings for DECTREE and REGTREE

Parameter  New Value  Old Value  Description
minsplit   50         2          The minimum number of instances in a node required for a split. If the number of instances in a node is less than minsplit, no further split is applied and the node becomes a leaf.
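The minsplit rule can be illustrated with a minimal Python sketch. The function name `becomes_leaf` is hypothetical and is not part of DECTREE or REGTREE; it only restates the stopping condition described above, using the new default of 50.

```python
def becomes_leaf(n_instances: int, minsplit: int = 50) -> bool:
    """Return True if a node is too small to be split any further.

    Hypothetical helper: a node holding fewer than `minsplit` instances
    is not split and becomes a leaf.
    """
    return n_instances < minsplit


# With the new default (50), a node of 49 instances is no longer split.
print(becomes_leaf(49))              # node stays a leaf
# With the old default (2), the same node would still have been split.
print(becomes_leaf(49, minsplit=2))  # node may be split
```

Raising the default from 2 to 50 therefore tends to produce smaller trees, because splitting stops much earlier.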

Time Series Forecasting

Support for Time Series is introduced in this release. A time series is a sequence of numerical data values, measured at successive, but not necessarily equidistant points in time. Examples are daily stock prices, monthly unemployment counts, or annual changes in global temperature. The two main goals of time series analysis are to understand the underlying patterns that are represented by the observed data and to make forecasts. Time Series support is implemented with the following new algorithm:

TIMESERIES (NVARCHAR(ANY) paramString)

Detailed information about this algorithm can be found in the “Time Series Forecasting” section of the IBM SPSS In-Database Analytics Developer's Guide.
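To make the idea of forecasting from non-equidistant observations concrete, here is a minimal Python sketch. It is not the TIMESERIES algorithm and does not use its parameters; it assumes a plain least-squares linear trend, and the function names and sample data are invented for illustration.

```python
def fit_linear_trend(times, values):
    """Least-squares line through (time, value) pairs.

    The time points do not need to be equally spaced.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def forecast(times, values, t_future):
    """Extrapolate the fitted trend to a future point in time."""
    slope, intercept = fit_linear_trend(times, values)
    return intercept + slope * t_future


# Illustrative data: observations taken on days 0, 1, 3 and 7 (irregular gaps).
observation_days = [0, 1, 3, 7]
observed_values = [10.0, 12.0, 16.0, 24.0]

print(forecast(observation_days, observed_values, 10))
```

The sketch covers both goals named above in the simplest possible form: the fitted slope describes the underlying pattern, and the extrapolation produces the forecast.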

Missing Value Support in Analytics Functions

In prior releases of Netezza Analytics, analytic algorithms were unable to work with tables that were missing values in the columns being used in the algorithm's calculation. Because many real-world databases suffer from missing values in tables, preprocessing was required in these cases to remove rows or columns with missing values, replace missing values with some special value, or impute the values by using the Netezza Analytics supplied IMPUTE_DATA procedure. New to this release is an internal solution built into various algorithms to deal with missing values. This provides:

• A more convenient solution

• Possibly better model quality

• Possibly better predictions
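For comparison, the three preprocessing options that earlier releases required (removing rows, substituting a special value, imputing a value) can be sketched in Python as follows. This is only an illustration: the function names and sample rows are hypothetical, missing values are represented as None, and mean imputation is an assumed strategy that does not describe how the IMPUTE_DATA procedure works internally.

```python
from statistics import mean

# Illustrative table: None marks a missing value.
rows = [
    {"age": 30, "income": 50.0},
    {"age": None, "income": 60.0},
    {"age": 40, "income": None},
]


def drop_incomplete(rows):
    """Option 1: remove every row that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_constant(rows, special_value):
    """Option 2: replace each missing value with a special value."""
    return [
        {k: (special_value if v is None else v) for k, v in r.items()}
        for r in rows
    ]


def impute_mean(rows):
    """Option 3: replace each missing value with its column mean."""
    columns = rows[0].keys()
    means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {k: (means[k] if v is None else v) for k, v in r.items()}
        for r in rows
    ]


print(drop_incomplete(rows))
print(fill_constant(rows, -1))
print(impute_mean(rows))
```

Dropping rows discards the information in their non-missing columns, which is one reason built-in handling can lead to better model quality.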

The following selected algorithms are capable of building or applying models using tables with missing values, internally handling missing values in an appropriate manner (instead of just ignoring instances with missing values):

• Decision Trees

• Regression Trees

• Naïve Bayes classifier

For other algorithms, if rows contain missing values, the rows are ignored, but the table is still used. Preprocessing is still possible, using the Netezza Analytics supplied IMPUTE_DATA procedure, but is not required. Note that preprocessing is not “automated.” Detailed information about how missing values are handled can be found in the IBM SPSS In-Database Analytics Developer's Guide.

Changes to the KMEANS Algorithm

The following new features were added to the existing KMEANS algorithm:
