Descriptors
Descriptors are the primary means of converting standard .NET types into their numerical representations. There are two broad ways of building one: with attributes, or declaratively through a fluent API (in either a type-safe or a weakly typed form).
Strong Descriptor Declaration
The first approach to preparing a Descriptor is to add attributes to the properties of a C# class:
public class Iris
{
    [Feature]
    public decimal SepalLength { get; set; }
    [Feature]
    public decimal SepalWidth { get; set; }
    [Feature]
    public decimal PetalLength { get; set; }
    [Feature]
    public decimal PetalWidth { get; set; }
    [StringLabel]
    public string Class { get; set; }
}
This class can then be used to create a Descriptor:
var description = Descriptor.Create<Iris>();
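To make the attribute approach more concrete, here is a minimal, library-independent sketch of how a descriptor could discover annotated properties through reflection and convert their values to doubles. The DemoFeatureAttribute and DemoLabelAttribute attributes, the DemoIris class, and the AttributeScanner helper are hypothetical and exist only for this illustration; this shows the general technique under that assumption, not numl's actual implementation of Descriptor.Create<T>().

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attributes for this sketch only (not numl's FeatureAttribute/LabelAttribute).
[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoFeatureAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoLabelAttribute : Attribute { }

// Hypothetical data class annotated with the demo attributes.
public class DemoIris
{
    [DemoFeature] public decimal SepalLength { get; set; }
    [DemoFeature] public decimal SepalWidth { get; set; }
    [DemoFeature] public decimal PetalLength { get; set; }
    [DemoFeature] public decimal PetalWidth { get; set; }
    [DemoLabel] public string Class { get; set; } = "";
}

public static class AttributeScanner
{
    // Public instance properties of T that carry the attribute TAttr.
    static PropertyInfo[] Marked<T, TAttr>() where TAttr : Attribute =>
        typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
                 .Where(p => p.GetCustomAttribute<TAttr>() != null)
                 .ToArray();

    // Names of the properties marked with TAttr.
    public static string[] NamesWith<T, TAttr>() where TAttr : Attribute =>
        Marked<T, TAttr>().Select(p => p.Name).ToArray();

    // Reads every [DemoFeature] property of item and converts its value to a double.
    public static double[] ToFeatureVector<T>(T item) =>
        Marked<T, DemoFeatureAttribute>()
            .Select(p => Convert.ToDouble(p.GetValue(item)))
            .ToArray();
}
```

For example, AttributeScanner.ToFeatureVector(new DemoIris { SepalLength = 5.1m, SepalWidth = 3.5m, PetalLength = 1.4m, PetalWidth = 0.2m }) yields a four-element array of doubles. Reflection does not formally guarantee the order in which properties are returned, so a real implementation would need to fix the feature order explicitly.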
Several feature and label attributes can be applied to the properties of a class:
- FeatureAttribute
- StringFeatureAttribute
- DateFeatureAttribute
- GuidFeatureAttribute
- EnumerableFeatureAttribute
- LabelAttribute
- StringLabelAttribute
- GuidLabelAttribute
This approach is the simplest, but it creates a strong dependency on this library: every annotated class must reference its attributes.
Descriptor Declaration
The second approach is still tied to a type, but it dispenses with attributes and instead uses a type-safe fluent API to declare features and labels:
var d = Descriptor.For<Iris>()
    .With(i => i.SepalLength)
    .With(i => i.SepalWidth)
    .With(i => i.PetalLength)
    .With(i => i.PetalWidth)
    .Learn(i => i.Class);
This example creates a descriptor with four features and one label. Several variants of the With method allow further customization:
- With(Expression<Func<T, Object>>) - the standard form; infers the property name and type
- WithString(Expression<Func<T, String>>, StringSplitType, String, Boolean, String) - for properties of type String; the additional parameters control how the string is expanded
- WithDateTime(Expression<Func<T, DateTime>>, DatePortion) - for properties of type DateTime; the DatePortion parameter controls how the date is expanded
- WithGuid(Expression<Func<T, Guid>>) - for properties of type Guid
- WithEnumerable(Expression<Func<T, IEnumerable>>, Int32) - for list properties; the Int32 parameter controls how the list is expanded
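The With(Expression<Func<T, Object>>) form can infer a property's name and type because the lambda arrives as an expression tree rather than as compiled code. The sketch below shows one common way to recover the PropertyInfo from such a lambda; the Flower class and the MemberInfoFromLambda helper are hypothetical, and this is not necessarily how numl does it. One detail worth handling: when the selected property is a value type such as decimal, the compiler wraps it in a Convert (boxing) node to match the object return type.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical data class used only for this example.
public class Flower
{
    public decimal SepalLength { get; set; }
    public string Class { get; set; } = "";
}

public static class MemberInfoFromLambda
{
    // Returns the property selected by a lambda such as f => f.SepalLength.
    public static PropertyInfo GetProperty<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body;

        // Value-type properties are boxed to object, which shows up as a Convert node.
        if (body is UnaryExpression unary && unary.NodeType == ExpressionType.Convert)
            body = unary.Operand;

        if (body is MemberExpression member && member.Member is PropertyInfo property)
            return property;

        throw new ArgumentException("The expression must select a property.", nameof(selector));
    }
}
```

Once the PropertyInfo is known, both the name (property.Name) and the type (property.PropertyType) are available, which is the information a descriptor needs to pick an appropriate conversion to double.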
This approach is less intrusive because it relies only on the structure of your existing data types.
Weak Descriptor Declaration
The last approach is completely agnostic to the provided data type and only attempts to read properties with the given name and type. Descriptors of this kind work with many different data sources (classes, DataTables, ExpandoObjects, etc.) and offer the greatest flexibility:
var d = Descriptor.New()
    .With("SepalLength").As(typeof(decimal))
    .With("SepalWidth").As(typeof(double))
    .With("PetalLength").As(typeof(decimal))
    .With("PetalWidth").As(typeof(int))
    .Learn("Class").As(typeof(string));
This declaration creates an empty descriptor with New() and then adds four features and a label. The fluent interface pairs a With or Learn call, which names the property to be read, with an As call, which specifies the property's type along with any additional settings. The With method adds a feature to the descriptor, while the Learn method overwrites the descriptor's label.
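A weak descriptor only knows a property name and a target type, so at run time something has to look that name up in whatever object it is given. The following sketch, which assumes nothing about numl's internals, reads a named value from a dictionary-like object (which covers ExpandoObject), a DataRow, or an ordinary object via reflection, and converts it to a double. The ByName class and its Read method are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class ByName
{
    // Reads the value called `name` from source and converts it to a double.
    // Supported shapes: dictionaries (ExpandoObject implements
    // IDictionary<string, object?>), DataRows, and plain objects via reflection.
    public static double Read(object source, string name)
    {
        object? raw = source switch
        {
            IDictionary<string, object?> dict => dict[name],
            DataRow row => row[name],
            _ => source.GetType().GetProperty(name)?.GetValue(source)
        };

        if (raw is null)
            throw new ArgumentException($"No value named '{name}' was found.", nameof(name));

        return Convert.ToDouble(raw, CultureInfo.InvariantCulture);
    }
}
```

Because the lookup is driven purely by the name, the same descriptor can be pointed at a class instance, a DataTable row, or an ExpandoObject without change; the trade-off is that a misspelled name is only detected at run time.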
There are several As methods available:
Properties
Ultimately, creating a descriptor comes down to building a collection of features, plus an optional label, that describe the types participating in the learning algorithms. Each of these properties implements a conversion from its type to a double. When expanded, some properties can become multi-valued, as is the case for strings, enumerables, and DateTimes. The available properties are:
- Property (base)
- StringProperty (multi- or single-valued depending on use)
- DateTimeProperty (multi- or single-valued depending on use)
- GuidProperty
- EnumerableProperty (multi-valued)
You can also create your own property by deriving from the Property class and adding it as a feature of the descriptor.
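As a rough illustration of single- versus multi-valued properties, the sketch below defines a hypothetical DemoProperty base class, a numeric property that yields one double, and a date property that expands a DateTime into four values (year, month, day, and day of week). These types, the FeatureVector helper, and the choice of date parts are assumptions made for this example; numl's Property class has its own members, which should be consulted before deriving from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base type for this sketch; numl's real Property class has its own members.
public abstract class DemoProperty
{
    protected DemoProperty(string name) => Name = name;
    public string Name { get; }
    // Number of doubles this property expands into.
    public abstract int Length { get; }
    public abstract double[] ToDoubles(object value);
}

// Single-valued: one number in, one double out.
public sealed class DemoNumberProperty : DemoProperty
{
    public DemoNumberProperty(string name) : base(name) { }
    public override int Length => 1;
    public override double[] ToDoubles(object value) => new[] { Convert.ToDouble(value) };
}

// Multi-valued: a DateTime expands into year, month, day and day of week.
public sealed class DemoDatePartsProperty : DemoProperty
{
    public DemoDatePartsProperty(string name) : base(name) { }
    public override int Length => 4;
    public override double[] ToDoubles(object value)
    {
        var d = (DateTime)value;
        return new double[] { d.Year, d.Month, d.Day, (int)d.DayOfWeek };
    }
}

public static class FeatureVector
{
    // Concatenates the expanded values of every (property, value) pair, in order.
    public static double[] Build(IEnumerable<(DemoProperty Property, object Value)> items) =>
        items.SelectMany(i => i.Property.ToDoubles(i.Value)).ToArray();
}
```

With this design, a row containing one number and one date produces a five-element vector, since the date property alone contributes four values.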