Data Mining Project


Daniel Brockman
Dan@DanielBrockman.com


Pursuant to
Course CIS366, SAS Data Mining
Summer 2006
University of California Berkeley Extension

Jianmin Liu, Instructor

jiliu@ggu.edu

Contents

Report

Lift Charts for Final Models
cumulative
noncumulative

Notes on the project
Conclusion
On working with fellow students

Candidate Neural Network Models
Model NN-D2
Model NN-D3
Lift Chart -- cumulative
Lift Chart -- noncumulative

Project Assignment

Data

Review by Jaison K. Joseph

Cumulative Lift Chart for Final Models




Noncumulative Lift Chart for Final Models
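
For readers unfamiliar with the two chart types, here is a minimal Python sketch of how cumulative and noncumulative lift are commonly computed. It is an illustration only, not the calculation SAS performed for this project: the function name, the equal-sized bins, and the sample data are all assumptions. Cases are ranked by predicted score and split into bins. Noncumulative lift compares each bin's response rate to the overall rate, while cumulative lift compares the response rate of all bins up to and including the current one.

```python
def lift_by_bin(scores, labels, n_bins=10):
    """Return (noncumulative, cumulative) lift lists, best-scored bin first.

    scores: predicted probabilities of the target event (hypothetical input)
    labels: 1 if the event occurred, 0 otherwise
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")

    # Rank cases from highest to lowest predicted score.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ranked = [label for _, label in pairs]

    n = len(ranked)
    base_rate = sum(ranked) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when no case has the target event")

    noncumulative, cumulative = [], []
    seen = hits = 0
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than cases
        seen += len(chunk)
        hits += sum(chunk)
        # Response rate inside this bin alone, relative to the overall rate.
        noncumulative.append((sum(chunk) / len(chunk)) / base_rate)
        # Response rate in all bins so far, relative to the overall rate.
        cumulative.append((hits / seen) / base_rate)
    return noncumulative, cumulative


# Made-up scores and outcomes, for illustration only.
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
demo_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
demo_noncum, demo_cum = lift_by_bin(demo_scores, demo_labels, n_bins=5)
```

The cumulative curve always ends at 1.0, because the final bin covers the whole data set. The noncumulative curve can fall below 1.0 in the low-scored bins.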




Notes on the project

SAS Enterprise Miner impresses me with how easily one can produce useful results, which makes it appear simple. At the same time, it offers many opportunities to insert controls on the modeling process, which makes it appear complex.

Feeling constrained by time, I focused on the specifically assigned tasks, but did allow myself a bit of exploration.

1. Conclusion: The Tree model gives results as good as or better than those of the other models, yet it runs more quickly than the Neural Networks and considers more data than the Logit model. If the cost of data processing becomes significant, the Tree model should outperform the others in return on cost. If the cost of data processing isn't significant, then the ensemble of three models, Tree-D2, Logit-D2 and NN-D3, gives the best results.

2. The Neural Network Models: Interested in Neural Networks, I created several of these models, two of which (NN-D2 and NN-D3) were interesting, and I retained them. I used an assessment node to choose NN-D3 as the best Neural Network model.

3. Variable Transformation: The Neural Networks weren't interesting until I used the Variable Transformation node to create some binned variables, which showed up as "Ordinal" in the network diagrams.

4. Discarded Models: In all, I created two Logit models, two Tree models and four Neural Network models, and discarded half of them because they had no predictive value and were of no interest.

5. Working with fellow students: I had initially agreed to collaborate with Chookij Vanatham, but he had scheduling conflicts with unrelated activities and had to withdraw from the collaboration. At that point, no opportunity remained to plan a joint project with someone else. Jaison Joseph reviewed my work at an intermediate stage. Nicholas Zemstov and Chookij asked me about structuring the project. They and Jaison also asked me some technical questions, and I provided what suggestions I could. Jaison, Chookij and I worked together on the technical task of getting copies of the SAS software for our use.

6. Considerations for the Future: At the end of a project, I always notice things that might have been done differently but for some constraint. These observations serve to guide future projects. One is that the assessments use "profit" to judge the goodness of a model. In the future, I want to find the control on this behavior and investigate alternatives. Also, I've given little attention to CHAID, CART and C4.5 models, all of which have virtues about which I want to learn more. Further, the Neural Network models' capabilities interest me immensely, and I want to explore them more.
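
As a rough illustration of two ideas in the notes above (binning a variable into ordinal levels, item 3, and combining several models into an ensemble, item 1), here is a minimal Python sketch. It is not what the SAS nodes did in this project. The equal-width binning rule, the plain averaging of predicted probabilities, the function names, and the sample numbers are all assumptions made for the example.

```python
def equal_width_bin(values, n_bins=4):
    """Map each numeric value to an ordinal bin index 0..n_bins-1.

    Assumption: equal-width bins between the minimum and maximum value.
    Other binning rules (for example quantiles) are equally plausible.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable gets a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the top bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def average_ensemble(*model_probs):
    """Average the predicted probabilities of several models, case by case.

    Assumption: every model scores the same cases in the same order.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    n_cases = len(model_probs[0])
    if any(len(p) != n_cases for p in model_probs):
        raise ValueError("all models must score the same number of cases")
    return [sum(case) / len(case) for case in zip(*model_probs)]


# Made-up values, for illustration only.
binned = equal_width_bin([0, 10, 20, 30, 40], n_bins=4)

# Hypothetical probabilities from three models for the same two cases.
tree_p, logit_p, nn_p = [0.2, 0.8], [0.4, 0.6], [0.6, 1.0]
combined = average_ensemble(tree_p, logit_p, nn_p)
```

Averaging tends to smooth out the individual models' errors, which is one common explanation for an ensemble scoring better than any single member.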


Candidate Neural Network Models

Model NN-D2




Model NN-D3




Lift Chart -- Cumulative




Lift Chart -- Noncumulative



