The purpose of any study is to observe the properties of objects in order to identify and evaluate meaningful relationships and interactions between the indicators of these properties.
A subject area (domain) includes objects that differ in their properties and are interconnected in certain respects. Solving problems in the field of programming begins with a study of the subject area.
A subject area is a part of the real world, which is infinite and contains both important and unimportant data. The researcher must be able to single out the significant portion of them. For example, when solving a loan-approval problem, all relevant information about the customer's private life is considered (whether the customer is employed, has a spouse, supports minor children, the customer's education, etc.). For other banking tasks, however, the same data may be of no significance at all. The significance of the data depends on what we choose as the subject area.
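To illustrate how the same record yields different significant data depending on the task, here is a minimal sketch. The customer attributes, the task names `loan_approval` and `card_fraud_detection`, and the `RELEVANT_ATTRIBUTES` mapping are all hypothetical; in practice the mapping would come out of the domain analysis itself.

```python
# Hypothetical customer record; attribute names are illustrative only.
CUSTOMER = {
    "customer_id": 1017,
    "employed": True,
    "has_spouse": True,
    "minor_children": 2,
    "education": "higher",
    "account_balance": 5400.0,
    "monthly_transactions": 38,
}

# Assumed mapping from a task (the chosen subject area) to the attributes
# treated as significant for that task.
RELEVANT_ATTRIBUTES = {
    "loan_approval": ["employed", "has_spouse", "minor_children", "education"],
    "card_fraud_detection": ["account_balance", "monthly_transactions"],
}


def select_attributes(record: dict, task: str) -> dict:
    """Keep only the attributes the given task treats as significant."""
    keep = RELEVANT_ATTRIBUTES[task]
    return {name: record[name] for name in keep}


print(select_attributes(CUSTOMER, "loan_approval"))
print(select_attributes(CUSTOMER, "card_fraud_detection"))
```

The private-life attributes survive only for the loan task; for the fraud task they are simply dropped.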
The study requires building a domain model. Knowledge from different sources must be formalized. The subject area can be formalized by various means, and these means can differ considerably: a textual description of the subject area or a specialized graphical notation. The domain model is used to describe the processes that take place in the subject area and to study the data in the area of research.
The problem statement also includes a description of the static and dynamic behavior of the objects under investigation. The description of static behavior covers the objects and their properties. The description of dynamic behavior covers how the objects behave and the causes of that behavior.
The dynamic behavior of objects is often described together with their static behavior.
Sometimes domain analysis and problem statement are combined into a single step.
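One way to keep static and dynamic descriptions together is sketched below, assuming a simple object-oriented domain model. The `Customer` class, its fields, and the event kinds are hypothetical: the fields capture the static description (the object and its properties), while the event history records dynamic behavior together with its causes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Customer:
    # Static description: the object and its properties.
    customer_id: int
    income_level: str
    # Dynamic description: events over time, each stored with its cause.
    history: list[tuple[date, str, str]] = field(default_factory=list)

    def record_event(self, when: date, kind: str, cause: str) -> None:
        """Append an event (date, kind, cause) to the customer's history."""
        self.history.append((when, kind, cause))


c = Customer(customer_id=1, income_level="medium")
c.record_event(date(2023, 3, 1), "loan_requested", "home purchase")
c.record_event(date(2023, 9, 15), "payment_missed", "job loss")
print(c.income_level, len(c.history))  # medium 2
```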
When identifying and analyzing data requirements, the data needed for Data Mining is modeled. This involves studying the distribution of users, the analytical characteristics of the system, and access to the data needed for the analysis.
Data analysis is easier and more effective when the organization has a data warehouse. However, not all companies have one. In that case, the original data come from operational databases and reference and archival materials, that is, from existing IS (information systems).
Additional information may be obtained from the organization's management, from various internal and external paper documents, and from specialist knowledge and/or survey results.
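Without a data warehouse, data from an operational table often has to be combined with reference data by hand. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `loans` and `branches` tables and their contents are hypothetical stand-ins for an operational database and a reference directory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical operational table (e.g., exported from a banking IS).
cur.execute("CREATE TABLE loans (customer_id INTEGER, amount REAL, branch_code TEXT)")
# Hypothetical reference table (directory of branches).
cur.execute("CREATE TABLE branches (branch_code TEXT PRIMARY KEY, region TEXT)")

cur.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, 12000.0, "B01"), (2, 4500.0, "B02"), (3, 8000.0, "B01")],
)
cur.executemany(
    "INSERT INTO branches VALUES (?, ?)",
    [("B01", "North"), ("B02", "South")],
)

# Enrich operational records with reference data to form one analysis set.
rows = cur.execute(
    """
    SELECT l.customer_id, l.amount, b.region
    FROM loans AS l
    JOIN branches AS b ON l.branch_code = b.branch_code
    ORDER BY l.customer_id
    """
).fetchall()
conn.close()

print(rows)  # [(1, 12000.0, 'North'), (2, 4500.0, 'South'), (3, 8000.0, 'North')]
```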
It should also be noted that during data preparation, software developers should describe as fully as possible the factors that affect the process. Some data may need to be encoded. For example, one of the client's characteristics, income level, can be defined as very low, low, medium, high, or very high. In this case, the boundaries of each income level must be determined.
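Encoding income into the five levels above amounts to choosing boundary values and mapping each income into an interval. This is a minimal sketch using the standard `bisect` module; the boundary values in `BOUNDARIES` are hypothetical and would in practice be set by domain experts.

```python
import bisect

# Hypothetical lower bounds of "low", "medium", "high", "very high"
# (monthly income, arbitrary currency units).
BOUNDARIES = [1000, 2500, 5000, 10000]
LEVELS = ["very low", "low", "medium", "high", "very high"]


def income_code(income: float) -> int:
    """Ordinal code 0..4; an income equal to a boundary falls in the higher level."""
    if income < 0:
        raise ValueError("income must be non-negative")
    return bisect.bisect_right(BOUNDARIES, income)


def income_level(income: float) -> str:
    """Human-readable gradation for the given income."""
    return LEVELS[income_code(income)]


print([income_level(x) for x in (800, 3200, 12000)])  # ['very low', 'medium', 'very high']
```

Keeping both an ordinal code and a label preserves the ordering of the levels, which many Data Mining algorithms can exploit.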
When determining the required amount of data, consider whether the data are ordered.
If they are ordered, determine whether the data set contains a seasonal/cyclical component. If they are not ordered, i.e., the events in the database are not tied to a timeline, the following rules should be observed during collection:
1) a small number of records in the database can lead to an inadequate model;
2) the accuracy of the model can be improved by increasing the amount of data;
3) outdated information should be excluded from the set;
4) algorithms used to build a model on very large databases should be able to scale.
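Both cases can be checked mechanically. The sketch below is one possible approach, not a prescribed procedure: for ordered data it computes the sample autocorrelation at a candidate period (a high value hints at a seasonal component), and for unordered data it applies rules 3 and 1. The thresholds `MIN_RECORDS` and `MAX_AGE_DAYS` and the `recorded_on` field are hypothetical assumptions.

```python
from datetime import date
from typing import Iterable, Iterator

MIN_RECORDS = 1000        # hypothetical threshold for rule 1
MAX_AGE_DAYS = 3 * 365    # hypothetical cutoff for rule 3


def autocorrelation(series: list[float], lag: int) -> float:
    """Sample autocorrelation at `lag`; used only for ordered data."""
    n = len(series)
    mean = sum(series) / n
    den = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / den


def drop_outdated(records: Iterable[dict], today: date) -> Iterator[dict]:
    """Rule 3: yield only records newer than MAX_AGE_DAYS.
    As a generator, the filter itself streams and need not hold all data (cf. rule 4)."""
    for r in records:
        if (today - r["recorded_on"]).days <= MAX_AGE_DAYS:
            yield r


def collect_unordered(records: Iterable[dict], today: date) -> tuple[list[dict], list[str]]:
    """Apply rule 3, then warn per rule 1; materializes the result for convenience."""
    fresh = list(drop_outdated(records, today))
    warnings = []
    if len(fresh) < MIN_RECORDS:
        warnings.append(f"only {len(fresh)} records: the model may be inadequate")
    return fresh, warnings


monthly = [10, 12, 15, 20, 25, 30, 30, 25, 20, 15, 12, 10] * 3
print(round(autocorrelation(monthly, 12), 3))  # 0.667

records = [{"id": 1, "recorded_on": date(2015, 1, 1)},
           {"id": 2, "recorded_on": date(2023, 6, 1)}]
fresh, warnings = collect_unordered(records, today=date(2024, 1, 1))
print([r["id"] for r in fresh], warnings)
```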