Rare-event risk assessment: how can data science help businesses thrive?

We are constantly searching for cutting-edge, informative data that lets our clients assess credit and operational risks effectively. We discover new attributes that improve the quality of the decision-making process, and we pay close attention to how they are designed and to the transparency of the results.

This presents a major challenge: how to assess the markers we find and how to maximise their potential in models. The difficulty is that most of these markers belong to the group of rare events, so assessing them with standard methods and procedures is much harder than usual, or nearly impossible.

Between 2015 and 2017 JuicyTeam designed a rare-event risk assessment technology that delivers interpretable and stable results regardless of geography and of the stage of the credit pipeline or decision-making process at which the assessment is made.

What are the main points of the JuicyScore approach?

Our methodology is divided into 2 parts:

  1. Methods for finding and localising new markers;
  2. Methods for evaluating the markers in the risk assessment process.

The first part is devoted to finding and identifying rare events. It relies on precise authentication of the user's device and on determining various parameters that characterise the device, its environment and the way it is used. We collect more than 65 000 events, or so-called data points, about the user's device during a single online session; most of them are aggregated into per-device data sets.

Our methodology divides the entire session space into 4 groups, which makes it possible to prioritise the search and to estimate the probability (density) of discovering new markers:

  • FAP (First Attempt Fraud) is a group where we identify fraud based on intent for devices that have not been spotted before within the internet session framework;
  • SAP (Secondary Attempt Fraud) is a group where we identify fraud based on intent for devices that have not been spotted before within the framework of the online resource;
  • HRA (High Risk Applications) is a group that does not fall into the two groups mentioned above, but in which the risk is still rather high;
  • Other sessions with low fraud risk (as new risk assessment technologies appear, some sessions may be moved from this group into the FAP, SAP or HRA groups).

The second part of our methodology concerns risk assessment technology, that is, the scoring of rare events or markers. The technology is based on assessing sets of markers that share a physical meaning and can be logically grouped into a single direction. We distinguish 10 such directions, indexed IDX1 to IDX10.

What is the use of this analytical work? Every index covers, both statistically and logically, all the significant markers in one direction and can be used in classical methods of mathematical statistics or data science. But why are IDX variables so helpful in risk assessment?

The essence of an IDX index is a combination of rare events and the factors related to them, gathered by Deep Machine Learning algorithms into a single variable that can be used either for modelling or for integration into a financial institution's decision-making system.

It should be noted that the IDX indexes were created as Gaussian variables. This approach has a number of advantages: on the one hand, the indexes remain statistically significant in any other model; on the other hand, they make it possible to structure the entire probability space no matter what type of fraud events we have to deal with.
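The article does not say how the IDX indexes are given a Gaussian shape. One common, generic way to turn a skewed aggregated score into an approximately normal variable is a rank-based inverse normal transform. The sketch below uses that technique purely as an illustration; the function name `rank_to_normal` and the sample values are assumptions, not part of the JuicyScore product.

```python
from statistics import NormalDist


def rank_to_normal(values):
    """Map raw scores to approximately standard-normal scores by rank.

    Illustrative technique only (rank-based inverse normal transform);
    it is NOT a description of how the IDX indexes are actually built.
    Tied values share the average rank of their group.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # positions sorted by value
    std_normal = NormalDist()  # mean 0, standard deviation 1
    result = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of values tied with values[order[start]].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average 1-based rank of the tied run
        # (rank - 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
        z = std_normal.inv_cdf((avg_rank - 0.5) / n)
        for k in range(start, end + 1):
            result[order[k]] = z
        start = end + 1
    return result


# Hypothetical skewed raw scores: most sessions show no rare event at all.
raw_scores = [0, 0, 0, 1, 0, 2, 0, 7, 0, 15]
normal_scores = rank_to_normal(raw_scores)
```

The transform keeps the ordering of the raw scores, so a higher raw score never receives a lower transformed score.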

We are going to share some interesting data on the 10 indexes, as well as best practices for applying them in risk management. It should be noted that we provide a “clean sample”, that is, data that did not reveal a high probability of fraud through technical manipulation of the device or significant anomalies in the Internet connection.

IDX1: Stop markers

IDX1 is a combination of 50+ rare events that indicate a high probability of fraud through technical manipulation of the device. The variable covers the whole variety of device randomisation tools and techniques for interfering with the "digital fingerprint", and it also captures the most dangerous markers of high-risk user behaviour and of the network connection. It can be used both in rules and as a component of a fraud prevention model to identify the most dangerous customer segments. The risk level rises with the parameter value, and high values may be used as filters for automatic denial.
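Using high IDX1 values as an automatic-denial filter can be sketched as a simple rule. The article gives no numeric cut-offs, so the threshold below, the `Decision` type and the function name are all hypothetical; a real cut-off would have to be calibrated on a company's own application flow.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the article does not publish numeric IDX1 thresholds.
AUTO_DENY_THRESHOLD = 3


@dataclass
class Decision:
    action: str  # "deny" or "continue"
    reason: str


def apply_stop_rule(idx1: int, threshold: int = AUTO_DENY_THRESHOLD) -> Decision:
    """Deny automatically when IDX1 reaches the threshold; otherwise pass the
    application on to the next stage of the decision-making process."""
    if idx1 >= threshold:
        return Decision("deny", f"IDX1={idx1} >= {threshold}")
    return Decision("continue", f"IDX1={idx1} < {threshold}")


# Usage: a session with no stop markers continues, a high value is cut off.
clean = apply_stop_rule(0)
risky = apply_stop_rule(5)
```

Keeping the threshold as a parameter makes it easy to run the same rule with different cut-offs for different products or stages.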

The graph shows aggregated data for companies that already use IDX1 in their decision-making models.

Randomisers are identified by IDX1 as well as by separate stop factors, for example copying of another device's session (the vector variable session clone), anomalies in the header of the web session (the vector variable UserAgent Issue) and manipulation of the colour palette (the variable Canvas blocker). Apart from these, close attention should also be paid to browser and operating system anomalies.

IDX2: User behaviour markers

This aggregated variable is a combination of various markers of user behaviour on the online business's web resource. The JuicyScore vector contains dozens of markers that are related in some way to user behaviour. The main issue in constructing the variable is to find stable markers that, aggregated into one, identify high-risk segments regardless of the geography in which the online business operates.

IDX2 is based on factors related to various categories of user behaviour or device use. On the one hand, it incorporates many frequency characteristics, for example the number of applications or requests for a financial product from one user or from the same device, either within a certain time period or over the entire history. On the other hand, it includes parameters that identify the stability or, on the contrary, the variability of the data used in a credit application. A wide variety of such data on the same device, or related to one virtual user, indicates a high operational risk.

At the same time, a high frequency of applications without data manipulation may indicate a higher credit risk (so-called credit shopping, when a borrower applies for multiple loans at many financial institutions within a short period of time). The presence of both high frequency and high variability of data on the same device or virtual user is a strong sign of high operational risk.
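The interplay between application frequency and data variability described above can be illustrated with a small sketch. It is not how IDX2 is computed; the limits `max_apps` and `max_variants`, the record layout and the flag labels are assumptions made for the example.

```python
from collections import defaultdict


def device_behaviour_flags(applications, max_apps=3, max_variants=1):
    """Flag devices by application frequency and applicant-data variability.

    `applications` is an iterable of (device_id, applicant_data) pairs, where
    applicant_data is any hashable value such as a (name, phone) tuple.
    The limits are hypothetical; the article does not give thresholds.
    """
    counts = defaultdict(int)
    variants = defaultdict(set)
    for device_id, applicant_data in applications:
        counts[device_id] += 1
        variants[device_id].add(applicant_data)

    flags = {}
    for device_id, n_apps in counts.items():
        high_frequency = n_apps > max_apps
        high_variability = len(variants[device_id]) > max_variants
        if high_frequency and high_variability:
            flags[device_id] = "strong operational risk"
        elif high_variability:
            flags[device_id] = "operational risk"
        elif high_frequency:
            flags[device_id] = "credit risk (possible credit shopping)"
        else:
            flags[device_id] = "no flag"
    return flags


# Hypothetical application history.
sample = (
    [("dev-A", ("Ann", "111"))] * 4                      # frequent, same data
    + [("dev-B", ("Bob", "222")), ("dev-B", ("Rob", "333"))]  # varying data
    + [("dev-C", (f"user{i}", str(i))) for i in range(4)]     # frequent and varying
    + [("dev-D", ("Dan", "444"))]                        # single application
)
flags = device_behaviour_flags(sample)
```

In practice the counts would usually be restricted to a time window, in line with the "certain time period" mentioned in the text.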

The graph below shows an approximate, generalised view of how the risk level changes across ranges of the variable.

Moreover, this variable also includes a number of factors regarded as high-risk markers of user behaviour that do not belong to those two categories, for example the way the application form is filled in, the way the device is used, etc. In essence, this is a combination of medium- and high-risk rare events which, integrated in a certain way, can be used in decision-making systems and in models constructed by means of classical Gaussian methods.

IDX3: Device Markers

IDX3 is a combination of secondary risk markers and device anomalies. Every single anomaly may reflect a possible risk that should be taken into account during the borrower's verification, and a combination of such markers highlights a high-risk segment. As with IDX1, the risk level rises with the variable value, and high values may be used for automatic denial.

Cross-tabulating the values of different indexes may also provide important information about the risk level. For example, if IDX1 and IDX3 are both equal to 0, it is highly probable that no randomisation or virtualisation has been identified and that we are dealing with a physical device.
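A cross-tabulation of two indexes can be sketched with the standard library alone. The session records and field names (`idx1`, `idx3`) below are hypothetical; only the rule "both equal to 0 suggests a physical device" comes from the text.

```python
from collections import Counter


def cross_tabulate(sessions):
    """Count sessions for every observed (IDX1, IDX3) pair."""
    return Counter((s["idx1"], s["idx3"]) for s in sessions)


def likely_physical_device(session):
    """Both indexes at 0: no randomisation or virtualisation markers found."""
    return session["idx1"] == 0 and session["idx3"] == 0


# Hypothetical sessions; the field names are assumptions for this sketch.
sessions = [
    {"id": "s1", "idx1": 0, "idx3": 0},
    {"id": "s2", "idx1": 0, "idx3": 2},
    {"id": "s3", "idx1": 4, "idx3": 1},
    {"id": "s4", "idx1": 0, "idx3": 0},
]

table = cross_tabulate(sessions)
physical = [s["id"] for s in sessions if likely_physical_device(s)]
```

Each cell of `table` can then be compared with the observed fraud or default rate for that combination of values.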

IDX4: Internet connection markers

IDX4 is a combination of network parameters and anomalies; higher values may be used for fraud detection purposes.

This index takes into account such indicators as the type of IP address used, the device time zone and whether it matches the actual local time zone, the DNS configuration, etc.
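One of the indicators listed above, the match between the device time zone and the actual local time zone, can be illustrated as follows. This is a simplified sketch: it assumes both UTC offsets are already known (the IP-based lookup is out of scope), and the function name and tolerance parameter are hypothetical rather than part of IDX4.

```python
def timezone_mismatch(device_utc_offset_min, ip_utc_offset_min, tolerance_min=0):
    """Return True when the device's UTC offset disagrees with the offset
    expected for the location of its IP address.

    Both offsets are in minutes east of UTC. How the IP-based offset is
    obtained is an assumption left outside this sketch.
    """
    return abs(device_utc_offset_min - ip_utc_offset_min) > tolerance_min


# Device reports UTC+3, IP address geolocates to a UTC-5 region: a mismatch.
suspicious = timezone_mismatch(180, -300)
# Device and IP agree on UTC+3: no mismatch.
consistent = timezone_mismatch(180, 180)
```

A mismatch alone is only a weak signal, since travellers and misconfigured devices produce it too, which is why the text describes it as one indicator among several.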

IDX5: Device quality index

In operational risk assessment, the primary task is to reject applications from users with a high risk of default and non-payment, while the main objective of credit risk assessment is to find the user segments that can be offered financial products with the right parameters. The aggregate variable IDX5 falls into the second category. With this variable, financial institutions can significantly improve the credit risk segmentation of the incoming flow. This is particularly important when strong data institutions are lacking or the quality of the data they provide is low.

Device quality is a function of the device's cost level, which is affected by the following data categories: the device category (desktop or mobile), the aggregate of its technical metrics (storage capacity, number of cores, storage quality, etc.) and the device manufacturer (a well-known brand or a no-name). It is important to note that devices with certain anomalies in their technical characteristics are excluded from this index in order to keep it as orthogonal as possible to the other aggregate IDX variables.

Every device has a wide range of technical metrics and parameters that affect its quality and can also be used to assess credit risk. When developing this index, it was therefore very important to select metrics and model factors so that the value range of every factor stays stable, while also keeping the value range of the index itself stable and enhancing its discriminating power. Stability must hold both over time and across all the geographies in which our clients operate.

In interpreting the Device Quality Index, the part of the flow with low values of the index identifies a segment with high credit risk and a low level of disposable income, while the part of the flow with high values identifies segments with a low level of credit risk.

IDX6: Internet Infrastructure Quality

Internet infrastructure quality is a good component for credit risk assessment and social fraud detection, and it may be used as a parameter in a credit scoring model. A value higher than or equal to 2 usually highlights a more premium sub-segment in the application flow. The index includes such indicators as the country and region of the IP address, the quality of the Internet infrastructure in the region, connection speed and quality, etc.

IDX7: Device Applications Quality

The variable is valid only for the mobile SDK and represents an aggregated assessment of the applications installed on the applicant's device. It is recommended for identifying credit risk and social default risk. Please note that to evaluate this variable, you must configure the collection of the final list of applications as part of the SDK connection.

IDX8: Device Credentials Variability

Device credentials variability is an aggregated assessment of the applicant’s credentials across credit applications and an indicator of loan application data manipulation. The higher the value, the lower the risk. This index includes such indicators as multiple repeated digits of the phone/user region, as well as a repeated device fingerprint.

IDX9: Device Applications Risk

Device applications risk is the aggregated assessment of the applications installed on the device. The risk grade shows the level of risk of the applications installed and identified. Please note that to evaluate this variable, you must configure the collection of the final list of applications as part of the SDK connection. We at JuicyScore perform a detailed analysis of a wide range of applications that can have a significant impact on operational or credit risk (for example, remote access applications, malicious applications, applications with a toxic reputation, etc.).

We constantly expand our libraries with descriptions and characteristics of new applications. It is also important to note that we do not assess, and have never sought to assess, the risk of every available application, as we consider this excessive.

The value of the variable shows the aggregate risk level of the applications installed on the applicant's device.

IDX10: EI estimation

The income estimation index shows the grade of disposable income. The higher the value, the higher the income grade and the lower the risk, and vice versa. This parameter is highly recommended for use in the assessment of operational and credit risk.

Why is our methodology so effective?

In order to fight fraud easily and effectively, you need as many useful and functional tools as possible. However, risk management has no universal solutions or tools that offer the same efficiency and payback for every company. Our solution has a number of advantages that distinguish it from others:

  • Our methodology covers the entire probabilistic space of events;
  • The method pays for itself several times over, depending on the value of the asset, the type of risk and the place in the credit pipeline;
  • Installing JuicyScore fully covers the need for the maximum set of data about the Internet session, the device and the user's online behaviour.

The IDX indexes may also be set up as stop markers to cut off the flow in the red zone. Behind each red zone are complex calculations and the results of numerous use-case studies across dozens of companies in 30 countries.

We pay great attention to the mechanics of our index design, and we keep looking for the most useful and informative data to help manage credit and operational risks. JuicyScore simplifies the use of the latest cutting-edge Data Science technologies in risk management and anti-fraud.

If you would like to know more about our approach or to get a free consultation about the risk level in your business, please contact help@juicyscore.com or info@juicyscore.com.