
Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy

Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.

For example, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model's overall performance.
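To make the trade-off concrete, here is a minimal sketch of the naive balancing described above: every subgroup is downsampled to the size of the smallest one, so a heavily skewed dataset loses most of its rows. The function name and group encoding are illustrative, not from the paper.

```python
import numpy as np

def balance_by_downsampling(X, y, groups, seed=0):
    """Naive dataset balancing: downsample every subgroup to the size
    of the smallest one, discarding the rest of the data."""
    rng = np.random.default_rng(seed)
    unique_groups = np.unique(groups)
    smallest = min((groups == g).sum() for g in unique_groups)
    keep = []
    for g in unique_groups:
        idx = np.flatnonzero(groups == g)
        keep.extend(rng.choice(idx, size=smallest, replace=False))
    keep = np.sort(np.array(keep))
    return X[keep], y[keep], groups[keep]
```

With 900 patients in one group and 100 in another, balancing this way keeps only 200 of the original 1,000 rows, which illustrates why the approach can hurt overall accuracy.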

MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data for many applications.

This approach could also be combined with other methods to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure that underrepresented patients aren't misdiagnosed due to a biased AI model.

"Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.

She wrote the paper with co-lead authors Saachi Jain PhD ’24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng ’18, PhD ’23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.

Removing bad examples

Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.

Scientists also know that some data points impact a model's performance on certain downstream tasks more than others.

The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
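Worst-group error is a standard fairness metric: evaluate the model separately on each subgroup and report the weakest result. A minimal sketch of the measurement (the function name is illustrative):

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    """Worst-group accuracy: the accuracy on the subgroup where the
    model performs worst. Worst-group error is 1 minus this value."""
    accs = {}
    for g in np.unique(groups):
        mask = groups == g
        accs[g] = float((y_true[mask] == y_pred[mask]).mean())
    worst = min(accs, key=accs.get)
    return accs[worst], accs
```

A model can have high average accuracy while its worst-group accuracy is poor, which is exactly the failure mode the MIT technique targets.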

The researchers' new technique is driven by prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.

For this new technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to each incorrect prediction.

"By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall," Ilyas explains.

Then they remove those specific samples and retrain the model on the remaining data.
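TRAK itself produces per-training-example attribution scores for individual predictions. Assuming such a score matrix is already available, the aggregate-and-remove step described above might look like this sketch; the score matrix, threshold `k`, and the placeholder names in the comments are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def points_to_remove(influence, error_mask, k):
    """Given influence[i, j] = estimated contribution of training point i
    to test prediction j (e.g., TRAK scores), sum the scores over the
    misclassified minority-group test examples and return the indices of
    the k training points contributing most to those failures."""
    harm = influence[:, error_mask].sum(axis=1)  # total contribution to errors
    return np.argsort(harm)[-k:]                 # top-k most harmful points

# Sketch of the overall loop (trak_scores, train_set, etc. are hypothetical):
# scores = trak_scores(model, train_set, test_set)
# errors = (preds != labels) & minority_mask
# drop = points_to_remove(scores, errors, k=500)
# keep = np.setdiff1d(np.arange(len(train_set)), drop)
# ...then retrain the model on train_set[keep]
```

The design choice is that only the small set of points implicated in worst-group failures is dropped, in contrast to balancing, which discards entire swaths of majority-group data.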

Since having more data generally yields better overall performance, removing just the samples that drive worst-group failures maintains the model's overall accuracy while improving its performance on minority subgroups.

A more accessible approach

Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data-balancing method. Their technique also achieved higher accuracy than methods that require making changes to the inner workings of a model.

Because the MIT approach involves changing a dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.

It can also be used when bias is unknown because subgroups in a training dataset are not labeled. By identifying the datapoints that contribute most to a feature the model is learning, researchers can understand the variables it is using to make a prediction.

"This is a tool anyone can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model," says Hamidieh.

Using the technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.

They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy to use for practitioners who could someday deploy it in real-world environments.

"When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable," Ilyas says.

This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.
