
          prostheticknowledge: Neural Enhance








prostheticknowledge:

Neural Enhance

A proof-of-concept coding project from Alex J. Champandard that can enhance details in images (as seen in CSI or Blade Runner) using neural networks:

As seen on TV! What if you could increase the resolution of your photos using technology from CSI laboratories? Thanks to deep learning and #NeuralEnhance, it’s now possible to train a neural network to zoom in to your images at 2x or even 4x.  You’ll get even better results by increasing the number of neurons or training with a dataset similar to your low resolution image.

The catch? The neural network is hallucinating details based on its training from example images. It’s not reconstructing your photo exactly as it would have been if it was HD. That’s only possible in Hollywood — but using deep learning as “Creative AI” works and it’s just as cool!

More Here


           AI neural network builds new virtual cities by studying real ones

An Nvidia team has taught a neural network system to generate virtual worlds based on real video footage.

To make virtual worlds feel more immersive, artists need to fill them with buildings, rocks, trees, and other objects. Creating and placing all those virtual objects quickly adds up to quite a high development time and cost. But now, researchers at Nvidia have taught artificial intelligence systems how to generate and detail new virtual cityscapes, by training neural networks on real video footage.

.. Continue Reading AI neural network builds new virtual cities by studying real ones

Category: Computers

          Product Marketing Manager, DGX Systems - NVIDIA - Santa Clara, CA      Cache   Translate Page      
You have experience with deep learning, data science, and NVIDIA GPUs. Artificial Intelligence (AI) is rapidly growing in importance and NVIDIA is at the...
From NVIDIA - Fri, 30 Nov 2018 19:54:25 GMT - View all Santa Clara, CA jobs
          The empirical characteristics of human pattern vision defy theoretically-driven expectations      Cache   Translate Page      
by Peter Neri. Contrast is the most fundamental property of images. Consequently, any comprehensive model of biological vision must incorporate this attribute and provide a veritable description of its impact on visual perception. Current theoretical and computational models predict that vision should modify its characteristics at low contrast: for example, it should become broader (more … Continue reading: The empirical characteristics of human pattern vision defy theoretically-driven expectations
          Metabolic syndrome in pregnancy and risk for adverse pregnancy outcomes: A prospective cohort of nulliparous women      Cache   Translate Page      
by Jessica A. Grieger, Tina Bianco-Miotto, Luke E. Grzeskowiak, Shalem Y. Leemaqz, Lucilla Poston, Lesley M. McCowan, Louise C. Kenny, Jenny E. Myers, James J. Walker, Gus A. Dekker, Claire T. Roberts. Background: Obesity increases the risk for developing gestational diabetes mellitus (GDM) and preeclampsia (PE), which both associate with increased risk for type 2 … Continue reading: Metabolic syndrome in pregnancy and risk for adverse pregnancy outcomes: A prospective cohort of nulliparous women
          Raltegravir-intensified initial antiretroviral therapy in advanced HIV disease in Africa: A randomised controlled trial      Cache   Translate Page      
by Cissy Kityo, Alexander J. Szubert, Abraham Siika, Robert Heyderman, Mutsa Bwakura-Dangarembizi, Abbas Lugemwa, Shalton Mwaringa, Anna Griffiths, Immaculate Nkanya, Sheila Kabahenda, Simon Wachira, Godfrey Musoro, Chatu Rajapakse, Timothy Etyang, James Abach, Moira J. Spyer, Priscilla Wavamunno, Linda Nyondo-Mipando, Ennie Chidziva, Kusum Nathoo, Nigel Klein, James Hakim, Diana M. Gibb, A. Sarah Walker, Sarah L. … Continue reading: Raltegravir-intensified initial antiretroviral therapy in advanced HIV disease in Africa: A randomised controlled trial
          Gut microbiota diversity across ethnicities in the United States      Cache   Translate Page      
by Andrew W. Brooks, Sambhawa Priya, Ran Blekhman, Seth R. Bordenstein. Composed of hundreds of microbial species, the composition of the human gut microbiota can vary with chronic diseases underlying health disparities that disproportionally affect ethnic minorities. However, the influence of ethnicity on the gut microbiota remains largely unexplored and lacks reproducible generalizations across studies. … Continue reading: Gut microbiota diversity across ethnicities in the United States
          Amino acid residues in five separate HLA genes can explain most of the known associations between the MHC and primary biliary cholangitis      Cache   Translate Page      
by Rebecca Darlay, Kristin L. Ayers, George F. Mells, Lynsey S. Hall, Jimmy Z. Liu, Mohamed A. Almarri, Graeme J. Alexander, David E. Jones, Richard N. Sandford, Carl A. Anderson, Heather J. Cordell. Primary Biliary Cholangitis (PBC) is a chronic autoimmune liver disease characterised by progressive destruction of intrahepatic bile ducts. The strongest genetic association … Continue reading: Amino acid residues in five separate HLA genes can explain most of the known associations between the MHC and primary biliary cholangitis
          TAMMiCol: Tool for analysis of the morphology of microbial colonies      Cache   Translate Page      
by Hayden Tronnolone, Jennifer M. Gardner, Joanna F. Sundstrom, Vladimir Jiranek, Stephen G. Oliver, Benjamin J. Binder. Many microbes are studied by examining colony morphology via two-dimensional top-down images. The quantification of such images typically requires each pixel to be labelled as belonging to either the colony or background, producing a binary image. While this … Continue reading: TAMMiCol: Tool for analysis of the morphology of microbial colonies
          Performance of convolutional neural networks for identification of bacteria in 3D microscopy datasets      Cache   Translate Page      
by Edouard A. Hay, Raghuveer Parthasarathy. Three-dimensional microscopy is increasingly prevalent in biology due to the development of techniques such as multiphoton, spinning disk confocal, and light sheet fluorescence microscopies. These methods enable unprecedented studies of life at the microscale, but bring with them larger and more complex datasets. New image processing techniques are therefore … Continue reading: Performance of convolutional neural networks for identification of bacteria in 3D microscopy datasets
          A comprehensive ensemble model for comparing the allosteric effect of ordered and disordered proteins      Cache   Translate Page      
by Luhao Zhang, Maodong Li, Zhirong Liu. Intrinsically disordered proteins/regions (IDPs/IDRs) are prevalent in allosteric regulation. It was previously thought that intrinsic disorder is favorable for maximizing the allosteric coupling. Here, we propose a comprehensive ensemble model to compare the roles of both order-order transition and disorder-order transition in allosteric effect. It is revealed that … Continue reading: A comprehensive ensemble model for comparing the allosteric effect of ordered and disordered proteins
          Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex      Cache   Translate Page      
by Sam V. Norman-Haignere, Josh H. McDermott. A central goal of sensory neuroscience is to construct models that can explain neural responses to natural stimuli. As a consequence, sensory models are often tested by comparing neural responses to natural stimuli with model responses to those stimuli. One challenge is that distinct model features are often … Continue reading: Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex
          Japanese deep learning startup Abeja raises series C extension round from Google      Cache   Translate Page      
See the original story in Japanese. Tokyo-based Abeja, the company offering solutions for retail stores to improve customer path or traffic based on image analysis and machine learning technologies, announced on Tuesday that it has secured funding from Google in a series C extension round. The company says this brings their total equity funding to
          Solving Problems with Data Science

A challenge that I’ve been wrestling with is the lack of a widely adopted framework or systematic approach to solving data science problems. In our analytics work at Viget, we use a framework inspired by Avinash Kaushik’s Digital Marketing and Measurement Model. We use this framework on almost every project we undertake. I believe data science could use a similar framework that organizes and structures the data science process.

As a start, I want to share the questions we like to ask when solving a data science problem. Even though some of the questions are not specific to the data science domain, they help us efficiently and effectively solve problems with data science.

Business Problem

What is the problem we are trying to solve?

That’s the most logical first step to solving any question, right? We have to be able to articulate exactly what the issue is. Start by writing down the problem without going into the specifics, such as how the data is structured or which algorithm we think could effectively solve the problem.

Then try explaining the problem to your niece or nephew, who is a freshman in high school. It is easier than explaining the problem to a third-grader, but you still can’t dive into statistical uncertainty or convolutional versus recurrent neural networks. The act of explaining the problem at a high school stats and computer science level makes your problem, and the solution, accessible to everyone within your or your client’s organization, from the junior data scientists to the Chief Legal Officer.

Clearly defining our business problem showcases how data science is used to solve real-world problems. This high-level thinking provides us with a foundation for solving the problem. Here are a few other framing questions we should think about when defining the business problem.

  • Who are the stakeholders for this project?
  • Have we solved similar problems before?
  • Has someone else documented solutions to similar problems?
  • Can we reframe the problem in any way?

And don’t be fooled by these deceptively simple questions. Sometimes more generalized questions can be very difficult to answer. But, we believe answering these framing questions is the first, and possibly most important, step in the process, because it makes the rest of the effort actionable.

Example

Say we work at a video game company —  let’s call the company Rocinante. Our business is built on customers subscribing to our massive online multiplayer game. Users are billed monthly. We have data about users who have cancelled their subscription and those who have continued to renew month after month. Our management team wants us to analyze our customer data.

What is the problem we are trying to solve?

Well, as a company, Rocinante wants to be able to predict whether or not customers will cancel their subscription. We want to be able to predict which customers will churn, in order to address the core reasons why customers unsubscribe. Additionally, we need a plan to target specific customers with more proactive retention strategies.

Churn is the turnover of customers, also referred to as customer death. In a contractual setting - such as when a user signs a contract to join a gym - a customer “dies” when they cancel their gym membership. In a non-contractual setting, customer death is not observed and is more difficult to model. For example, Amazon does not know when you have decided to never again purchase Adidas products. Your death as an Amazon or Adidas customer is implied.

Possible Solutions

What are the approaches we can use to solve this problem?

There are many instances when we shouldn’t be using machine learning to solve a problem. Remember, data science is one of many tools in the toolbox. There could be a simpler, and maybe cheaper, solution out there. Maybe we could answer a question by looking at descriptive statistics around web analytics data from Google Analytics. Maybe we could solve the problem with user interviews and hear what the users think in their own words. This question aims to see if spinning up EC2 instances on Amazon Web Services is worth it. If the answer to “Is there a simpler solution?” is “No,” then we can ask, “Can we use data science to solve this problem?” This yes-or-no question brings about two follow-up questions:

  1. “Is the data available to solve this problem?” A data scientist without data is not a very helpful individual. Many of the data science techniques that are highlighted in media today — such as deep learning with artificial neural networks — require a massive amount of data. A hundred data points is unlikely to provide enough data to train and test a model. If the answer to this question is no, then we can consider acquiring more data and pipelining that data to warehouses, where it can be accessed at a later date.
  2. “Who are the team members we need in order to solve this problem?” Your initial answer to this question will be, “The data scientist, of course!” The vast majority of the problems we face at Viget can’t or shouldn’t be solved by a lone data scientist, because we are solving business problems. Our data scientists team up with UXers, designers, developers, project managers, and hardware developers to develop digital strategies, and solving data science problems is one part of that strategy. Siloing your problem and siloing your data scientists isn’t helpful for anyone.

Example

We want to predict when a customer will unsubscribe from Rocinante’s flagship game. One simple approach to solving this problem would be to take the average customer life - how long a gamer remains subscribed - and predict that all customers will churn after X amount of time. Say our data showed that on average customers churned after 72 months of subscription. Then we could predict a new customer would churn after 72 months of subscription. We test out this hypothesis on new data and learn that it is wildly inaccurate. The average customer lifetime for our previous data was 72 months, but our new batch of data had an average customer lifetime of 2 months. Users in the second batch of data churned much faster than those in the first batch. Our prediction of 72 months didn’t generalize well. Let’s try a more sophisticated approach using data science.
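
In code, the naive baseline above is nothing more than an average applied to everyone. The numbers below are invented to mirror the example; this is only a sketch of the idea, not the actual Rocinante analysis.

# Naive baseline: "train" by averaging historical lifetimes, then apply that one number to new data.
import statistics

historical_lifetimes = [70, 74, 71, 73, 72]      # months subscribed before churn (old cohort, invented)
predicted_churn_month = statistics.mean(historical_lifetimes)   # 72.0

new_batch_lifetimes = [1, 2, 3, 2, 2]            # the new cohort churns after roughly 2 months
errors = [abs(predicted_churn_month - actual) for actual in new_batch_lifetimes]
print(predicted_churn_month, statistics.mean(errors))   # the single-number prediction is wildly off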

  1. Is the data available to solve this problem? The dataset contains 12,043 rows of data and 49 features. We determine that this sample of data is large enough for our use-case. We don’t need to deploy Rocinante’s data engineering team for this project.
  2. Who are the team members we need in order to solve this problem?  Let’s talk with Rocinante’s data engineering team to learn more about their data collection process. We could learn about biases in the data from the data collectors themselves. Let’s also chat with the customer retention and acquisitions team and hear about their tactics to reduce churn. Our job is to analyze data that will ultimately impact their work. Our project team will consist of the data scientist to lead the analysis, a project manager to keep the project team on task, and a UX designer to help facilitate research efforts we plan to conduct before and after the data analysis.

Evaluation

How do we know if we have successfully solved the problem?

At Viget, we aim to be data-informed, which means we aren’t blindly driven by our data, but we are still focused on quantifiable measures of success. Our data science problems are held to the same standard. What are the ways in which this problem could be a success? What are the ways in which this problem could be a complete and utter failure? We often have specific success metrics and Key Performance Indicators (KPIs) that help us answer these questions.

Example

Our UX coworker has interviewed some of the other stakeholders at Rocinante and some of the gamers who play our game. Our team believes if our analysis is inconclusive, and we continue the status quo, the project would be a failure. The project would be a success if we are able to predict a churn risk score for each subscriber. A churn risk score, coupled with our monthly churn rate (the rate at which customers leave the subscription service per month), will be useful information. The customer acquisition team will have a better idea of how many new users they need to acquire in order to keep the number of customers the same, and how many new users they need in order to grow the customer base. 
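
The arithmetic behind that last point is simple enough to sketch. The figures below are invented purely for illustration.

# Hypothetical numbers: how many new users must be acquired to hold the subscriber
# base steady, or to grow it, at a given monthly churn rate.
subscribers = 100_000
monthly_churn_rate = 0.04        # 4% of subscribers cancel each month
growth_target = 5_000            # desired net growth next month

expected_churned = subscribers * monthly_churn_rate      # 4,000 users expected to leave
to_stay_flat = expected_churned                          # acquisitions needed just to break even
to_hit_target = expected_churned + growth_target         # acquisitions needed to grow
print(to_stay_flat, to_hit_target)                       # 4000.0 9000.0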

Data Science-ing

What do we need to learn about the data and what analysis do we need to conduct?

At the heart of solving a data science problem are hundreds of questions. I attempted to ask these and similar questions last year in a blog post, Data Science Workflow. Below are some of the most crucial — they’re not the only questions you could face when solving a data science problem, but are ones that our team at Viget thinks about on nearly every data problem.

  1. What do we need to learn about the data?
  2. What type of exploratory data analysis do we need to conduct?
  3. Where is our data coming from?
  4. What is the current state of our data?
  5. Is this a supervised or unsupervised learning problem?
  6. Is this a regression, classification, or clustering problem?
  7. What biases could our data contain?
  8. What type of data cleaning do we need to do?
  9. What type of feature engineering could be useful?
  10. What algorithms or types of models have been proven to solve similar problems well?
  11. What evaluation metric are we using for our model?
  12. What is our training and testing plan?
  13. How can we tweak the model to make it more accurate, increase the ROC-AUC, decrease log-loss, etc.?
  14. Have we optimized the various parameters of the algorithm? Try grid search here.
  15. Is this ethical?

That last question raises the conversation about ethics in data science. Unfortunately, there is no Hippocratic oath for data scientists, but that doesn’t give the data science industry license to act unethically. We should apply ethical considerations to our standard data science workflow. Additionally, ethics in data science as a topic deserves more than a paragraph in this article — but I wanted to highlight that we should be cognizant of ethics and practice only ethical data science.

Example

Let’s get started with the analysis. It’s time to answer the data science questions. Because this is an example, the answers to these data science questions are entirely hypothetical.

  1. We need to learn more about the time series nature of our data, as well as the format.
  2. We should look into average customer lifetime durations and summary statistics around some of the features we believe could be important.
  3. Our data came from login data and customer data, compiled by Rocinante’s data engineering team.
  4. The data needs to be cleaned, but it is conveniently in a PostgreSQL database.
  5. This is a supervised learning problem because we know which customers have churned.
  6. This is a binary classification problem.
  7. After conducting exploratory data analysis and speaking with the data engineering team, we do not see any biases in the data.
  8. We need to reformat some of the data and use missing data imputation for features we believe are important but have some missing data points.
  9. With 49 good features, we don’t believe we need to do any feature engineering.
  10. We have used random forests, XGBoost, and standard logistic regressions to solve classification problems.
  11. We will use ROC-AUC score as our evaluation metric.
  12. We are going to use a training-test split (80% training, 20% test) to evaluate our model.
  13. Let’s remove features that are statistically insignificant from our model to improve the ROC-AUC score.
  14. Let’s optimize the parameters within our random forests model to improve the ROC-AUC score (a sketch of items 11-14 follows this list).
  15. Our team believes we are acting ethically.
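
Here is that sketch: a compressed, entirely hypothetical version of items 11-14 using scikit-learn. The file name, the "churned" label, and the parameter grid are invented stand-ins, not the actual Rocinante data or settings.

# Items 11-14 in miniature: 80/20 split, random forest tuned by grid search, scored by ROC-AUC.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import roc_auc_score

df = pd.read_csv("rocinante_subscribers.csv")        # hypothetical file with the 49 features
X, y = df.drop(columns=["churned"]), df["churned"]   # "churned" is the binary label

# item 12: 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# items 13-14: tune random forest parameters with a grid search, optimizing ROC-AUC
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [200, 500], "max_depth": [5, 10, None]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)

# item 11: evaluate the tuned model on the held-out 20%
test_auc = roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1])
print(grid.best_params_, round(test_auc, 3))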

This process may look deceptively linear, but data science is often a nonlinear practice. After doing all of the work in our example above, we could still end up with a model that doesn’t generalize well. It could be bad at predicting churn in new customers. Maybe we shouldn’t have assumed this problem was a binary classification problem, and should instead have used survival regression to solve it. This part of the project will be filled with experimentation, and that’s totally normal.

Communication

What is the best way to communicate and circulate our results?

Our job is typically to bring our findings to the client, explain how the process was a success or failure, and explain why. Communicating technical details and explaining them to non-technical audiences is important because not all of our clients have degrees in statistics. There are four ways in which communication of technical details can be advantageous:

  • It can be used to inspire confidence that the work is thorough and multiple options have been considered.
  • It can highlight technical considerations or caveats that stakeholders and decision-makers should be aware of.  
  • It can offer resources to learn more about specific techniques applied.
  • It can provide supplemental materials to allow the findings to be replicated where possible.

We often use blog posts and articles to circulate our work. They help spread our knowledge and the lessons we learned while working on a project to peers. I encourage every data scientist to engage with the data science community by attending and speaking at meetups and conferences, publishing their work online, and extending a helping hand to other curious data scientists and analysts.

Example

Our method of binary classification was in fact incorrect, so we ended up using survival regression to determine there are four features that impact churn: gaming platform, geographical region, days since last update, and season. Our team aggregates all of our findings into one report, detailing the specific techniques we used, caveats about the analysis, and the multiple recommendations from our team to the customer retention and acquisition team. This report is full of the nitty-gritty details that the more technical folks, such as the data engineering team, may appreciate. Our team also creates a slide deck for the less-technical audience. This deck glosses over many of the technical details of the project and focuses on recommendations for the customer retention and acquisition team.
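
A sketch of what such a survival-regression step could look like with the lifelines library is shown below. The file and column names are hypothetical stand-ins that mirror the four features named above; this is illustrative, not the actual Rocinante analysis.

# Cox proportional-hazards survival regression on hypothetical subscriber data.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("rocinante_subscribers.csv")            # hypothetical subscriber export
cols = ["months_subscribed", "churned",                   # duration and event indicator
        "gaming_platform", "geo_region", "days_since_last_update", "season"]
model_df = pd.get_dummies(df[cols], columns=["gaming_platform", "geo_region", "season"])

cph = CoxPHFitter()
cph.fit(model_df, duration_col="months_subscribed", event_col="churned")
cph.print_summary()                                       # hazard ratios for each feature

# survival curves for a few subscribers; these can feed a per-customer churn risk score
covariates = model_df.drop(columns=["months_subscribed", "churned"])
print(cph.predict_survival_function(covariates.head()))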

We give a talk at a local data science meetup, going over the trials, tribulations, and triumphs of the project and sharing them with the data science community at large.

Why?

Why are we doing all of this?

I ask myself this question daily — and not in the metaphysical sense, but in the value-driven sense. Is there value in the work we have done and in the end result? I hope the answer is yes. But, let’s be honest, this is business. We don’t have three years to put together a PhD thesis-like paper. We have to move quickly and cost-effectively. Critically evaluating the value ultimately created will help you refine your approach to the next project. And, if you didn’t produce the value you’d originally hoped, then at the very least, I hope you were able to learn something and sharpen your data science skills. 

Example

Rocinante has a better idea of how long our users will remain active on the platform based on user characteristics, and can now launch preemptive strikes in order to retain those users who look like they are about to churn. Our team eventually develops a system that alerts the customer retention and acquisition team when a user may be about to churn, and they know to reach out to that user, via email, encouraging them to try out a new feature we recently launched. Rocinante is making better data-informed decisions based on this work, and that’s great!

Conclusion

I hope this article will help guide your next data science project and get the wheels turning in your own mind. Maybe you will be the creator of a data science framework the world adopts! Let me know what you think about the questions, or whether I’m missing anything, in the comments below.


          Software engineer, Database System, MNC, Attractive package - Hudson - Singapore      Cache   Translate Page      
George Chen IT Recruitment Consultant Hudson SG Employment Agency Licence No.:. MNC, Artificial intelligence/Deep learning....
From Hudson - Wed, 28 Nov 2018 11:09:42 GMT - View all Singapore jobs
          NVIDIA Unveils Beastly Titan RTX Turing GPU, 24GB GDDR6, 11 GigaRays Ray Tracing Muscle      Cache   Translate Page      
NVIDIA Unveils Beastly Titan RTX Turing GPU, 24GB GDDR6, 11 GigaRays Ray Tracing Muscle Following a rash of strategic 'leaks' by social media influencers, NVIDIA on Monday formally introduced its Titan RTX graphics card, a heavy-hitting accelerator that the company is appropriately billing as "the world's most powerful desktop GPU." It's built to handle the data-crunching rigors of deep learning applications, and of course brings

          AlphaGo strikes again | AlphaFold trounces traditional models in protein structure prediction

(This article was first published on the WeChat public account Zhubo (驻波); it is republished on Zhihu with official permission.)

Text: Yuan Bo

Review: Fan Jingxuan, Zhang Hanxiong, Chang Liang


"I have a protein I want to study, but I don't know its structure or function." This is one of the biggest problems molecular and cell biologists face every day. [1] As amino-acid sequencing technology keeps improving, more and more protein sequences can be read out in high throughput, but there is still a long way to go from that one-dimensional sequence to solving the actual three-dimensional structure.

[1] Quoted from Roy, A. et al., Nature Protocols, 2010


If the basic unit of life is the cell, then the basic functional units of the cell are its intricate proteins, and the core determinant of a protein's function is its structure. Whether you want to study a protein's function or design a drug against it, the protein's structure is a crucial part of the picture. It is exactly because of this importance that biology has a dedicated field called structural biology; the well-known Chinese biologist Shi Yigong is one of its leading figures.

2018 brought another edition of the biennial international protein structure prediction competition, CASP. The contest, which has now been running for some 25 years, attracts hundreds of teams from around the world each time to make quantitative predictions for protein structures chosen by the organizing committee. This weekend was when the results of this year's edition were unveiled.

Before the results came out, the organizing committee sent this teaser email to the participants.

CASP13 this year has observed unprecedented progress in the ability of computational methods to predict protein 3D structure. The reasons are not yet fully clear, but all this, including of course the results, will be discussed at the meeting.

The gist: this year's competition had seen "unprecedented" progress, the exact reasons were not yet fully clear, but everything, including the results, would be discussed in detail at the formal meeting over the weekend. A "mystery team of unprecedented strength" had everyone's curiosity piqued. Because of the special nature of the event, the conference even added an extra round of registration, giving media who wanted to watch the show one more chance to buy a ticket.

In the early hours of Friday, US time, the overall leaderboard of this year's competition was officially unveiled: a team signing itself A7D took first place, leaving every other team far behind. How far? Zhubo dug up the final results of earlier editions; the gap between A7D and the runner-up this year is almost larger than the total improvement in model performance over the twenty-odd years since CASP was founded [2].

[2] The prediction targets differ from edition to edition, so a direct comparison is not very precise.

[Figure: vertical axis is the summed prediction-similarity score over every target protein; higher is better. Dark grey: this year's 97 teams; light grey: the 128 teams of the previous edition in 2016; red: DeepMind's team A7D.]


A few examples of predicted proteins give a feel for what A7D's model can do. [Figures: each plot shows one model's prediction for a protein of unknown structure. Vertical axis: cumulative deviation between the predicted and the actual structure over the whole chain, lower is better. Cyan: A7D's prediction; pink: the University of Michigan team that finished second overall.]

Besides taking first place overall, the DeepMind team produced the single best model for 25 of the 43 target proteins; by comparison, the team that finished second overall took 3 of them [3]. So who exactly is this runaway leader? Yes, you have already been spoiled: after the competition, A7D posted on the forum that they were in fact researchers from DeepMind, the same DeepMind that built AlphaGo. According to reports, DeepMind has officially named the model AlphaFold [3].

[3] Guardian news report

In fact, as early as October 2017, DeepMind had said in a public interview that the team was becoming interested in applying artificial intelligence to drug development, and a key step in developing new drugs is accurately determining the three-dimensional structure of the target protein. Just one year after that news, DeepMind has once again shown the world the enormous potential of deep learning in yet another brand-new setting.

Deep learning has, yet again, worked its way into a new application domain.


What did DeepMind do this time?

By one count, as of 2010 only 0.6% of known protein sequences had had their structures solved [3]. It was precisely this enormous gap that led to the first Critical Assessment of Techniques for Protein Structure Prediction (CASP), held in California in 1994. Thanks to the standardization of the problem, a great many computational models have been developed over the past two decades. My own advisor, Chris Sander, also started from this structure prediction problem when he moved from theoretical physics into biology many years ago. Only while writing this piece did I learn that Chris was one of the winners of the very first CASP.

[Photo: my advisor Chris Sander, a German gentleman who can write the three Chinese characters for "protein" (蛋白质) by hand.]



Historically, these computational models have fallen into three main schools: the evolutionary school of comparative modeling, the alignment school of threading methods, and the from-scratch ab initio school.

The core idea of the evolutionary school is to find sequences that are homologous, or nearly so, in evolutionary history and to predict the new target protein starting from their structures. The alignment school argues that evolutionary homology is not strictly necessary: fragments of the target sequence can be matched and aligned directly against previously solved three-dimensional structures to predict the new protein. The hardest, and yet the most critical, is the ab initio school, which aims to predict from scratch those protein sequences for which no similarity can be found at all (ab initio is Latin for "from the beginning").

In 1999, an ab initio model called Rosetta was developed by David Baker's team at the University of Washington. Using Monte Carlo simulated annealing, the model successfully predicted several proteins of roughly 100 amino acids, reaching a best accuracy of 3.8 Å root-mean-square deviation (RMSD), and became one of the winners of CASP III [4]. In work published in Science in 2003, Baker went further and successfully predicted TOP7, an artificially designed sequence of 93 amino acids, to an accuracy of 1.2 Å [5]. In 2005, Baker's team released the screensaver Rosetta@home, a client that uses idle computer time to help the Rosetta servers run structure-solving simulations. By recruiting vast amounts of idle personal computing power through this kind of distributed computing, the project was a great success.

[4] Simons et al., Proteins. 1999.

[5] Kuhlman et al., Science. 2003.

[Image: the Rosetta@home screensaver developed by the Baker Lab.]

In recent years, as the CASP challenges have continued, the boundaries between these schools have gradually blurred, and more and more research groups have begun to integrate all three kinds of information into a single, more accurate prediction model. Among these groups, I-TASSER, developed by Yang Zhang's team at the University of Michigan, is one of the success stories.

[Photo: University of Michigan professor Yang Zhang and I-TASSER, the tool he developed; it has been cited more than 6,000 times and has assisted over 100,000 researchers from 141 countries.]

Since its debut in 2008, I-TASSER and its various combined variants have been among the most popular structure prediction models, placing near the top of CASP for close to a decade. In this year's CASP, Zhang's team pushed the model's accuracy further by integrating I-TASSER with convolutional neural networks (CNNs), and finished second overall.

Why did the AlphaGo-style protein prediction model achieve such a breakthrough?

Even before AlphaGo appeared, some researchers had already tried using neural networks and reinforcement learning to emulate the simulated-annealing step [6]. So what allowed DeepMind's entry to pull away from the field this time? No official statement has been released yet, so we can only glean a little from the one-page summary the team posted on the competition website.

[6] Czibula et al., Int.J.Comp.Tech.Appl. 2011.

[Image: the model overview DeepMind posted on the CASP competition website.]

According to Andrew Senior, one of the team leads, the predictions DeepMind submitted came from three different variants of a neural-network generative model. The overall model consists of a two-dimensional contact network and a scoring network.

In the two-dimensional contact network, the protein's primary sequence is used to predict the distance between every pair of amino acids. In this module, even though the three-dimensional structure is still unknown, the neural network can learn to predict which amino acids sit relatively close together in the same region of space (the contact matrix), effectively turning one-dimensional information into two-dimensional distances.

The input to the scoring-network module is the output of the first network, together with multiple sequence alignment (MSA) and structure geometry information. This is fed into an annealing-and-assembly model, and a score is learned, from the similarity between the predicted and actual structures over whole fragments, that makes the annealing assembly perform best; during prediction, that score serves as the objective function the annealing model optimizes.

To train these neural networks, DeepMind threw every protein with a known structure in the international Protein Data Bank (PDB) into training. Each protein was split into many overlapping short peptides, and the model was asked to predict and score the structures of these peptide fragments. Those scores, together with the traditional Rosetta score, were used to train the parameters of the two modules, in effect learning an objective function for the simulated annealing automatically.
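
For intuition only, here is a toy sketch of the kind of network that maps a one-dimensional sequence to a two-dimensional contact/distance map. It is in no way DeepMind's actual architecture; the layer sizes and structure are invented for illustration.

# Toy "contact network": embed each residue, build pairwise features by concatenating
# embeddings of residue i and residue j, then predict an L x L contact map with 2-D convolutions.
import torch
import torch.nn as nn

class ToyContactNet(nn.Module):
    def __init__(self, n_amino_acids=20, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_amino_acids, hidden)
        self.conv = nn.Sequential(
            nn.Conv2d(2 * hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, seq):                      # seq: (batch, L) integer-encoded residues
        h = self.embed(seq)                      # (batch, L, hidden)
        L = h.shape[1]
        hi = h.unsqueeze(2).expand(-1, -1, L, -1)            # features of residue i
        hj = h.unsqueeze(1).expand(-1, L, -1, -1)            # features of residue j
        pair = torch.cat([hi, hj], dim=-1).permute(0, 3, 1, 2).contiguous()
        return self.conv(pair).squeeze(1)        # (batch, L, L) predicted contact logits

seq = torch.randint(0, 20, (1, 64))              # one random 64-residue "protein"
print(ToyContactNet()(seq).shape)                # torch.Size([1, 64, 64])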

[Animation: one CASP13 target, CASP13-T1008, used to illustrate the progress of model training.]

If it ended there, you would be underestimating DeepMind. Among the models DeepMind submitted was one that abandons the traditional fragment-then-assemble approach entirely: the torsion angles between amino acids become the model's direct prediction output, and gradient descent (GD) is run directly against the two-dimensional structure assessment and full-length score produced by the two neural networks. Remarkably, this also achieved striking results. This near-total abandonment of traditionally hand-picked biophysics features is reminiscent of AlphaGo, alone at the top once again.


Postscript: has spring arrived for structural biology?

"Besides DeepMind, many groups, ours included, are using other machine learning methods to attack this problem," said British scientist Liam McGuffin, who also voiced his optimism: "AI has given this field an astonishing push over the past few years. Perhaps around 2020 we will have essentially solved the protein structure prediction problem; I'm optimistic about that."

For structural biology this is without question a huge breakthrough, but it has also stirred up plenty of doubt and concern. In fact, the model has not yet reached extremely high accuracy, and in some cases that traditional models can handle, it falls short of expectations. Take CASP13-T0966-D1, which corresponds to the RRSP protein in E. coli, an important protein that interacts with the Ras-Erk pathway and a potential drug target for Ras-related cancers: AlphaFold's prediction for this protein did not even reach the average level. For which kinds of protein molecules is the model more effective, and why? None of this has been worked out in detail yet. Can such a model actually be used to help drug development in practice? That probably still deserves a small question mark.

[Figure: an example where DeepMind's model failed, the protein RRSP. The cyan curve is A7D's result; the pink curve is the University of Michigan team that finished second overall.]


"More intriguing than the jump in accuracy is the fact that, in the annealing simulation, DeepMind did not adopt the reinforcement learning approach outsiders expected," a PhD student at MIT's AI lab, here called S, told Zhubo. "DeepMind had ten thousand reasons to try that route, yet the results they finally published did not use it. If a DeepMind with resources like theirs did not manage it, that may be a warning sign for applying reinforcement learning to protein folding."

"And the model still includes the Rosetta score," S added. "DeepMind tried to drop that scoring system but in the end never fully managed to, which says a lot about how important the scoring functions accumulated over the years by these traditional methods still are."

In fact, the integration of AI into biology is not an isolated case. In recent years, AI teams led by Google have been blossoming across biomedicine, posting dazzling human-level, and even super-human, results in areas such as cancer pathology image recognition, genomic variant detection, and disease risk assessment. Yet all of these outwardly successful models inevitably run into obstacles of generality, usability, and interpretability.

A mature application calls for more than a high-accuracy network model: it needs a sufficiently deep understanding of the field's most pressing problems, and more people with interdisciplinary backgrounds working on them together. As more and more people join this battle, the future of AI in medicine looks bright.



Source: Zhihu (www.zhihu.com)
Author: Zhihu user (log in to view details)

          Third workshop on Bayesian Deep Learning (NeurIPS 2018)      Cache   Translate Page      
none
          (PR) NVIDIA Reveals the Titan of Turing: TITAN RTX      Cache   Translate Page      
Turing-Powered TITAN Delivers 130 Teraflops of Deep Learning Horsepower, 11 GigaRays of Ray-Tracing Performance to World’s Most Demanding Users MONTREAL—Conference on Neural Information Processing Systems—Dec. 3, 2018—NVIDIA today introduced NVIDIA® TITAN RTX™, the world’s most powerful desktop GPU, providing massive performance for AI research, data science and creative applications. Driven by the new NVIDIA Turing™ architecture, TITAN RTX — dubbed T-Rex — delivers 130 teraflops of deep learning performance and 11 GigaRays of ray-tracing performance. “Turing is NVIDIA’s biggest advance in a decade – fusing shaders, ray tracing, and deep learning to reinvent the GPU,” said Jensen Huang, founder and...

Keep on reading: (PR) NVIDIA Reveals the Titan of Turing: TITAN RTX
          New attack could make website security captchas obsolete      Cache   Translate Page      
Researchers have created new artificial intelligence that could spell the end for one of the most widely used website security systems.The new algorithm, based on deep learning methods, is the most effective solver of captcha security and authentication systems to date and is able to defeat versions of text captcha schemes used to defend the majority of the world's most popular websites
          Deep Learning Scientist / Engineer      Cache   Translate Page      
CA-San Jose, I am currently working with several companies in the area who are actively hiring in the field of Computer Vision and Deep Learning. AI, and specifically Computer Vision and Deep Learning are my niche market specialty and I only work with companies in this space. I am actively recruiting for multiple levels of seniority and responsibility, from experienced Individual Contributor roles, to Team Lea
          OSS Leftovers      Cache   Translate Page      
  • Monex Platform Now Available in Private Beta Launch: Built on Open-Source Blockchain

    Monax, a digital legal infrastructure platform built on an open-source, universal blockchain, has introduced the private beta launch of the Monax Platform, the latest in its line of smart contract products. The Monax Platform is a collaborative workspace for businesses, legal and tech professionals, with market-ready smart contract templates available for individual or commercial use.

  • 2019 telecoms forecast: the year of 5G and open source

    2019 is shaping up to be a massive year for telco companies. In the final few months of 2018, countless 5G projects have launched and several new uses cases in cloud computing and IoT have come to light, driving demand for high capacity and low latency connectivity.

    As a result of the monetisation challenges, there has been a distinct move away from just providing faster network speeds to consumers, and towards enabling a whole host of new technologies on mobile networks. To achieve this, an increasing number of telecoms operators are functioning like software companies.

  • AI in 2019: 8 trends to watch

    “Today, more leading-edge software development occurs inside open source communities than ever before, and it is becoming increasingly difficult for proprietary projects to keep up with the rapid pace of development that open source offers,” says Ibrahim Haddad, director of research at The Linux Foundation, which includes the LF Deep Learning Foundation. “AI is no different and is becoming dominated by open source software that brings together multiple collaborators and organizations.”

    In particular, Haddad expects more cutting-edge technology companies and early adopters to open source their internal AI work to catalyze the next waves of development, not unlike how Google spun Kubernetes out from an internal system to an open source project.

    “We foresee more companies open sourcing their internal AI stacks in order to build communities around those projects,” Haddad says. “This will enable companies and communities to harmonize across a set of critical projects that together will create a comprehensive open source stack in the AI, machine learning, and deep learning space. Large companies that were the first to take their AI efforts open source are already seeing early mover advantages, and we expect this to increase.”

  • The 8 biggest open source milestones in 2018

    Open source continues to climb the charts of popularity and usability. Every year that goes by marks newer and greater milestones for open source, and 2018 was no stranger to such events. The open source community enjoyed plenty of highs and suffered its share of lows.

  • Guten Tag Sindelfingen!

    This week, Collaborans will be taking part, and speaking, in this year's ESE Kongress, "the only German-language convention with an exclusive and extensive focus on the manifold issues and challenges with respect to the development of device and system software for industrial applications, automotive engineering, automation, drives, measurement systems, communication systems as well as consumer electronics and medical devices."

  • Chrome 71 for Mac, Windows, Linux rolling out w/ ad removal on abusive sites, billing protection

    At target are websites that continue to display advertising that masquerades as fake system dialogs or ineffective ‘close’ buttons even after warnings from the Google Search Console’s Abusive Experiences Report. According to Google, this ad removal will affect a “small number of sites with persistent abusive experiences,” with scammers and phishing schemes often using these ads to steal personal information.

  • The Servo Blog: Experience porting Servo to the Magic Leap One

    We now have nightly releases of Servo for the Magic Leap One augmented reality headset. You can head over to https://download.servo.org/, install the application, and browse the web in a virtual browser.

  • SmartArt improvements in LibreOffice, part 2

    I recently dived into the SmartArt support of LibreOffice, which is the component responsible for displaying complex diagrams from PPTX. I focused especially on the case when only document model and the layout constraints are given, not a pre-rendered result.

    First, thanks to our partner SUSE for working with Collabora to make this possible.

  • BloomReach Experience Manager v13, Magnolia v6 and More Open Source News
  • Top 14 Joomla extensions

    In the first part of this series, I explained how to use the Joomla Extension Directory to find extensions to expand your Joomla website's functionality. Here, I'll describe the top 14 free Joomla extensions—the ones I don't think any site should do without.

  • Register today for LibrePlanet 2019!

    The free software community spans the entire world, with supporters in nearly every corner of the globe, busily coding, tinkering, and spreading the word about the growing importance of controlling our computing. The Internet provides us with many great tools to share the latest news and advances, but ultimately, there’s nothing quite like meeting in person at the LibrePlanet conference! At LibrePlanet, you can meet other developers, activists, policy experts, students, and more, to make connections and help us strategize the future of free software.

  • Introducing Lei Zhao, intern with the FSF tech team

    I first became aware of free software in the sense of freedom at the age of 19. I encountered free software even earlier, but it took some time to appreciate the free/libre aspect of free software.

  • PAINS management: an open source model to eliminate nuisance compounds

    High-throughput screening (HTS) technologies have enabled the routine testing of millions of compounds towards the identification of novel ‘hit’ molecules for therapeutic targets. Oftentimes in this drug discovery process, however, compounds that show promising activity in primary screens show no activity during subsequent hit qualification or progression efforts.

  • Open source tool picks best chemo drug 80% of the time
  • Inside FC Barcelona's Open-Source Strategy to Innovate in Soccer
  • XML Language Server and the VSCode Extension [Ed: Red Hat as Microsoft marketing department, helping to sell proprietary lockin Visual Studio]

read more


          NVIDIA Unveils TITAN RTX GPU for Accelerated Ai      Cache   Translate Page      

Today NVIDIA introduced the TITAN RTX as what the company calls "the world’s most powerful desktop GPU" for AI research, data science and creative applications. "Driven by the new NVIDIA Turing architecture, TITAN RTX — dubbed T-Rex — delivers 130 teraflops of deep learning performance and 11 GigaRays of ray-tracing performance. Turing is NVIDIA’s biggest advance in a decade – fusing shaders, ray tracing, and deep learning to reinvent the GPU,” said Jensen Huang, founder and CEO of NVIDIA. “The introduction of T-Rex puts Turing within reach of millions of the most demanding PC users — developers, scientists and content creators.”

The post NVIDIA Unveils TITAN RTX GPU for Accelerated Ai appeared first on insideHPC.


          Comment on Star Trek Picard Series Will Premiere Next Year by Vulcan Soul      Cache   Translate Page      
Oh sure, upscaling and super resolution is already possible - I actually meant the next level, which is to add details to ships (so the Defiant won't look like a "grey blob" as per the interview Matt linked above) and even people's faces that is not in the source material based upon "guessing" using algorithms trained with millions of hours of television. Maybe even use remastered TNG to auto-remaster DS9? If you have seen some of the crazy stuff deep learning is capable of today already, you know it's closer than we think!
          #10: Python Machine Learning: A Deep Dive Into Python Machine Learning and Deep Learning, Using Tensor Flow And Keras: From Beginner To Advance      Cache   Translate Page      

Pick your books posted a photo:

#10: Python Machine Learning: A Deep Dive Into Python Machine Learning and Deep Learning, Using Tensor Flow And Keras: From Beginner To Advance



Python Machine Learning: A Deep Dive Into Python Machine Learning and Deep Learning, Using Tensor Flow And Keras: From Beginner To Advance
Leonard Eddison (Author)

Buy new: $0.99

(Visit the Hot New Releases in Programming list for authoritative information on this product's current rank.)

Buy now: #10: Python Machine Learning: A Deep Dive Into Python Machine Learning and Deep Learning, Using Tensor Flow And Keras: From Beginner To Advance www1.pickyourbook.net/?p=58317


          Deep learning in Satellite imagery
In this article, I hope to inspire you to start exploring satellite imagery datasets. Recently, this technology has gained huge momentum, and we are finding that new possibilities arise when we use satellite image analysis. Satellite data changes the game because it allows us to gather new information that is not readily available to businesses. […] The article Deep learning in Satellite imagery originally appeared on Appsilon Data Science | End-to-End Data Science Solutions.
          Intel Select Solutions for BigDL on Apache Spark – Intel Conversations in the Cloud – Episode 161      Cache   Translate Page      
In this Intel Conversations in the Cloud audio podcast: On this week’s Conversations in the Cloud, we welcome Radhika Rangarajan, Engineering Director for Data Analytics and AI Ecosystem at Intel. Radhika offers an overview of BigDL, a distributed deep learning library for Apache Spark that enables efficient, scalable, and optimized deep learning development. People don’t [...]
          Comment on Dropout Regularization in Deep Learning Models With Keras by Jason Brownlee      Cache   Translate Page      
Some ideas: Perhaps the out of sample dataset is not representative? Perhaps other regularization methods are needed? Perhaps the training process needs tuning too? Let me know how you go.
          Book Memo: “Embedded Deep Learning”      Cache   Translate Page      
Algorithms, Architectures and Circuits for Always-on Neural Network Processing This book covers algorithmic and hardware implementation techniques to enable embedded …

Continue reading


          Distilled News      Cache   Translate Page      
Activation Regularization for Reducing Generalization Error in Deep Learning Neural Networks Deep learning models are capable of automatically learning a …

Continue reading


          Cirrascale Cloud Services Adds NVMe Hot Tier Storage Offering Powered...      Cache   Translate Page      
The company’s multi-GPU deep learning and HPC cloud platform now supports WekaIO Matrix™ to provide unmatched storage speed and scalability for AI and HPC analytics applications(PRWeb December 03, 2018)Read the full story at https://www.prweb.com/releases/cirrascale_cloud_services_adds_nvme_hot_tier_storage_offering_powered_by_wekaio_to_accelerate_deep_learning_applications/prweb15955750.htm (Source: PRWeb: Medical Pharmaceuticals)
          Fraud Analytics for Open Banking: Behavioral Profiling

Bank on sea of data

Digital banking channels are increasingly popular, and behavioral profiling of customers is vital in preventing new types of fraud. The open banking revolution makes understanding each customer’s behavior even more important in preventing fraud by considering all the aspects of transactions. Transactions in the world of open banking contain data not previously seen in the payments ecosystem.

Behavioral profiling approaches are extremely important in tackling fraud that happens when banks share financial data with third parties through application programming interfaces. The behavioral profiling in the FICO Falcon Platform leverages historical details to track a customer’s patterns, including:

  • typical spending velocity
  • the hours and days when they tend to transact
  • which foreign countries they have transferred to before
  • favorite beneficiaries

Transaction Profiles

Transaction profiles enable the FICO Falcon Platform to detect subtle yet anomalous changes in behavior and elevate the score on the transaction. Each profile is a continuous-learning cognitive “mini-model” that uses machine learning to interpret behavior in real time.

Profiles compactly summarise each customer’s transactional history, which is too big to be retrieved when a decision has to be made in milliseconds (Figure 1). This is why we require streaming analytics.

FICO chart

Transaction profiling, applying Kalman filter principles, creates a profile for each customer. This is updated in real time, with each transaction, to account for behavioral changes.

Profiles are:

  • Recursively updated; computing the estimate for the current profile state only requires the estimated profile state from the previous transaction and the information connected with the current transaction.
  • Composed of numerous monetary and non-monetary parameters that are continuously updated to enable adaptive behavioral profiling.
  • Memory-efficient and do not require extensive storage space.

In practice, when a transaction enters the FICO Falcon Platform, the system pulls a profile connected with that transaction. The system updates the variables stored in that profile, and uses the updated profile to produce the final score, which indicates the likelihood of fraud.
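
To make the mechanics concrete, here is a deliberately simplified sketch of a recursively updated profile. It is illustrative only, not FICO's Falcon implementation; the variable names and decay constant are invented.

# Recursive transaction profile: each new transaction updates a small, fixed-size state
# with exponential decay, so no transaction history needs to be stored or retrieved.
from dataclasses import dataclass

@dataclass
class SpendProfile:
    avg_amount: float = 0.0      # decayed average transaction amount
    velocity: float = 0.0        # decayed count of recent transactions
    decay: float = 0.95          # forgetting factor applied per transaction

    def update(self, amount: float) -> float:
        """Update the profile with one transaction and return a simple anomaly ratio."""
        risk = amount / (self.avg_amount + 1.0)   # illustrative risk signal vs. decayed average
        self.avg_amount = self.decay * self.avg_amount + (1 - self.decay) * amount
        self.velocity = self.decay * self.velocity + 1.0
        return risk

profile = SpendProfile()
for amount in [25.0, 30.0, 27.5, 900.0]:          # the last transaction is unusually large
    print(round(profile.update(amount), 2))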

Understanding Recurring Behavior

People form habits, and by looking at their transactional history we can learn their frequent behaviors. Generally, customers use the same devices, such as computer or mobile phones, go to the same online merchants and transfer money to repeated beneficiaries. These recurrences can be analysed and understood to shine further light on normal behavior, and thus on fraud.

To understand recurrences, the FICO Falcon Platform maintains behavior-sorted lists, or B-LISTs, which enable the system to create a real-time ranking of features associated with each customer’s most frequent behaviors.

By using machine learning, the system makes sure that only the activities that keep recurring remain in each customer’s B-LIST. Frequent activities have higher ranks and are less likely to be fraudulent.

FICO chart

In Figure 2, money transfers to the same beneficiaries have higher weights in a customer’s B-LIST and are less likely to be fraudulent. On the other hand, money transfers to destinations that are not included in the customer’s B-LIST are substantially riskier. FICO’s B-LIST technology is a powerful facet of the transaction profile.
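
As a toy illustration of the idea (not FICO's actual B-LIST technology), a behavior-sorted list can be kept as a small, decayed frequency table; the names and parameters below are invented.

# Behavior-sorted list: beneficiaries the customer pays repeatedly rise in rank,
# while rarely seen or never-seen beneficiaries rank low and therefore look riskier.
class BehaviorSortedList:
    def __init__(self, decay: float = 0.9, max_size: int = 10):
        self.weights = {}          # beneficiary -> recurrence weight
        self.decay = decay
        self.max_size = max_size

    def observe(self, beneficiary: str) -> float:
        # decay all weights, then reinforce the beneficiary just seen
        self.weights = {k: w * self.decay for k, w in self.weights.items()}
        self.weights[beneficiary] = self.weights.get(beneficiary, 0.0) + 1.0
        weight = self.weights[beneficiary]
        # keep only the strongest recurring entries (the "B-LIST")
        top = sorted(self.weights.items(), key=lambda kv: kv[1], reverse=True)[: self.max_size]
        self.weights = dict(top)
        # risk signal: low recurrence weight means higher risk
        return 1.0 / (1.0 + weight)

blist = BehaviorSortedList()
for b in ["landlord", "landlord", "utility", "landlord", "new_offshore_acct"]:
    print(b, round(blist.observe(b), 2))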

The open banking changes, specifically the need to fight fraud and keep genuine customers happy, means that behavioral profiling at the individual customer level is crucial. Each customer’s profile, including transaction profiling and B-LIST technology, is a “mini-model” that uses machine learning in order to learn highly detailed behavioral patterns of that customer in real time.

Learn More

For more information, see my previous post on fraud analytics, and my white papers on how AI and machine learning address the open banking revolution.

The post Fraud Analytics for Open Banking: Behavioral Profiling appeared first on FICO.


          Gannett Is Using Deep Learning to Determine Why Certain Ad Designs Work      Cache   Translate Page      
Gannett is turning to a form of artificial intelligence to design better online ads. The USA Today publisher recently rolled out a new internal platform that uses deep learning and computer vision to determine which images, colors and other design aspects work best in online ads across its dozens of local news sites. The company...
          Why we should worry when machines hallucinate      Cache   Translate Page      
Take driverless cars which are currently undergoing field trials: these often rely on sophisticated deep learning neural networks to navigate and tell ...
          Amazon Leaps Forward In Cloudy AI      Cache   Translate Page      
AWS added reinforcement learning (RL) to SageMaker, its end-to-end AI developer platform. Google added RL to its TensorFlow Deep Learning (DL) ...
          NVIDIA Announces Beastly New Titan RTX      Cache   Translate Page      
Later on in the press release, NVIDIA says that it's "built for AI researchers and deep learning developers," claims it's "perfect for data scientists," and ...
          Axios Future      Cache   Translate Page      
In January, Marcus, who has been a critic of deep learning for years, stirred up a stormy debate with an article that fundamentally questioned deep ...
          Think Like Amazon: Brands Deploy Smart Filters, User-Generated Content      Cache   Translate Page      
Amazon sits atop a culture of innovation using machine learning, deep learning and artificial intelligence, not all of which was developed in-house, ...
          Principal Data Scientist - Deep Learning - QuantumBlack - Montréal, QC      Cache   Translate Page      
Work closely with Data Engineers, Machine Learning Engineers and Designers to build end-to-end analytics solutions for our clients that drive real impact in the...
From QuantumBlack - Thu, 25 Oct 2018 16:08:33 GMT - View all Montréal, QC jobs
          contextflow selected for Philips HealthWorks AI in Radiology Accelerator      Cache   Translate Page      
Deep learning expert contextflow GmbH thrilled to announce its participation in highly-selective program with one of the world’s leading health innovation companies Vienna, Austria, December 5th 2018 - contextflow, a recognized name in the highly competitive field of AI in medical


          Comment on Introducing the PyTorch Scholarship Challenge from Facebook by udacity      Cache   Translate Page      
Thank you for your question. The key difference is that, in taking the course via the challenge, you enter yourself into the opportunity to earn a full scholarship from Udacity and Facebook, for Udacity's Deep Learning Nanodegree program. Whereas if you just enroll in the "standard" free course, you don't have that opportunity. So your decision as to which way to enroll should depend on whether you want to try and earn the scholarship opportunity. Hope that's clear, but please let us know if you have additional questions, thank you!
          Highlights from five years of Facebook AI Research: open sourcing Torch deep learning modules and Caffe2, scalable text classification, and translation research (Oliver Libaw/Facebook Code)      Cache   Translate Page      

Oliver Libaw / Facebook Code:
Highlights from five years of Facebook AI Research: open sourcing Torch deep learning modules and Caffe2, scalable text classification, and translation research  —  Five years ago, we created the Facebook AI Research (FAIR) group to advance the state of the art of AI through open research …


          NVIDIA Has Trained AI to Create Entire Virtual Worlds       Cache   Translate Page      

With a background in industrial design, a good portfolio and some luck, you could land a job as a digital set designer for Hollywood. Their job involves, among other things, rendering cityscapes through which green-screened actors might run, fly or have car chases. But while the actors won't be replaced by computers yet, the days of designers specializing in cityscapes might be numbered.

That's because researchers at NVIDIA have managed to harness the power of AI to render not just single scenes, but entire urban environments, and everything you might expect to see in one:

"This is the first time we can do this with a neural network," said Bryan Catanzaro, Vice President of Applied Deep Learning at NVIDIA. "Neural networks – specifically – generative models are going to change the way graphics are created.

"One of the main obstacles developers face when creating virtual worlds, whether for game development, telepresence, or other applications is that creating the content is expensive. This method allows artists and developers to create at a much lower cost, by using AI that learns from the real world."

You can learn more about the technology here.



          IBM boosts AI chip speed, bringing deep learning to the edge      Cache   Translate Page      
IBM is developing specialized hardware for AI applications in the interest of developing a true "broad AI" solution.
          Comment on HPC File Systems Fail for Deep Learning at Scale by DiPe
From what I have heard, Lustre and GPFS do not offer very scalable metadata performance. With BeeGFS this kind of workload should not be a problem, because each metadata server runs highly multi-threaded and there is no known scaling limit to the number of metadata servers one can add to a single cluster. One important piece of information is missing here: are those 200k files all in a single flat directory, or are they spread out over many sub-directories? If it is the latter, this should not be a problem with BeeGFS, as it randomly assigns a metadata server to each directory level.
          insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning – Part 4
With AI and DL, storage is a cornerstone for handling the deluge of data constantly generated in today’s hyperconnected world. It is a vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency and maximum concurrency.
          Writing better code with pytorch and einops

Writing better code with pytorch and einops
Rewriting building blocks of deep learning

Below are some fragments of code taken from official tutorials and popular repositories (fragments taken for educational purposes, sometimes shortened). For each fragment an enhanced version proposed with comments.

In most examples, einops was used to make things less complicated. But you'll also find some common recommendations and practices to improve the code.
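If einops is new to you, here is a small self-contained sketch of the three primitives used throughout (rearrange, reduce and the Rearrange layer); the tensor shapes below are made up purely for illustration.

import torch
from einops import rearrange, reduce
from einops.layers.torch import Rearrange

x = torch.randn(8, 3, 32, 32)                 # batch, channels, height, width

# flatten everything except the batch axis (like x.view(b, -1), but shape-checked)
flat = rearrange(x, 'b c h w -> b (c h w)')   # shape (8, 3072)

# global average pooling written as a reduction over the spatial axes
pooled = reduce(x, 'b c h w -> b c', 'mean')  # shape (8, 3)

# the same kind of rearrangement as a layer, usable inside nn.Sequential
to_patches = Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=8, p2=8)
print(to_patches(x).shape)                    # torch.Size([8, 16, 192])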

Left: as it was, Right: improved version

# start from importing some stuff
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import math

from einops import rearrange, reduce, asnumpy, parse_shape
from einops.layers.torch import Rearrange, Reduce

Simple ConvNet

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

conv_net_old = Net()

conv_net_new = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Conv2d(10, 20, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Dropout2d(),
    Rearrange('b c h w -> b (c h w)'),
    nn.Linear(320, 50),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1),
)

Reasons to prefer new code:

- in the original code, if input size is changed and batch size is divisible by 16 (that's usually so), we'll get something senseless after reshaping; new code explicitly fails in this case
- we won't forget to use dropout with flag self.training
- with the new version, code is straightforward to read and analyze
- sequential makes printing / saving / passing trivial. And there is no need in your code to load a model
- ... and we could also add inplace for ReLU

Super-resolution

class SuperResolutionNetOld(nn.Module):
    def __init__(self, upscale_factor):
        super(SuperResolutionNetOld, self).__init__()
        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2))
        self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
        self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1))
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.pixel_shuffle(self.conv4(x))
        return x

def SuperResolutionNetNew(upscale_factor):
    return nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=5, padding=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, upscale_factor ** 2, kernel_size=3, padding=1),
        Rearrange('b (h2 w2) h w -> b (h h2) (w w2)', h2=upscale_factor, w2=upscale_factor),
    )

Here is the difference:

- no need in special instruction pixel_shuffle (and result is transferrable between frameworks)
- output doesn't contain a fake axis (and we could do the same for the input)
- inplace ReLU used now; for high resolution pictures that becomes critical and saves us much memory
- and all the benefits of nn.Sequential again

Restyling Gram matrix for style transfer

Original code is already good - its first line shows expected tensor shape

einsum operation should be read like: for each batch and for each pair of channels, we sum over h and w. I've also changed normalization, because that's how Gram matrix is defined, otherwise we should call it normalized Gram matrix or alike.

def gram_matrix_old(y):
    (b, ch, h, w) = y.size()
    features = y.view(b, ch, w * h)
    features_t = features.transpose(1, 2)
    gram = features.bmm(features_t) / (ch * h * w)
    return gram

def gram_matrix_new(y):
    b, ch, h, w = y.shape
    return torch.einsum('bchw,bdhw->bcd', [y, y]) / (h * w)

It would be great to use just 'b c1 h w,b c2 h w->b c1 c2' , but einsum supports only one-letter axes

Recurrent model

All we did here is just made information about shapes explicit to skip deciphering

class RNNModelOld(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(RNNModelOld, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output.view(output.size(0) * output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden

class RNNModelNew(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(RNNModelNew, self).__init__()
        self.drop = nn.Dropout(p=dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

    def forward(self, input, hidden):
        t, b = input.shape
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = rearrange(self.drop(output), 't b nhid -> (t b) nhid')
        decoded = rearrange(self.decoder(output), '(t b) token -> t b token', t=t, b=b)
        return decoded, hidden

Channel shuffle (from shufflenet)

def channel_shuffle_old(x, groups):
    batchsize, num_channels, height, width = x.data.size()
    channels_per_group = num_channels // groups
    # reshape
    x = x.view(batchsize, groups, channels_per_group, height, width)
    # transpose
    # - contiguous() required if transpose() is used before view().
    #   See https://github.com/pytorch/pytorch/issues/764
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten
    x = x.view(batchsize, -1, height, width)
    return x

def channel_shuffle_new(x, groups):
    return rearrange(x, 'b (c1 c2) h w -> b (c2 c1) h w', c1=groups)

While progress is obvious, this is not the limit. As you'll see below, we don't even need to write these couple of lines.

Shufflenet from collections import OrderedDict def channel_shuffle(x, groups): batchsize, num_channels, height, width = x.data.size() channels_per_group = num_channels // groups # reshape x = x.view(batchsize, groups, channels_per_group, height, width) # transpose # - contiguous() required if transpose() is used before view(). # See https://github.com/pytorch/pytorch/issues/764 x = torch.transpose(x, 1, 2).contiguous() # flatten x = x.view(batchsize, -1, height, width) return x class ShuffleUnitOld(nn.Module): def __init__(self, in_channels, out_channels, groups=3, grouped_conv=True, combine='add'): super(ShuffleUnitOld, self).__init__() self.in_channels = in_channels self.out_channels = out_channels self.grouped_conv = grouped_conv self.combine = combine self.groups = groups self.bottleneck_channels = self.out_channels // 4 # define the type of ShuffleUnit if self.combine == 'add': # ShuffleUnit Figure 2b self.depthwise_stride = 1 self._combine_func = self._add elif self.combine == 'concat': # ShuffleUnit Figure 2c self.depthwise_stride = 2 self._combine_func = self._concat # ensure output of concat has the same channels as # original output channels. self.out_channels -= self.in_channels else: raise ValueError("Cannot combine tensors with \"{}\"" \ "Only \"add\" and \"concat\" are" \ "supported".format(self.combine)) # Use a 1x1 grouped or non-grouped convolution to reduce input channels # to bottleneck channels, as in a ResNet bottleneck module. # NOTE: Do not use group convolution for the first conv1x1 in Stage 2. self.first_1x1_groups = self.groups if grouped_conv else 1 self.g_conv_1x1_compress = self._make_grouped_conv1x1( self.in_channels, self.bottleneck_channels, self.first_1x1_groups, batch_norm=True, relu=True ) # 3x3 depthwise convolution followed by batch normalization self.depthwise_conv3x3 = conv3x3( self.bottleneck_channels, self.bottleneck_channels, stride=self.depthwise_stride, groups=self.bottleneck_channels) self.bn_after_depthwise = nn.BatchNorm2d(self.bottleneck_channels) # Use 1x1 grouped convolution to expand from # bottleneck_channels to out_channels self.g_conv_1x1_expand = self._make_grouped_conv1x1( self.bottleneck_channels, self.out_channels, self.groups, batch_norm=True, relu=False ) @staticmethod def _add(x, out): # residual connection return x + out @staticmethod def _concat(x, out): # concatenate along channel axis return torch.cat((x, out), 1) def _make_grouped_conv1x1(self, in_channels, out_channels, groups, batch_norm=True, relu=False): modules = OrderedDict() conv = conv1x1(in_channels, out_channels, groups=groups) modules['conv1x1'] = conv if batch_norm: modules['batch_norm'] = nn.BatchNorm2d(out_channels) if relu: modules['relu'] = nn.ReLU() if len(modules) > 1: return nn.Sequential(modules) else: return conv def forward(self, x): # save for combining later with output residual = x if self.combine == 'concat': residual = F.avg_pool2d(residual, kernel_size=3, stride=2, padding=1) out = self.g_conv_1x1_compress(x) out = channel_shuffle(out, self.groups) out = self.depthwise_conv3x3(out) out = self.bn_after_depthwise(out) out = self.g_conv_1x1_expand(out) out = self._combine_func(residual, out) return F.relu(out) class ShuffleUnitNew(nn.Module): def __init__(self, in_channels, out_channels, groups=3, grouped_conv=True, combine='add'): super().__init__() first_1x1_groups = groups if grouped_conv else 1 bottleneck_channels = out_channels // 4 self.combine = combine if combine == 'add': # ShuffleUnit Figure 2b self.left = Rearrange('...->...') # identity 
depthwise_stride = 1 else: # ShuffleUnit Figure 2c self.left = nn.AvgPool2d(kernel_size=3, stride=2, padding=1) depthwise_stride = 2 # ensure output of concat has the same channels as original output channels. out_channels -= in_channels assert out_channels > 0 self.right = nn.Sequential( # Use a 1x1 grouped or non-grouped convolution to reduce input channels # to bottleneck channels, as in a ResNet bottleneck module. conv1x1(in_channels, bottleneck_channels, groups=first_1x1_groups), nn.BatchNorm2d(bottleneck_channels), nn.ReLU(inplace=True), # channel shuffle Rearrange('b (c1 c2) h w -> b (c2 c1) h w', c1=groups), # 3x3 depthwise convolution followed by batch conv3x3(bottleneck_channels, bottleneck_channels, stride=depthwise_stride, groups=bottleneck_channels), nn.BatchNorm2d(bottleneck_channels), # Use 1x1 grouped convolution to expand from # bottleneck_channels to out_channels conv1x1(bottleneck_channels, out_channels, groups=groups), nn.BatchNorm2d(out_channels), ) def forward(self, x): if self.combine == 'add': combined = self.left(x) + self.right(x) else: combined = torch.cat([self.left(x), self.right(x)], dim=1) return F.relu(combined, inplace=True)

Rewriting the code helped to identify:

There is no sense in doing reshuffling and not using groups in the first convolution (indeed, in the paper it is not so). However, this is an equivalent model. It is also strange that the first convolution may not be grouped, while the last convolution is always grouped (and that is different from the paper).

Other comments:

You've probably noticed that there is an identity layer for pytorch introduced here The last thing left is get rid of conv1x1 and conv3x3 in the code - those are not better than standard Simplifying ResNet class ResNetOld(nn.Module): def __init__(self, block, layers, num_classes=1000): self.inplanes = 64 super(ResNetOld, self).__init__() self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) self.bn1 = nn.BatchNorm2d(64) self.relu = nn.ReLU(inplace=True) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) self.layer1 = self._make_layer(block, 64, layers[0]) self.layer2 = self._make_layer(block, 128, layers[1], stride=2) self.layer3 = self._make_layer(block, 256, layers[2], stride=2) self.layer4 = self._make_layer(block, 512, layers[3], stride=2) self.avgpool = nn.AvgPool2d(7, stride=1) self.fc = nn.Linear(512 * block.expansion, num_classes) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2. / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() def _make_layer(self, block, planes, blocks, stride=1): downsample = None if stride != 1 or self.inplanes != planes * block.expansion: downsample = nn.Sequential( nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), nn.BatchNorm2d(planes * block.expansion), ) layers = [] layers.append(block(self.inplanes, planes, stride, downsample)) self.inplanes = planes * block.expansion for i in range(1, blocks): layers.append(block(self.inplanes, planes)) return nn.Sequential(*layers) def forward(self, x): x = self.conv1(x) x = self.bn1(x) x = self.relu(x) x = self.maxpool(x) x = self.layer1(x) x = self.layer2(x) x = self.layer3(x) x = self.layer4(x) x = self.avgpool(x) x = x.view(x.size(0), -1) x = self.fc(x) return x def make_layer(inplanes, planes, block, n_blocks, stride=1): downsample = None if stride != 1 or inplanes != planes * block.expansion: # output size won't match input, so adjust residual downsample = nn.Sequential( nn.Conv2d(inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), nn.BatchNorm2d(planes * block.expansion), ) return nn.Sequential( block(inplanes, planes, stride, downsample), *[block(planes * block.expansion, planes) for _ in range(1, n_blocks)] ) def ResNetNew(block, layers, num_classes=1000): e = block.expansion resnet = nn.Sequential( Rearrange('b c h w -> b c h w', c=3, h=224, w=224), nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1), make_layer(64, 64, block, layers[0], stride=1), make_layer(64 * e, 128, block, layers[1], stride=2), make_layer(128 * e, 256, block, layers[2], stride=2), make_layer(256 * e, 512, block, layers[3], stride=2), # combined AvgPool and view in one averaging operation Reduce('b c h w -> b c', 'mean'), nn.Linear(512 * e, num_classes), ) # initialization for m in resnet.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2. / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() return resnet

Things that were changed

make_layer

Improving RNN language modelling class RNNOld(nn.Module): def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional, dropout=dropout) self.fc = nn.Linear(hidden_dim*2, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): #x = [sent len, batch size] embedded = self.dropout(self.embedding(x)) #embedded = [sent len, batch size, emb dim] output, (hidden, cell) = self.rnn(embedded) #output = [sent len, batch size, hid dim * num directions] #hidden = [num layers * num directions, batch size, hid dim] #cell = [num layers * num directions, batch size, hid dim] #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers #and apply dropout hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)) #hidden = [batch size, hid dim * num directions] return self.fc(hidden.squeeze(0)) class RNNNew(nn.Module): def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional, dropout=dropout) self.dropout = nn.Dropout(dropout) self.directions = 2 if bidirectional else 1 self.fc = nn.Linear(hidden_dim * self.directions, output_dim) def forward(self, x): #x = [sent len, batch size] embedded = self.dropout(self.embedding(x)) #embedded = [sent len, batch size, emb dim] output, (hidden, cell) = self.rnn(embedded) hidden = rearrange(hidden, '(layer dir) b c -> layer b (dir c)', dir=self.directions) # take the final layer's hidden return self.fc(self.dropout(hidden[-1])) original code misbehaves for non-bidirectional models and fails when bidirectional = False, and there is only one layer modification of the code shows both how hidden is structured and how it is modified Writing FastText faster class FastTextOld(nn.Module): def __init__(self, vocab_size, embedding_dim, output_dim): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.fc = nn.Linear(embedding_dim, output_dim) def forward(self, x): #x = [sent len, batch size] embedded = self.embedding(x) #embedded = [sent len, batch size, emb dim] embedded = embedded.permute(1, 0, 2) #embedded = [batch size, sent len, emb dim] pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1) #pooled = [batch size, embedding_dim] return self.fc(pooled)

def FastTextNew(vocab_size, embedding_dim, output_dim):
    return nn.Sequential(
        Rearrange('t b -> t b'),
        nn.Embedding(vocab_size, embedding_dim),
        Reduce('t b c -> b c', 'mean'),
        nn.Linear(embedding_dim, output_dim),
        Rearrange('b c -> b c'),
    )

Some comments on new code:

Rearrange('b t -> t b'),

CNNs for text classification class CNNOld(nn.Module): def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.conv_0 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[0],embedding_dim)) self.conv_1 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[1],embedding_dim)) self.conv_2 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[2],embedding_dim)) self.fc = nn.Linear(len(filter_sizes)*n_filters, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): #x = [sent len, batch size] x = x.permute(1, 0) #x = [batch size, sent len] embedded = self.embedding(x) #embedded = [batch size, sent len, emb dim] embedded = embedded.unsqueeze(1) #embedded = [batch size, 1, sent len, emb dim] conved_0 = F.relu(self.conv_0(embedded).squeeze(3)) conved_1 = F.relu(self.conv_1(embedded).squeeze(3)) conved_2 = F.relu(self.conv_2(embedded).squeeze(3)) #conv_n = [batch size, n_filters, sent len - filter_sizes[n]] pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2) pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2) pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2) #pooled_n = [batch size, n_filters] cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim=1)) #cat = [batch size, n_filters * len(filter_sizes)] return self.fc(cat) class CNNNew(nn.Module): def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.convs = nn.ModuleList([ nn.Conv1d(embedding_dim, n_filters, kernel_size=size) for size in filter_sizes ]) self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): x = rearrange(x, 't b -> t b') emb = rearrange(self.embedding(x), 't b c -> b c t') pooled = [reduce(conv(emb), 'b c t -> b c', 'max') for conv in self.convs] concatenated = rearrange(pooled, 'filter b c -> b (filter c)') return self.fc(self.dropout(F.relu(concatenated))) Original code misuses Conv2d, while Conv1d is the right choice Fixed code can work with any number of filter_sizes (and won't fail) First line in new code does nothing, but was added for simplicity

Highway convolutions

Highway convolutions are common in TTS systems. Code below makes splitting a bit more explicit. Splitting policy may eventually turn out to be important if the input previously had groups over the channel axis (group convolutions or bidirectional LSTMs/GRUs). Same applies to GLU and gated units in general.

class HighwayConv1dOld(nn.Conv1d):
    def forward(self, inputs):
        L = super(HighwayConv1dOld, self).forward(inputs)
        H1, H2 = torch.chunk(L, 2, 1)  # chunk at the feature dim
        torch.sigmoid_(H1)
        return H1 * H2 + (1.0 - H1) * inputs

class HighwayConv1dNew(nn.Conv1d):
    def forward(self, inputs):
        L = super().forward(inputs)
        H1, H2 = rearrange(L, 'b (split c) t -> split b c t', split=2)
        torch.sigmoid_(H1)
        return H1 * H2 + (1.0 - H1) * inputs

Tacotron's CBHG module class CBHG_Old(nn.Module): """CBHG module: a recurrent neural network composed of: - 1-d convolution banks - Highway networks + residual connections - Bidirectional gated recurrent units """ def __init__(self, in_dim, K=16, projections=[128, 128]): super(CBHG, self).__init__() self.in_dim = in_dim self.relu = nn.ReLU() self.conv1d_banks = nn.ModuleList( [BatchNormConv1d(in_dim, in_dim, kernel_size=k, stride=1, padding=k // 2, activation=self.relu) for k in range(1, K + 1)]) self.max_pool1d = nn.MaxPool1d(kernel_size=2, stride=1, padding=1) in_sizes = [K * in_dim] + projections[:-1] activations = [self.relu] * (len(projections) - 1) + [None] self.conv1d_projections = nn.ModuleList( [BatchNormConv1d(in_size, out_size, kernel_size=3, stride=1, padding=1, activation=ac) for (in_size, out_size, ac) in zip( in_sizes, projections, activations)]) self.pre_highway = nn.Linear(projections[-1], in_dim, bias=False) self.highways = nn.ModuleList( [Highway(in_dim, in_dim) for _ in range(4)]) self.gru = nn.GRU( in_dim, in_dim, 1, batch_first=True, bidirectional=True) def forward_old(self, inputs): # (B, T_in, in_dim) x = inputs # Needed to perform conv1d on time-axis # (B, in_dim, T_in) if x.size(-1) == self.in_dim: x = x.transpose(1, 2) T = x.size(-1) # (B, in_dim*K, T_in) # Concat conv1d bank outputs x = torch.cat([conv1d(x)[:, :, :T] for conv1d in self.conv1d_banks], dim=1) assert x.size(1) == self.in_dim * len(self.conv1d_banks) x = self.max_pool1d(x)[:, :, :T] for conv1d in self.conv1d_projections: x = conv1d(x) # (B, T_in, in_dim) # Back to the original shape x = x.transpose(1, 2) if x.size(-1) != self.in_dim: x = self.pre_highway(x) # Residual connection x += inputs for highway in self.highways: x = highway(x) # (B, T_in, in_dim*2) outputs, _ = self.gru(x) return outputs def forward_new(self, inputs, input_lengths=None): x = rearrange(inputs, 'b t c -> b c t') _, _, T = x.shape # Concat conv1d bank outputs x = rearrange([conv1d(x)[:, :, :T] for conv1d in self.conv1d_banks], 'bank b c t -> b (bank c) t', c=self.in_dim) x = self.max_pool1d(x)[:, :, :T] for conv1d in self.conv1d_projections: x = conv1d(x) x = rearrange(x, 'b c t -> b t c') if x.size(-1) != self.in_dim: x = self.pre_highway(x) # Residual connection x += inputs for highway in self.highways: x = highway(x) # (B, T_in, in_dim*2) outputs, _ = self.gru(self.highways(x)) return outputs

There is still large room for improvement, but in this example only the forward function was changed.

Simple attention

Good news: there is no more need to guess order of dimensions. Neither for inputs nor for outputs

class Attention(nn.Module): def __init__(self): super(Attention, self).__init__() def forward(self, K, V, Q): A = torch.bmm(K.transpose(1,2), Q) / np.sqrt(Q.shape[1]) A = F.softmax(A, 1) R = torch.bmm(V, A) return torch.cat((R, Q), dim=1) def attention(K, V, Q): _, n_channels, _ = K.shape A = torch.einsum('bct,bcl->btl', [K, Q]) A = F.softmax(A * n_channels ** (-0.5), 1) R = torch.einsum('bct,btl->bcl', [V, A]) return torch.cat((R, Q), dim=1) Transformer's attention needs more attention class ScaledDotProductAttention(nn.Module): ''' Scaled Dot-Product Attention ''' def __init__(self, temperature, attn_dropout=0.1): super().__init__() self.temperature = temperature self.dropout = nn.Dropout(attn_dropout) self.softmax = nn.Softmax(dim=2) def forward(self, q, k, v, mask=None): attn = torch.bmm(q, k.transpose(1, 2)) attn = attn / self.temperature if mask is not None: attn = attn.masked_fill(mask, -np.inf) attn = self.softmax(attn) attn = self.dropout(attn) output = torch.bmm(attn, v) return output, attn class MultiHeadAttentionOld(nn.Module): ''' Multi-Head Attention module ''' def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1): super().__init__() self.n_head = n_head self.d_k = d_k self.d_v = d_v self.w_qs = nn.Linear(d_model, n_head * d_k) self.w_ks = nn.Linear(d_model, n_head * d_k) self.w_vs = nn.Linear(d_model, n_head * d_v) nn.init.normal_(self.w_qs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_ks.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_vs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_v))) self.attention = ScaledDotProductAttention(temperature=np.power(d_k, 0.5)) self.layer_norm = nn.LayerNorm(d_model) self.fc = nn.Linear(n_head * d_v, d_model) nn.init.xavier_normal_(self.fc.weight) self.dropout = nn.Dropout(dropout) def forward(self, q, k, v, mask=None): d_k, d_v, n_head = self.d_k, self.d_v, self.n_head sz_b, len_q, _ = q.size() sz_b, len_k, _ = k.size() sz_b, len_v, _ = v.size() residual = q q = self.w_qs(q).view(sz_b, len_q, n_head, d_k) k = self.w_ks(k).view(sz_b, len_k, n_head, d_k) v = self.w_vs(v).view(sz_b, len_v, n_head, d_v) q = q.permute(2, 0, 1, 3).contiguous().view(-1, len_q, d_k) # (n*b) x lq x dk k = k.permute(2, 0, 1, 3).contiguous().view(-1, len_k, d_k) # (n*b) x lk x dk v = v.permute(2, 0, 1, 3).contiguous().view(-1, len_v, d_v) # (n*b) x lv x dv mask = mask.repeat(n_head, 1, 1) # (n*b) x .. x .. 
output, attn = self.attention(q, k, v, mask=mask) output = output.view(n_head, sz_b, len_q, d_v) output = output.permute(1, 2, 0, 3).contiguous().view(sz_b, len_q, -1) # b x lq x (n*dv) output = self.dropout(self.fc(output)) output = self.layer_norm(output + residual) return output, attn class MultiHeadAttentionNew(nn.Module): def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1): super().__init__() self.n_head = n_head self.w_qs = nn.Linear(d_model, n_head * d_k) self.w_ks = nn.Linear(d_model, n_head * d_k) self.w_vs = nn.Linear(d_model, n_head * d_v) nn.init.normal_(self.w_qs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_ks.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_vs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_v))) self.fc = nn.Linear(n_head * d_v, d_model) nn.init.xavier_normal_(self.fc.weight) self.dropout = nn.Dropout(p=dropout) self.layer_norm = nn.LayerNorm(d_model) def forward(self, q, k, v, mask=None): residual = q q = rearrange(self.w_qs(q), 'b l (head k) -> head b l k', head=self.n_head) k = rearrange(self.w_ks(k), 'b t (head k) -> head b t k', head=self.n_head) v = rearrange(self.w_vs(v), 'b t (head v) -> head b t v', head=self.n_head) attn = torch.einsum('hblk,hbtk->hblt', [q, k]) / np.sqrt(q.shape[-1]) if mask is not None: attn = attn.masked_fill(mask[None], -np.inf) attn = torch.softmax(attn, dim=3) output = torch.einsum('hblt,hbtv->hblv', [attn, v]) output = rearrange(output, 'head b l v -> b l (head v)') output = self.dropout(self.fc(output)) output = self.layer_norm(output + residual) return output, attn

Benefits of new implementation

- we have one module, not two
- now code does not fail for None mask
- the amount of caveats in the original code that we removed is huge. Try erasing comments and deciphering what happens there

Self-attention GANs

SAGANs are currently SotA for image generation, and can be simplified using same tricks. If torch.einsum supported non-one letter axes, we could improve this solution further.

class Self_Attn_Old(nn.Module): """ Self attention Layer""" def __init__(self,in_dim,activation): super(Self_Attn_Old,self).__init__() self.chanel_in = in_dim self.activation = activation self.query_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1) self.key_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1) self.value_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim , kernel_size= 1) self.gamma = nn.Parameter(torch.zeros(1)) self.softmax = nn.Softmax(dim=-1) # def forward(self, x): """ inputs : x : input feature maps( B X C X W X H) returns : out : self attention value + input feature attention: B X N X N (N is Width*Height) """ m_batchsize,C,width ,height = x.size() proj_query = self.query_conv(x).view(m_batchsize,-1,width*height).permute(0,2,1) # B X CX(N) proj_key = self.key_conv(x).view(m_batchsize,-1,width*height) # B X C x (*W*H) energy = torch.bmm(proj_query,proj_key) # transpose check attention = self.softmax(energy) # BX (N) X (N) proj_value = self.value_conv(x).view(m_batchsize,-1,width*height) # B X C X N out = torch.bmm(proj_value,attention.permute(0,2,1) ) out = out.view(m_batchsize,C,width,height) out = self.gamma*out + x return out,attention class Self_Attn_New(nn.Module): """ Self attention Layer""" def __init__(self, in_dim): super().__init__() self.query_conv = nn.Conv2d(in_dim, out_channels=in_dim//8, kernel_size=1) self.key_conv = nn.Conv2d(in_dim, out_channels=in_dim//8, kernel_size=1) self.value_conv = nn.Conv2d(in_dim, out_channels=in_dim, kernel_size=1) self.gamma = nn.Parameter(torch.zeros([1])) def forward(self, x): proj_query = rearrange(self.query_conv(x), 'b c h w -> b (h w) c') proj_key = rearrange(self.key_conv(x), 'b c h w -> b c (h w)') proj_value = rearrange(self.value_conv(x), 'b c h w -> b (h w) c') energy = torch.bmm(proj_query, proj_key) attention = F.softmax(energy, dim=2) out = torch.bmm(attention, proj_value) out = x + self.gamma * rearrange(out, 'b (h w) c -> b c h w', **parse_shape(x, 'b c h w')) return out, attention Improving time sequence prediction

While this example was considered to be simplistic, I had to analyze surrounding code to understand what kind of input was expected. You can try yourself.

One minor change: the code now works with any dtype, not only double, and the new code supports running on GPU.

class SequencePredictionOld(nn.Module): def __init__(self): super(SequencePredictionOld, self).__init__() self.lstm1 = nn.LSTMCell(1, 51) self.lstm2 = nn.LSTMCell(51, 51) self.linear = nn.Linear(51, 1) def forward(self, input, future = 0): outputs = [] h_t = torch.zeros(input.size(0), 51, dtype=torch.double) c_t = torch.zeros(input.size(0), 51, dtype=torch.double) h_t2 = torch.zeros(input.size(0), 51, dtype=torch.double) c_t2 = torch.zeros(input.size(0), 51, dtype=torch.double) for i, input_t in enumerate(input.chunk(input.size(1), dim=1)): h_t, c_t = self.lstm1(input_t, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] for i in range(future):# if we should predict the future h_t, c_t = self.lstm1(output, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] outputs = torch.stack(outputs, 1).squeeze(2) return outputs class SequencePredictionNew(nn.Module): def __init__(self): super(SequencePredictionNew, self).__init__() self.lstm1 = nn.LSTMCell(1, 51) self.lstm2 = nn.LSTMCell(51, 51) self.linear = nn.Linear(51, 1) def forward(self, input, future=0): b, t = input.shape h_t, c_t, h_t2, c_t2 = torch.zeros(4, b, 51, dtype=self.linear.weight.dtype, device=self.linear.weight.device) outputs = [] for input_t in rearrange(input, 'b t -> t b ()'): h_t, c_t = self.lstm1(input_t, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] for i in range(future): # if we should predict the future h_t, c_t = self.lstm1(output, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] return rearrange(outputs, 't b () -> b t') Transforming spacial transformer network (STN) class SpacialTransformOld(nn.Module): def __init__(self): super(Net, self).__init__() # Spatial transformer localization-network self.localization = nn.Sequential( nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True), nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True) ) # Regressor for the 3 * 2 affine matrix self.fc_loc = nn.Sequential( nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 3 * 2) ) # Initialize the weights/bias with identity transformation self.fc_loc[2].weight.data.zero_() self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)) # Spatial transformer network forward function def stn(self, x): xs = self.localization(x) xs = xs.view(-1, 10 * 3 * 3) theta = self.fc_loc(xs) theta = theta.view(-1, 2, 3) grid = F.affine_grid(theta, x.size()) x = F.grid_sample(x, grid) return x class SpacialTransformNew(nn.Module): def __init__(self): super(Net, self).__init__() # Spatial transformer localization-network linear = nn.Linear(32, 3 * 2) # Initialize the weights/bias with identity transformation linear.weight.data.zero_() linear.bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)) self.compute_theta = nn.Sequential( nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True), nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True), Rearrange('b c h w -> b (c h w)', h=3, w=3), nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), linear, Rearrange('b (row col) -> b row col', row=2, col=3), ) # Spatial transformer network forward function def stn(self, x): grid = F.affine_grid(self.compute_theta(x), x.size()) return F.grid_sample(x, grid) new code will give reasonable errors when passed image size is different from expected if batch size is 
divisible by 18, whatever you input in the old code, it'll fail no sooner than affine_grid. Improving GLOW

That's a good old depth-to-space written manually!

Since GLOW is revertible, it will frequently rely on rearrange -like operations.

def unsqueeze2d_old(input, factor=2):
    assert factor >= 1 and isinstance(factor, int)
    factor2 = factor ** 2
    if factor == 1:
        return input
    size = input.size()
    B = size[0]
    C = size[1]
    H = size[2]
    W = size[3]
    assert C % (factor2) == 0, "{}".format(C)
    x = input.view(B, C // factor2, factor, factor, H, W)
    x = x.permute(0, 1, 4, 2, 5, 3).contiguous()
    x = x.view(B, C // (factor2), H * factor, W * factor)
    return x

def squeeze2d_old(input, factor=2):
    assert factor >= 1 and isinstance(factor, int)
    if factor == 1:
        return input
    size = input.size()
    B = size[0]
    C = size[1]
    H = size[2]
    W = size[3]
    assert H % factor == 0 and W % factor == 0, "{}".format((H, W))
    x = input.view(B, C, H // factor, factor, W // factor, factor)
    x = x.permute(0, 1, 3, 5, 2, 4).contiguous()
    x = x.view(B, C * factor * factor, H // factor, W // factor)
    return x

def unsqueeze2d_new(input, factor=2):
    return rearrange(input, 'b (c h2 w2) h w -> b c (h h2) (w w2)', h2=factor, w2=factor)

def squeeze2d_new(input, factor=2):
    return rearrange(input, 'b c (h h2) (w w2) -> b (c h2 w2) h w', h2=factor, w2=factor)

term squeeze isn't very helpful: which dimension is squeezed? There is torch.squeeze , but it's very different. in fact, we could skip creating functions completely Detecting problems in YOLO detection def YOLO_prediction_old(input, num_classes, num_anchors, anchors, stride_h, stride_w): bs = input.size(0) in_h = input.size(2) in_w = input.size(3) scaled_anchors = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in anchors] prediction = input.view(bs, num_anchors, 5 + num_classes, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous() # Get outputs x = torch.sigmoid(prediction[..., 0]) # Center x y = torch.sigmoid(prediction[..., 1]) # Center y w = prediction[..., 2] # Width h = prediction[..., 3] # Height conf = torch.sigmoid(prediction[..., 4]) # Conf pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred. FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor # Calculate offsets for each grid grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_w, 1).repeat( bs * num_anchors, 1, 1).view(x.shape).type(FloatTensor) grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_h, 1).t().repeat( bs * num_anchors, 1, 1).view(y.shape).type(FloatTensor) # Calculate anchor w, h anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape) anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape) # Add offset and scale with anchors pred_boxes = FloatTensor(prediction[..., :4].shape) pred_boxes[..., 0] = x.data + grid_x pred_boxes[..., 1] = y.data + grid_y pred_boxes[..., 2] = torch.exp(w.data) * anchor_w pred_boxes[..., 3] = torch.exp(h.data) * anchor_h # Results _scale = torch.Tensor([stride_w, stride_h] * 2).type(FloatTensor) output = torch.cat((pred_boxes.view(bs, -1, 4) * _scale, conf.view(bs, -1, 1), pred_cls.view(bs, -1, num_classes)), -1) return output def YOLO_prediction_new(input, num_classes, num_anchors, anchors, stride_h, stride_w): raw_predictions = rearrange(input, 'b (anchor prediction) h w -> prediction b anchor h w', anchor=num_anchors, prediction=5 + num_classes) anchors = torch.FloatTensor(anchors).to(input.device) anchor_sizes = rearrange(anchors, 'anchor dim -> dim () anchor () ()') _, _, _, in_h, in_w = raw_predictions.shape grid_h = rearrange(torch.arange(in_h).float(), 'h -> () () h ()').to(input.device) grid_w = rearrange(torch.arange(in_w).float(), 'w -> () () () w').to(input.device) predicted_bboxes = torch.zeros_like(raw_predictions) predicted_bboxes[0] = (raw_predictions[0].sigmoid() + grid_w) * stride_w # center x predicted_bboxes[1] = (raw_predictions[1].sigmoid() + grid_h) * stride_h # center y predicted_bboxes[2:4] = (raw_predictions[2:4].exp()) * anchor_sizes # bbox width and height predicted_bboxes[4] = raw_predictions[4].sigmoid() # confidence predicted_bboxes[5:] = raw_predictions[5:].sigmoid() # class predictions # merging all predicted bboxes for each image return rearrange(predicted_bboxes, 'prediction b anchor h w -> b (anchor h w) prediction')

We changed and fixed a lot:

- new code won't fail if input is not on the first GPU
- old code has wrong grid_x and grid_y for non-square images
- new code doesn't use replication when broadcasting is sufficient
- old code strangely sometimes takes .data, but this has no real effect, as some branches preserve gradient till the end; if gradients are not needed, torch.no_grad should be used, so it's redundant

Simpler output for a bunch of pictures

Next time you need to output drawings of your generative models, you can use this trick

device = 'cpu'
plt.imshow(np.transpose(vutils.make_grid(fake_batch.to(device)[:64], padding=2, normalize=True).cpu(), (1, 2, 0)))

padded = F.pad(fake_batch[:64], [1, 1, 1, 1])
plt.imshow(rearrange(padded, '(b1 b2) c h w -> (b1 h) (b2 w) c', b1=8).cpu())

Instead of conclusion

Better code is a vague term; to be specific, things that are expected from code are:

- reliable: does what is expected and does not fail. Explicitly fails for wrong inputs
- readability counts
- maintainable and modifiable
- reusable: understanding and modifying code should be easier than writing from scratch
- fast: in my measurements, proposed versions have speed similar to the original code

I've tried to demonstrate how you can improve these criteria for deep learning code. And einops helps you a lot.

Links

- pytorch and einops
- significant part of the code was taken from official examples and tutorials (references for other code are given in source of this html, if you're really curious)
- einops has a tutorial if you want a gentle introduction


          Recommender Systems using Deep Learning in PyTorch from scratch

Recommender Systems using Deep Learning in PyTorch from scratch
Photo by Susan Yin on Unsplash

Recommender systems (RS) have been around for a long time, and recent advances in deep learning have made them even more exciting. Matrix factorization algorithms have been the workhorse of RS. In this article, I would assume that you are vaguely familiar with collaborative filtering based methods and have basic knowledge about training a neural network in PyTorch.

In this post, my goal is to show you how to implement a RS in PyTorch from scratch. The theory and model presented in this article were made available in this paper . Here is the GitHub repository for this article.

Problem Definition

Given a past record of movies seen by a user, we will build a recommender system that helps the user discover movies of their interest.

Specifically, given <userID, itemID> occurrence pairs, we need to generate a ranked list of movies for each user.

We model the problem as a binary classification problem , where we learn a function to predict whether a particular user will like a particular movie or not.


Our model will learn this mapping

Dataset

We use the MovieLens 100K dataset, which has 100,000 ratings from 1000 users on 1700 movies. The dataset can be downloaded from here .

The ratings are given to us in form of <userID,itemID, rating, timestamp> tuples. Each user has a minimum of 20 ratings.

Training

We drop the exact value of the rating (1, 2, 3, 4, 5) and instead convert it to an implicit scenario, i.e. any positive interaction is given a value of 1. All other interactions are given a value of zero, by default.

Since we are training a classifier, we need both positive and negative samples. The records present in the dataset are counted as positive samples. We assume that all other entries in the user-item interaction matrix are negative samples (a strong assumption, but one that is easy to implement).

For every item a user has interacted with, we randomly sample 4 items that the user has not interacted with. This way, if a user has 20 positive interactions, they will have 80 negative interactions. These negative samples cannot contain any positive interaction by the user, though they may not all be unique due to random sampling.
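As a rough illustration of this sampling scheme (not the exact code from the repository; the function and variable names are made up), the 4-negatives-per-positive construction can be sketched like this:

import random

def build_training_pairs(user_pos_items, n_items, n_neg=4, seed=0):
    """For every observed (user, item) pair, emit label 1 plus n_neg unseen items with label 0."""
    rng = random.Random(seed)
    users, items, labels = [], [], []
    for user, pos_items in user_pos_items.items():
        pos_set = set(pos_items)
        for item in pos_items:
            users.append(user); items.append(item); labels.append(1)
            for _ in range(n_neg):
                neg = rng.randrange(n_items)
                while neg in pos_set:          # a negative must not be a known positive
                    neg = rng.randrange(n_items)
                users.append(user); items.append(neg); labels.append(0)
    return users, items, labels

# toy usage: user 0 liked items 3 and 7 out of 10 items, so we get 2 positives and 8 negatives
print(build_training_pairs({0: [3, 7]}, n_items=10))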

Evaluation

We randomly sample 100 items that the user has not interacted with, ranking the test item among these 100 items. This same strategy is used in the paper, which is the inspiration for this post (referenced below). We truncate the ranked list at 10.

Ranking all items for every user would be too time-consuming, since we would have to calculate 1000*1700 ~ 10^6 values. With this strategy, we only need 1000*100 = 10^5 values, an order of magnitude less.

For each user, we use the latest rating (according to timestamp) as the test set, and we use the rest for training. This evaluation methodology is also known as the leave-one-out strategy and is the same as used in the reference paper.
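A minimal sketch of this leave-one-out split, assuming the ratings live in a pandas DataFrame with userID, itemID and timestamp columns (the column names are an assumption for illustration):

import pandas as pd

def leave_one_out_split(ratings: pd.DataFrame):
    """Hold out each user's most recent interaction for testing; train on the rest."""
    ratings = ratings.sort_values('timestamp')
    test = ratings.groupby('userID').tail(1)    # latest interaction per user
    train = ratings.drop(test.index)
    return train, test

# toy usage
df = pd.DataFrame({'userID': [1, 1, 2, 2],
                   'itemID': [10, 11, 12, 13],
                   'timestamp': [100, 200, 150, 120]})
train, test = leave_one_out_split(df)
print(test[['userID', 'itemID']])               # item 11 for user 1, item 12 for user 2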

Metrics

We use Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) to evaluate the performance of our RS.

Our model gives a confidence score between 0 and 1 for each item present in the test set for a given user. The items are sorted in decreasing order of their score, and the top 10 items are given as recommendations. If the test item (which is only one for each user) is present in this list, HR is one for this user, else it is zero. The final HR is reported after averaging over all users. A similar calculation is done for NDCG.

While training, we will be minimizing the cross-entropy loss, which is the standard loss function for a classification problem. The real strength of RS lies in giving a ranked list of top-k items which a user is most likely to interact with. Think about why you mostly click on Google search results only on the first page, and never go to other pages. Metrics like NDCG and HR help in capturing this phenomenon by indicating the quality of our ranked lists. Here is a good introduction on evaluating recommender systems.
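For a single user with exactly one held-out test item, as in this setup, both metrics reduce to a few lines; the following is an illustrative sketch, not the repository's exact implementation:

import math

def hit_ratio_at_k(ranked_items, test_item, k=10):
    """1 if the held-out item appears in the top-k recommendations, else 0."""
    return int(test_item in ranked_items[:k])

def ndcg_at_k(ranked_items, test_item, k=10):
    """With a single relevant item, NDCG reduces to 1/log2(rank+2) if it is in the top-k (rank is 0-based)."""
    if test_item in ranked_items[:k]:
        rank = ranked_items.index(test_item)
        return 1.0 / math.log2(rank + 2)
    return 0.0

# toy usage: test item 42 ranked 3rd out of 101 scored items
ranked = [7, 13, 42] + list(range(50, 148))
print(hit_ratio_at_k(ranked, 42), ndcg_at_k(ranked, 42))   # 1, 0.5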

Baseline: Item Popularity model

A baseline model is one we use to provide a first-cut, easy, non-sophisticated solution to the problem. In many use cases for recommender systems, recommending the same list of most popular items to all users gives a tough-to-beat baseline.

In the GitHub repository, you will also find the code for implementing item popularity model from scratch. Below are the results for the baseline model.
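Such a popularity baseline can be sketched in a few lines (this is an illustration, not the repository's implementation):

from collections import Counter

def most_popular_items(train_interactions, k=10):
    """Rank items by raw interaction count in the training set; every user gets the same list."""
    counts = Counter(item for _, item in train_interactions)
    return [item for item, _ in counts.most_common(k)]

# toy usage with (user, item) pairs
interactions = [(1, 10), (2, 10), (3, 11), (4, 10), (5, 12)]
print(most_popular_items(interactions, k=2))    # [10, 11]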

Deep Learning based model

With all the fancy architecture and jargon of neural networks, we aim to beat this item popularity model.

Our next model is a deep multi-layer perceptron (MLP). The input to the model is the userID and itemID, which are fed into an embedding layer. Thus, each user and item is given an embedding. There are multiple dense layers afterward, followed by a single neuron with a sigmoid activation. The exact model definition can be found in the file MLP.py.
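A condensed sketch of that architecture is shown below; the layer sizes and the class name are illustrative and not necessarily the ones used in MLP.py:

import torch
import torch.nn as nn

class MLPRecommender(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, user_ids, item_ids):
        # concatenate the two embeddings and map them to a single interaction probability
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)

# toy usage
model = MLPRecommender(n_users=1000, n_items=1700)
scores = model(torch.tensor([0, 1]), torch.tensor([5, 9]))
print(scores.shape)                              # torch.Size([2])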

The output of the sigmoid neuron can be interpreted as the probability the user is likely to interact with an item. It is interesting to observe that we end up training a classifier for the task of recommendation.


Figure 2: The architecture for Neural Collaborative Filtering

Our loss function is Binary Cross-entropy loss. We use Adam for gradient descent and L-2 norm for regularization.
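In PyTorch this training setup boils down to a few lines; the hyper-parameter values here are placeholders rather than the ones used for the reported results, and the model is the sketch from above:

import torch
import torch.nn as nn

# assumes the MLPRecommender sketch above is in scope
model = MLPRecommender(n_users=1000, n_items=1700)
criterion = nn.BCELoss()                         # binary cross-entropy on sigmoid outputs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)  # weight_decay acts as the L2 penalty

def train_step(user_ids, item_ids, labels):
    optimizer.zero_grad()
    preds = model(user_ids, item_ids)
    loss = criterion(preds, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# toy batch: 2 positive and 2 sampled negative interactions
print(train_step(torch.tensor([0, 0, 1, 1]),
                 torch.tensor([3, 8, 2, 5]),
                 torch.tensor([1, 0, 1, 0])))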

Results

For the popularity based model, which takes less than 5 seconds to train, these are the scores:

HR = 0.4221 | NDCG = 0.2269

For the deep learning model, we obtain these results after nearly 30 epochs of training (~3 minutes on CPU):

HR = 0.6013 | NDCG = 0.3294

The results are exciting. There is a huge jump in metrics we care about. We observe a 30% reduction in error according to HR, which is huge. These numbers are obtained from a very coarse hyper-parameter tuning. It might still be possible to extract more juice by hyper-parameter optimization.

Conclusion

State of the art algorithms for matrix factorization, and much more, can be easily replicated using neural networks. For a non-neural perspective, read this excellent post about matrix factorization for recommender systems .

In this post, we saw how neural networks offer a straightforward way of building recommender systems. The trick is to think of the recommendation problem as a classification problem.
          Gannett Is Using Deep Learning to Determine Why Certain Ad Designs Work
Gannett is turning to a form of artificial intelligence to design better online ads. The USA Today publisher recently rolled out a new internal platform that uses deep learning and computer vision to determine which images, colors and other design aspects work best in online ads across its dozens of local news sites. The company...
          NVIDIA's new TITAN RTX is a deep learning beast aimed at developers
NVIDIA has just announced the TITAN RTX - dubbed T-Rex, driven by NVIDIA Turing architecture a

