Getting every researcher excited about adopting AI tools is something that we aim to do with our solutions, says Nishchay Shah, CTO, CACTUS
CXOToday has engaged in an exclusive interview with Mr. Nishchay Shah, CTO, CACTUS
- Tell us how CACTUS is contributing to the AI/Big Data Analytics industry and how the company is benefiting its clients. (Please briefly discuss the products/solutions you provide to your customers and how they benefit.)
CACTUS has been at the forefront of bringing AI and Big Data to the sci-tech space for over 5 years. Cactus Labs, our globally distributed remote team, is dedicated to building AI solutions for the new-age researcher. Our primary focus areas are Natural Language Processing, Applied Machine Learning, and Big Data. From language automation to document assessment solutions, we have been building algorithms and models based on a massive data lake, which is built on data accumulated over 2 decades of our existence in the industry.
By leveraging insights and intelligence from this database, we created Researcher.Life, an integrated suite of tools and services that is designed to simplify researchers’ lives by facilitating discovery, writing, literature surveillance, funding, and dissemination of research, with impact.
Part of this integrated suite are our three superhero products, powered by custom AI algorithms at their core: Discovery, Paperpal, and Mind The Graph.
With Discovery, we bring research to the researcher in real time by curating and recommending recent, relevant, and highly personalized content on an omnichannel platform, which saves researchers a lot of time when searching for literature to read. With over 100M published papers across 32K journals, our catalogue and hybrid concept mesh algorithms have been well accepted in the space, with our app crossing 2M downloads cumulatively across all mobile app stores and holding a healthy rating of 4.6+ out of 5.
With Paperpal, we have built a real-time, subject-specific language correction solution that helps authors write research papers better and faster. We have leveraged large language models trained on millions of hours of work put in by professional editors on scientific manuscripts to create state-of-the-art grammar error correction models tailored to academia. Paperpal is available as an MS Word add-in and a web-based online editor. Paperpal helps researchers gain confidence in their writing. It aims to elevate the writing experience by combining machine learning with corpus intelligence from published academic literature. Whether the researcher writes in MS Word or the web editor, the language suggestions automatically flow into the feed, and academic vocabulary insights as well as high-quality translation support are readily available, providing comprehensive academic writing assistance.
Mind the Graph is our AI-powered self-service infographic maker, which enables the researcher to create beautiful and scientifically accurate posters and illustrations. The internal content engine is built on a library of 40k+ proprietary scientific figures spread across 80+ popular fields and is trusted by 100+ top academic, educational, and industrial institutions.
- Kindly mention some of the major challenges the company has faced till now in establishing AI solutions for science communicators.
Science communication is not a glamorous field like e-commerce or gaming. While it is one of the most important links between researchers and the rest of the world, it has its own set of challenges, which impact how we build solutions for it.
#1 – Data
The volume of research data has become huge, especially in the last 4-5 years. The challenge is not just the volume but the lack of standard sources for all the necessary information points. A lot of data sources are fragmented and typically non-indexed. Disambiguation, deduplication, and building accurate data graphs and concept maps require a lot of heavy lifting before the data is ready for consumption by even smaller ML models, let alone massive neural nets.
#2 – Adoption of AI and tech by researchers
In general, researchers use tools to a lesser extent when it comes to overall scientific communication and instead rely on service providers for tasks like proofreading, translation, and editing. While it may seem like we are offering a great product built using the latest and best AI tech available, the product may still not generate enough confidence among researchers because of the non-deterministic and black-box nature of some of these solutions. User interviews, product surveys, and explainability build trust among researchers. The problem becomes exponentially more complex when you add 1000+ different subject areas, with researchers in several of them not relying on technology at all. Getting every researcher excited about adopting AI tools is something that we aim to do with our solutions. Incorporating their constant feedback into our products is what helps us create a community that helps the entire ecosystem. One such effort is our Researcher.Life ambassador program (https://www.cactusglobal.com/ambassador/).
#3 – Knowing what to build and what not to build
In the early days, experimentation was key to finding the one thing that sticks with the customers.
When you are ahead of the curve, you tend to think that building X will benefit you more, primarily to gain the first-mover advantage. With AI, this advantage sometimes has repercussions, and we did face those repercussions at times. While the technology has evolved by leaps and bounds in the last decade, the hype that has followed has set expectations very high. We live in a world where cloud services make it easy to build and experiment with prototypes, so it’s tempting to do everything at once. But this causes you to lose sight of what is relevant from the business perspective, as everyone has limited expert resources. Strong alignment of the AI strategy with the business vision is important from day 1, when one starts building AI-powered products.
- Why do you think integrating AI into the researchers’ ecosystem is essential? What were the challenges faced while automating across CACTUS?
Not all researchers are tech savvy. Most of them still rely on age-old processes and systems, which causes a lot of heartache and in turn reduces their research output over a period of time. It’s not just the researcher, but also the industry that is slow to adopt AI-powered tools.
A researcher’s journey is filled with lots of inefficient processes and outdated methods, which can be easily addressed using automation and AI. Did you know that poor language is still one of the top 10 reasons why manuscripts get rejected, resulting in an individual’s hard work not getting published? (source)
One could argue that we have so many language-related tools today that help us write better and fix grammatical errors. But formal language is a bit different, and not all tools work well for such use cases. This is just the kind of case where infusing new developments in NLP, such as the transformer model architecture, can change the game altogether. It may seem trivial to non-researchers, but for researchers and academicians, removing poor language as a barrier to getting published saves a lot of time by cutting the back and forth with journal editors and reviewers and helping them get published faster.
Similarly, for us at CACTUS, becoming a high-tech company was a journey and it took time. The journey made us realise the value of building tech capabilities, and while we tend to use AI to automate various parts of the business through in-house solutions, it certainly requires patience. Having a sound strategy around data and AI is not enough. One needs to account for a people strategy as well, which includes upskilling, communicating expectations effectively, and evangelizing the change AI brings to the table through initiatives such as hackathons and hands-on workshops.
- As the CTO, how has your role evolved over the years? What new developments can you expect in the next 5 years?
I started in CACTUS as Head of Technology, being hands on with code and infrastructure, and moved on to run product and design teams and started a new AI team from scratch. This year, I have started running the products business at CACTUS. I am part of the executive leadership team at CACTUS and play a central role in forming the company’s strategic direction for the future.
- The industry is witnessing an increase in the importance of business and technology enablers. How do you see these emerging technologies impact your business?
AI has been around since the 1970s. The key trigger for this intense and sudden spike in adoption of AI into businesses is the quantum of data being generated on a regular basis and the hardware catching up, in terms of computing power to analyse and crunch the data in a finite amount of time. Machine Learning, Natural Language Processing, Computer Vision, etc., are the subsets under the overarching umbrella of AI that have started penetrating and disrupting almost all industries.
The quality of Machine Translation five years ago was so poor that it was considered a wasted opportunity. The number of language pairs, with additional dialects on top, made it a complex problem in itself. Translation service providers used to feel at ease, believing that in areas such as academia, where you need a 100% accurate translation so as to not lose any context, machines would never catch up. Cut to 2022: machine translation systems have evolved at such a phenomenal rate that, for some language pairs, they give close to 100% accurate results. Price-sensitive demographics, especially college students, are happy to use the technology instead of opting for a professional service. This in turn reduces business for the service provider. On the other hand, if you create such products, you are operating on new business models with a lower recurring cost and less human intervention.
- What global trends can we foresee in the sci-tech space in the next 2 years?
The sci-tech space has undergone quite a lot of disruption since the early 2020s, when the pandemic hit the world. It also has to do with the fact that everyone realised the value of research reaching the masses quickly and being productionized. For the next 2 years, I would definitely want some of these global trends to emerge:
1. OpenAccess Advocacy
- Making research accessible to everyone on the planet is already gaining a lot of traction, with lots of journals and publishers leaning towards the OpenAccess Initiative.
- It eventually unlocks a huge tranche of research data for everyone to build more robust solutions on top of it. Solving for specific scientific fields from where full-text papers and artifacts become available will create new, niche spaces in the existing sci-tech domain.
2. Tl;dr research
- We are already seeing researchers and scientists share their work on public social media platforms like Twitter and the like.
- Visual abstracts, short video reels, and infographics are becoming vital for the community to gain visibility into their work, and this trend should eventually overtake the number of formal papers and articles generated every year.
3. Searching for relevant research becomes easier
- Researchers spend a lot of time and effort on literature search and surveys only to end up reading research artifacts that are irrelevant to them or their specific area of research.
- Better contextual search platforms, and the adoption of products like R.Discovery in our case, make me confident that this space will mature and stabilise in the next few years.