INTRODUCTION:

A chemical reaction is usually described by stating the structural formulas of the reactants and products separated by an arrow, which represents the chemical transformation of atoms among the reactant molecules. Many efforts have been made to construct models that predict reactivity, for example for oxidative dehydrogenations of ethylbenzene^1,2, reactions of vanadium selenites³, and Suzuki coupling reactions^4-10. In this paper, we design a model that predicts reactivity without imposing constraints specific to a reaction class, such as requiring the reaction outcome to be above or below a threshold value.

Depending on the reaction conditions (temperature, concentrations) and the particular substrates, certain chemical reaction classes are typically characterized by lower or higher yields.

Artificial intelligence (AI) techniques are useful for speeding up and simplifying the discovery of new drugs and materials¹¹.

In the last decade, the use of data science techniques in different fields of materials science has grown significantly^12-17. For example, data science is used to assist density functional computations in relating the interactions of atoms to the properties of the materials themselves using quantum mechanics^18-20. Machine learning (ML) is also used to establish process-structure-property relationships for modelling the mechanics of materials. AI is used to design novel materials with desired properties, or to optimize the production process of existing materials in order to improve them. ML is also very useful for predicting drug complexes, especially those with nonlinear behavior.

For synthesis planning, knowing the outcome of a reaction can be a game-changer. It allows scientists to assess the overall yield of complicated chemical pathways and to resolve potential flaws before devoting time and resources to wet-lab investigations. Synthetic chemists may find computational models that anticipate reaction yields helpful for selecting the best synthesis path from among the many suggested by data-driven algorithms. Additionally, reaction outcome prediction models might be used as scoring functions to supplement forward prediction models^4,6, in-scope filters², and computer-aided retrosynthesis route planning tools^3-6.

Here, we continue to treat organic chemistry as a language, as we have done in the past, to offer a new model that predicts reaction yields from reaction SMILES¹⁶. In more detail, we fine-tune the rxnfp models, which are based on a bidirectional encoder representations from transformers (BERT) encoder, by adding a regression layer to forecast reaction yields. BERT encoders belong to the transformer model family, which has revolutionized natural language processing^17,18. These models take sequences of tokens as input to compute contextualized representations of every input token, and they can be applied to reactions stored in the SMILES format¹⁹. Here, we provide the first demonstration that these natural language architectures are helpful not only for working with language tokens but also for predicting reaction properties such as reaction yields.
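
To illustrate what "tokens as input" means for a reaction SMILES, the following minimal sketch splits a reaction string into tokens with a regular expression. The pattern is one that is commonly used for SMILES tokenization and is an assumption here; the function name tokenize_reaction_smiles is hypothetical, and the tokenizer actually used by the model may differ.

```python
import re

# Regular expression commonly used to split SMILES strings into tokens.
# Assumption: the tokenizer used by the model in this paper is not specified,
# so this pattern is only an illustrative choice.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_reaction_smiles(reaction_smiles):
    """Split a reaction SMILES string into a list of tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that silently dropped characters do not go unnoticed.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(reaction_smiles)
    if "".join(tokens) != reaction_smiles:
        raise ValueError(f"Could not tokenize: {reaction_smiles!r}")
    return tokens


# Toy example: oxidation of ethanol to acetaldehyde written as reactants>>product.
example_tokens = tokenize_reaction_smiles("CCO>>CC=O")
```

Each token would then be mapped to an integer index and passed to the encoder, which returns one contextualized vector per token.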

The rest of this paper is organized as follows: Section II describes the models and experiments, Section III covers high-capacity prediction, followed by patent yield prediction and, finally, the conclusion.

MATERIALS AND METHODS:

We adapt the reaction fingerprint (rxnfp) models, using an encoder with a fixed model size and adjusting only two hyperparameters, the dropout rate and the learning rate⁹. In this way we avoid the common problems that arise when neural networks have many hyperparameters. The initial learning rate is the most crucial hyperparameter to tune, and we observed good results for a wide range of dropout rates (from 0.1 to 0.8) during our trials. Hyperparameter optimization graphs are shown in Figures S26 through S30. To aid training, we employ Simple Transformers¹⁴, the Hugging Face Transformers library¹⁵, and the PyTorch framework¹⁶. Figure 1 depicts the pipeline's general layout.

Figure 1: General layout of the evaluation pipeline.
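
Because only the learning rate and the dropout rate are tuned, the search space is small enough to enumerate. The sketch below shows one possible way to do this with a plain grid search; the grid search itself, the candidate learning rates, and the scoring function toy_score are assumptions for illustration and are not taken from our experiments. In practice, train_and_score would fine-tune the model and return a validation score.

```python
from itertools import product


def grid_search(train_and_score, learning_rates, dropout_rates):
    """Evaluate every (learning rate, dropout) pair and return the best one.

    train_and_score is any callable that accepts learning_rate and dropout
    keyword arguments and returns a score where higher is better.
    """
    best_config, best_score = None, float("-inf")
    for lr, dropout in product(learning_rates, dropout_rates):
        score = train_and_score(learning_rate=lr, dropout=dropout)
        if score > best_score:
            best_config = {"learning_rate": lr, "dropout": dropout}
            best_score = score
    return best_config, best_score


def toy_score(learning_rate, dropout):
    # Hypothetical stand-in for model training: the score peaks at
    # learning_rate = 1e-4 and dropout = 0.4 purely for demonstration.
    return -abs(learning_rate - 1e-4) * 1e4 - abs(dropout - 0.4)


# Placeholder learning rates; the dropout range follows the text (0.1 to 0.8).
learning_rates = [1e-5, 1e-4, 1e-3]
dropout_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

best_config, best_score = grid_search(toy_score, learning_rates, dropout_rates)
```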

RESULTS:

High capacity prediction: Pd-catalyzed Buchwald-Hartwig C-N cross-coupling reactions were the subject of high-throughput investigations, which measured the yield of each reaction. Three plates with a mixture of three bases and a number of isoxazole additives were employed in the tests, yielding around 4000 reactions. Spartan was used to compute 120 molecular, atomic, and vibrational descriptors using density functional theory for each combination of halides, ligands, bases, and additives⁹.

In the Perrera method, high-throughput experimentation (HTE) techniques were applied to the class of Suzuki-Miyaura reactions. The authors took into account 15 pairs of electrophiles and nucleophiles, each of which produced a distinct result. The ligands were varied for every pair.

As shown in Figure 2, training on just 5% of the reactions already allows a scientist to choose some of the reactions with the highest yields for the upcoming round of tests. With a training set of 10%, the yields of the chosen reactions are nearly optimal; the optimal selection is marked "ideal" in the figure. For the Buchwald-Hartwig reaction, using a model trained on 11% of the data set, the 10 reactions from the remaining unseen data that were predicted to have the highest yields have an average yield of 90%, compared to 98.7% for the optimal selection.

Figure 2: Statistics of multiple reaction predictions.
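
The comparison between the model's selection and the "ideal" selection can be sketched as follows. This is a minimal illustration under the assumption that predicted and measured yields are available as parallel lists; the function names and the toy numbers are hypothetical and are not data from our experiments.

```python
def mean_yield_of_top_k(predicted, measured, k=10):
    """Average measured yield of the k reactions with the highest predicted yield."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    chosen = ranked[:k]
    return sum(measured[i] for i in chosen) / len(chosen)


def ideal_mean_yield(measured, k=10):
    """Average yield of the k best reactions if the true yields were known."""
    top = sorted(measured, reverse=True)[:k]
    return sum(top) / len(top)


# Toy numbers for illustration only (yields in percent).
measured_yields = [10, 95, 40, 80, 60]
predicted_yields = [20, 70, 90, 85, 30]

model_selection = mean_yield_of_top_k(predicted_yields, measured_yields, k=2)
ideal_selection = ideal_mean_yield(measured_yields, k=2)
```

The closer the first value is to the second, the better the model is at picking high-yielding reactions from the unseen pool.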

DISCUSSION:

In this part, we examine the yields in the USPTO data set. Using the same set of reactions, we kept only reactions for which both yield and product mass were provided. The patent data comprises reactions across a greater range of scales, from gram to sub-gram, in contrast to HTE, where reactions are usually performed at sub-gram scale.

Table 1, the gram and sub-gram comparison, reports an experiment that was motivated by the aforementioned observations. Because some of the yield values in the data set are likely inaccurate, we smoothed the yields by averaging the yields of the three nearest neighbours together with twice the reaction's own yield. The faiss¹⁸ library and the rxnfp ft⁸ fingerprints were used to calculate the distances to the nearest neighbours.
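
The smoothing step can be sketched as below. This minimal example assumes that the average is a weighted mean (the reaction's own yield counted twice plus its three nearest neighbours, divided by five) and uses a brute-force Euclidean search in place of faiss to stay dependency-free; the function name and the toy one-dimensional fingerprints are hypothetical.

```python
import math


def smooth_yields(fingerprints, yields, k=3, self_weight=2):
    """Smooth each yield using its k nearest neighbours in fingerprint space.

    Assumption: smoothed = (self_weight * own yield + sum of neighbour yields)
    divided by (self_weight + number of neighbours).
    """
    smoothed = []
    for i, fp in enumerate(fingerprints):
        # Brute-force neighbour search; a library such as faiss would replace
        # this step for large data sets.
        distances = sorted(
            (math.dist(fp, other), j)
            for j, other in enumerate(fingerprints)
            if j != i
        )
        neighbours = [j for _, j in distances[:k]]
        total = self_weight * yields[i] + sum(yields[j] for j in neighbours)
        smoothed.append(total / (self_weight + len(neighbours)))
    return smoothed


# Toy one-dimensional "fingerprints" and yields, for illustration only.
toy_fingerprints = [[0.0], [1.0], [2.0], [3.0], [10.0]]
toy_yields = [10, 20, 30, 40, 100]
toy_smoothed = smooth_yields(toy_fingerprints, toy_yields)
```

In the toy data, the isolated reaction with a yield of 100 is pulled towards its neighbours, which is the intended effect on outliers.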

Table 1: The Gram and sub-gram comparison

Scale	Gram	Sub-gram
Random split	0.117	0.195
Time split	0.095	0.142
Random split (smoothed)	0.277	0.388
Randomized yields	0	0
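
The difference between the random split and the time split in Table 1 can be sketched as follows. This is an illustrative example only: the record layout, the "year" field, and the test fraction are assumptions, not the exact procedure used to produce the table.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a random fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, test_fraction=0.2, key="year"):
    """Train on the oldest records and test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r[key])
    n_train = len(ordered) - round(len(ordered) * test_fraction)
    return ordered[:n_train], ordered[n_train:]


# Hypothetical patent reaction records with a publication year.
toy_records = [{"id": i, "year": 2000 + i} for i in range(10)]

random_train, random_test = random_split(toy_records)
time_train, time_test = time_split(toy_records)
```

With a time split, every test reaction is newer than every training reaction, which is usually a harder and more realistic setting than a random split.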

The proposed method (AdaBoost): AdaBoost, or adaptive boosting, is an ensemble technique that combines several weak learners into a strong classifier. In a chemical context, AdaBoost can help to enhance prediction accuracy for some critical chemical problems²¹. AdaBoost trains the weak learners, here decision trees, iteratively, and an existing library such as scikit-learn can be used to tune hyperparameters such as the number of estimators²². The prediction performance of AdaBoost is presented in the following table:

Table 2: Comparison of AdaBoost with other models such as rxnfp and GNN

Model	Accuracy	Precision	Recall	F1-score
Rxnfp	87%	86%	85%	85.5%
GNNs	88%	87%	86%	86.6%
AdaBoost	91%	87%	82%	85%
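
For reference, the four metrics in Table 2 can be computed from the confusion counts of a binary classifier as sketched below. The function name and the toy counts are hypothetical; values recomputed this way may differ slightly from the rounded entries in the table.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy confusion counts, for illustration only.
toy_metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# F1-score from a precision of 86% and a recall of 85%.
f1_from_rates = f1_score(0.86, 0.85)
```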

While AdaBoost does not outperform the other models on every metric, it provides a resilient alternative in harsh settings or when other models overfit or underperform. One caveat is that AdaBoost is sensitive to noisy data and outliers, which may mislead the classifier. However, it is effective at combining weak classifiers, particularly with respect to overfitting, and it can handle diverse data distributions. This property can help to build stronger feature engineering in combination with other techniques such as XGBoost to enhance feature data quality²³.
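
The boosting idea can be sketched in a few lines. The example below is a minimal from-scratch version of discrete AdaBoost with one-level decision trees (stumps) as weak learners; it is an illustration under stated assumptions and not the implementation used for Table 2. The labels (+1 for a yield above a threshold, -1 otherwise) and the toy one-feature data set are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Find the decision stump (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` above the threshold, `-polarity` otherwise.
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if (polarity if row[j] > thr else -polarity) != yi
                )
                if best is None or err < best[3]:
                    best = (j, thr, polarity, err)
    return best


def stump_predict(stump, row):
    j, thr, polarity, _ = stump
    return polarity if row[j] > thr else -polarity


def adaboost_fit(X, y, n_estimators=10):
    """Train an ensemble of weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y, w)
        err = min(max(stump[3], 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Raise the weights of misclassified samples and lower the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: no single threshold separates the two classes.
X_toy = [[1], [2], [3], [4], [5], [6]]
y_toy = [1, 1, -1, -1, 1, 1]

ensemble = adaboost_fit(X_toy, y_toy, n_estimators=3)
predictions = [adaboost_predict(ensemble, row) for row in X_toy]
```

A single stump cannot classify this toy set correctly, whereas the weighted vote of three stumps can, which is the sense in which weak learners are combined into a stronger classifier.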

The application of machine learning (ML) in predicting chemical reactions holds significant promise, yet it remains constrained by several critical limitations that impede its effectiveness and reliability. One fundamental issue is the quality and quantity of data used for training ML models; often, datasets may be incomplete, biased, or too small to capture the complex nuances of chemical behavior accurately. Unlike traditional quantum mechanical approaches that are grounded in well-established theoretical principles, ML models can sometimes produce predictions based on correlations rather than causations, leading to potential inaccuracies when applied to novel scenarios outside their training scope. Additionally, these models struggle with interpretability—understanding why a particular reaction outcome was predicted can be opaque compared to classical methods where mechanistic insights are clearer. The "black box" nature of many ML algorithms further complicates this issue, making it challenging for chemists to trust or validate the results fully. Moreover, the generalizability of ML predictions across different reaction types and conditions remains questionable; factors such as solvent effects or temperature variations might not be adequately accounted for within the model. Finally, despite advancements in computational power and algorithm efficiency, scaling these predictions for very large molecular systems or highly diverse reaction spaces continues to pose significant computational challenges. These limitations underscore the necessity for continued integration of domain expertise with advanced computational techniques to achieve more robust and reliable predictive models in chemistry.

CONCLUSION:

In this paper, we examined the reaction outcomes in the publicly available patent data and demonstrated how the distribution of reported yields varies significantly with the scale of the reaction. Our suggested strategy is unable to forecast the patent reaction yields effectively because of the inherent inconsistency and poor quality of the patent data. Although we cannot completely rule out that another design would perform better than the one described in this work, we point out the need for a more reliable, high-quality public data collection for the development of reaction outcome prediction models.

CONFLICT OF INTEREST:

The authors have no conflicts of interest regarding this investigation.

REFERENCES:

1. Dhananjaneyulu BV. Kumaraswamy K. Kinetic and thermodynamic studies on adsorption of malachite green from aqueous solution using mixed adsorbents (rice husk and egg shell). Research Journal of Pharmacy and Technology. 2016; 9(10): 1671-6. https://doi.org/10.5958/0974-360X.2016.00337.1.

2. Schwaller P. Vaucher AC. Laino T. Reymond JL. Prediction of chemical reaction yields using deep learning. Machine Learning: Science and Technology. 2021; 2(1): 015016. https://doi.org/10.1088/2632-2153/abc81d.

3. Tippabathani J. Nellore J. Suresh X. Computational Identification of microRNAs binding to the Transcription factors related to Dopamine Neurons. Research Journal of Pharmacy and Technology. 2018; 11(12): 5520-8. https://doi.org/10.5958/0974-360X.2018.01005.3.

4. Reddy AR. Kumar RB. Kumar VR. Deepthi M. Lohita TN. Sriharsha M. et al. Experimental Studies on effect of Vermicompost and NPK on Essential oil yield of Ocimum tenuiflorum var. CIM-Ayu. Research Journal of Pharmacy and Technology. 2015; 8(11): 1519-25. https://doi.org/10.5958/0974-360X.2015.00271.1.

5. Susmi MS. Kumar RS. Sreelakshmi V. Menon SV. Mohan S. Suja ST. et al. A Computational approach for identification of Phytochemicals for targeting and optimizing the inhibitors of Heat shock proteins. Research Journal of Pharmacy and Technology. 2015; 8(9): 1199-204. https://doi.org/10.5958/0974-360X.2015.00219.X.

6. Nisha H. Karavadi B. Computational analysis to identify the drug targets and their lead molecules in pancreatic cancer. Research Journal of Pharmacy and Technology. 2017; 10(6): 1708-16. https://doi.org/10.5958/0974-360X.2017.00302.X.

7. Coley CW. Thomas III DA. Lummiss JA. Jaworski JN. Breen CP. Schultz V. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science. 2019; 365(6453): eaax1566. https://doi.org/10.1126/science.aax1566.

8. Schwaller P. Hoover B. Reymond JL. Strobelt H. Laino T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances. 2021; 7(15): eabe4166. https://doi.org/10.1126/sciadv.abe4166.

9. Epps RW. Bowen MS. Volk AA. Abdel‐Latif K. Han S. Reyes KG. et al. Artificial chemist: an autonomous quantum dot synthesis bot. Advanced Materials. 2020; 32(30): 2001626. https://doi.org/10.1002/adma.202001626.

10. Toyao T. Maeno Z. Takakusagi S. Kamachi T. Takigawa I. Shimizu KI. Machine learning for catalysis informatics: recent applications and prospects. ACS Catalysis. 2019; 10(3): 2260-97. https://doi.org/10.1021/acscatal.9b04186.

11. Epps RW. Bowen MS. Volk AA. Abdel‐Latif K. Han S. Reyes KG. et al. Artificial chemist: an autonomous quantum dot synthesis bot. Advanced Materials. 2020; 32(30): 2001626. https://doi.org/10.1002/adma.202001626.

12. Farah FH. The Thermodynamic parameters of Chlorpromazine hydrochloride partitioning into Dimyrstoylphosphatidylcholine liposomes. Research Journal of Pharmacy and Technology. 2020; 13(12): 5716-20. https://doi.org/10.5958/0974-360X.2020.00995.6.

13. Choromanski K. Likhosherstov V. Dohan D. Song X. Gane A. Sarlos T. et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794. 2020. https://doi.org/10.48550/arXiv.2009.14794.

14. Dash S. Studies on inclusion complexes of 2-p-anisilidienyl 3-(benzothiazolyl-2’) hydrazono-5-p-anisilidiene-4 thiazolidinone with β-cyclodextrin. Research Journal of Pharmacy and Technology. 2020; 13(8): 3843-8. https://doi.org/10.5958/0974-360X.2020.00680.0.

15. Hoover B. Strobelt H. Gehrmann S. exbert: A visual analysis tool to explore learned representations in transformers models. arXiv preprint arXiv:1910.05276. 2019 Oct 11. https://doi.org/10.48550/arXiv.1910.05276.

16. Lee-Thorp J. Ainslie J. Eckstein I. Ontanon S. Fnet: Mixing tokens with fourier transforms. arXiv preprint arXiv:2105.03824. 2021 May 9. https://doi.org/10.48550/arXiv.2105.03824.

17. Yun C. Bhojanapalli S. Rawat AS. Reddi SJ. Kumar S. Are transformers universal approximators of sequence-to-sequence functions? arXiv preprint arXiv:1912.10077. 2019 Dec 20. https://doi.org/10.48550/arXiv.1912.10077.

18. Grambow CA. Pattanaik L. Green WH. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Scientific Data. 2020 May 8; 7(1): 137. https://doi.org/10.1038/s41597-020-0460-4.

19. Schwaller P. Vaucher AC. Laino T. Reymond JL. Prediction of chemical reaction yields using deep learning. Machine Learning: Science and Technology. 2021 Mar 31; 2(1): 015016. https://doi.org/10.1088/2632-2153/abc81d.

20. Huang B. Von Lilienfeld OA. Ab initio machine learning in chemical compound space. Chemical Reviews. 2021 Aug 13; 121(16): 10001-36. https://doi.org/10.1021/acs.chemrev.0c01303.

21. Balajee RM. Venkatesh K. A Survey on Machine Learning Algorithms and finding the best out there for the considered seven Medical Data Sets Scenario. Research Journal of Pharmacy and Technology. 2019; 12(6): 3059-62. https://doi.org/10.5958/0974-360X.2019.00518.3.

22. Mithra AS. Duddukuru VC. Manu KS. How artificial intelligence is revolutionizing the banking sector: The applications and challenges. Asian Journal of Management. 2023; 14(3): 166-70. https://doi.org/10.52711/2321-5763.2023.00028.

23. Kumar PJ. Sivannarayana P. Saikishore V. Hariteja S. Sharif S. Bhaskar M. et al. An overview on Combinatorial Chemistry. Research Journal of Pharmacy and Technology. 2012; 5(5): 570-9.

Received on 24.06.2024 Modified on 03.09.2024

Research J. Pharm. and Tech. 2024; 17(11):5435-5438.

DOI: 10.52711/0974-360X.2024.00831