Leveraging NLP for Multi-Service Recommendation Generation | IEEE Xplore

Leveraging NLP for Multi-Service Recommendation Generation


Framework of NLP-Based Recommendation


Abstract:

In this study, we examine the potential of language models for natural language processing (NLP)-based recommendations, with a distinct focus on predicting users’ next product purchases based on their prior purchasing patterns. Our model specifically harnesses tokenized rather than complete product names for learning. This granularity allows for a refined understanding of the interrelations among different products. For instance, items like ‘Chocolate Milk’ and ‘Coffee Milk’ find linkage through the shared token ‘Milk.’ Additionally, we explored the impact of various n-grams (unigrams, bigrams, and trigrams) in tokenization to further refine our understanding of product relationships and recommendation efficacy. This nuanced method paves the way for generating product names that might not exist in current retail settings, exemplified by concoctions like ‘Coffee Chocolate Milk.’ Such potential offerings can provide retailers with fresh product brainstorming opportunities. Furthermore, scrutiny of the frequency of these generated product name tokens can reveal prospective trends in purchasing keywords. This facilitates enterprises in creative brainstorming of novel products and swiftly responding to the dynamic demands and trends of consumers. The datasets used in this study come from UK e-Commerce and Instacart Data, comprising 71,205 and 166,440 rows, respectively. This investigation juxtaposes the NLP-based recommendation model, which employs tokenization, with its non-tokenized counterpart, leveraging Hit-Rate and mean reciprocal rank (MRR) as evaluative benchmarks. The outcomes distinctly favor the tokenized NLP-based recommendation model across all evaluated metrics.
Published in: IEEE Access ( Volume: 12)
Page(s): 14260 - 14274
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Recently, there has been significant research on natural language processing (NLP) in the field of artificial intelligence, leading to rapid advancements in text processing technology. A significant factor contributing to this growth is recent progress in deep learning, which has elevated NLP performance to unprecedented levels, drawing considerable attention from academia and industry [1], [2]. The increasing focus on NLP research can be attributed to several factors, but the emergence of innovative algorithms and models, such as the Transformer, BERT, and GPT, has played a pivotal role [3], [4]. Advancements in NLP technology, which enable machines to understand, generate, and communicate in text, offer new opportunities across various industry sectors. These NLP technologies offer businesses critical assistance in enhancing information processing efficiency, improving customer service, and devising new marketing strategies. Given this potential, interest in NLP research has intensified within the industry, with both researchers and enterprises striving to further refine and expand NLP capabilities.

The development of recommendation systems is increasingly recognized as a crucial application area for NLP [5]. Text sequences, such as news or articles, can provide essential information for recommending highly relevant content to users through the analysis of meaning or themes using NLP. Furthermore, by analyzing user reviews and feedback, one can discern user preferences and reactions, and integrating this information into a recommendation algorithm enables personalized content recommendation [6]. Notably, a crucial intersection between NLP and recommendation systems is the utilization of sequence information [7]. Sequence data, such as user behavior patterns, purchase history, and search records, accumulate over time, allowing for a precise reflection of a user’s recent actions and evolving preferences. NLP research deeply explores the context of text, allowing for a precise understanding of the meanings of sentences or documents derived from sequential patterns. Sequence modeling techniques have been developed to accurately predict specific or subsequent words within a given context.

The aim of this study was to extend the application of text sequence processing techniques beyond merely predicting the next word to forecasting what product will be purchased next. Specifically, based on a user’s purchase history, we develop a methodology that allows recommendation systems to accurately predict and suggest the product that the user is most likely to purchase next. Traditional research on recommendation systems incorporating NLP has focused on sequential recommendation, characterizing recommendations derived from product-level learning [5], [7], [8], [9]. This has led to a significant oversight: the nuanced connections shared by products with common elements are often overlooked when analysis is conducted without tokenizing product names. For example, ‘Chocolate Milk’ and ‘Strawberry Milk’ are treated as distinct entities, although they share a common ‘Milk’ component. This limitation in the current body of research underlines the need for a more sophisticated approach that we address in this study.

To fully harness the advantages of NLP research, we posit that analyzing product names at the token level will facilitate a more detailed learning of product sequences. By tokenizing product names, such as ‘Chocolate’ ‘Milk’ and ‘Strawberry’ ‘Milk’, they can be linked through the common token ‘Milk’, enabling the system to recognize a broader range of associations and relations. This approach of interlinking various product name tokens and learning from their interactions is illustrated in Figure 1 and serves as the core motivation of our study.

FIGURE 1. Interaction between product names with and without tokenization.

Tokenization in NLP is the critical process of segmenting text into its constituent elements, or tokens, which can be as small as words or as significant as phrases. Each token represents a fundamental unit of meaning, crucial for the machine’s understanding and analysis of language. This step is especially challenging given the diverse and complex grammatical structures present across languages [10]. The methodology chosen for tokenization directly impacts the performance of NLP tasks, with rule-based methods often applied to straightforward texts and more advanced statistical or neural network approaches employed for intricate linguistic patterns. Subword tokenization is particularly adept at discerning meaningful components within words, an aspect essential for processing languages with rich morphology [11]. In the context of recommendation systems, traditional research has frequently neglected the intricate semantic connections between products, primarily due to non-tokenized approaches. This oversight has led to a practical impact where systems fail to recognize the shared attributes of products, limiting the personalization and relevance of recommendations. To address this gap, we employ N-gram techniques—specifically unigrams, bigrams, and trigrams—as these methods allow for different granularities in recognizing patterns within product names. The unigram model considers individual tokens, while bigrams and trigrams account for pairs and triplets of tokens, providing insight into immediate sequential relationships [12]. Through comparative analysis of these techniques, we aim to determine which yields the most accurate reflection of consumer behavior and, consequently, more precise recommendations. By dissecting product names into tokens, we anticipate uncovering a deeper layer of consumer preference that enhances the personalization of the recommendation process. 
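The n-gram tokenization described above can be sketched as follows. This is a minimal illustration, assuming product names are tokenized on spaces as in the rest of the paper; the example product names are illustrative, not drawn from the paper's datasets.

```python
def ngram_tokenize(product_name: str, n: int = 1) -> list[str]:
    """Split a product name on spaces, then group consecutive tokens into n-grams.

    n=1 yields unigrams (individual tokens); n=2 and n=3 yield bigrams
    and trigrams, capturing immediate sequential relationships.
    """
    tokens = product_name.split()
    if n == 1:
        return tokens
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngram_tokenize("Chocolate Strawberry Milk", 1))
# ['Chocolate', 'Strawberry', 'Milk']
print(ngram_tokenize("Chocolate Strawberry Milk", 2))
# ['Chocolate Strawberry', 'Strawberry Milk']
```

Under unigram tokenization, 'Chocolate Milk' and 'Strawberry Milk' share the token 'Milk', which is the linkage the model exploits; bigrams and trigrams trade this breadth of linkage for longer sequential context.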
We aim to explore which tokenization method yields better performance in product name processing. By adopting a token-level approach in product name learning, we expect to achieve a more granular understanding that could lead to more relevant recommendations. However, it is acknowledged that token-based modeling may occasionally generate non-existent product names. For instance, if the ‘Chocolate’ token is followed by ‘Milk’ or ‘Chip’, it aligns with existing products, but a combination like ‘Chocolate Strawberry Milk’ represents a novel, albeit non-existent, product name. This mirrors the hallucination issue commonly found in large language models (LLMs) like GPT [13]. While these may seem like errors, they also represent creative opportunities for product development and brainstorming. Furthermore, the ability to predict product trends using NLP-based recommendation systems is a significant advancement. Since the primary objective of recommendation systems is to forecast a user’s next purchase, these novel product names can be viewed as predictive data, offering insights into future trends. Treating tokenized product names as keywords and analyzing their frequency allows us to propose new services based on predicted keyword trends.

Hence, this study aims to enhance recommendation performance by focusing on the tokenization of product names and exploring diverse service generation strategies. Our primary contributions are substantial: we have created a novel NLP-based recommendation model that operates at the token level of product names. Additionally, we are conducting experiments to further improve performance by comparing token levels using the n-grams approach. Utilizing this model, we have introduced two innovative services: new product brainstorming and keyword trend forecasting. These contributions demonstrate our commitment to advancing the field of NLP and its applications in recommendation systems, overcoming the limitations of previous methodologies, and setting new benchmarks for personalized, predictive commerce.

SECTION II.

Related Work

To facilitate the development of NLP-based recommendation and its utilization in diverse service generation, we briefly review several pertinent studies.

A. Natural Language Processing

The field of NLP has seen significant advancements over the past several decades. Initially dominated by rule-based and statistical methods, a major shift occurred with the advent of deep learning techniques. In particular, the introduction of the Transformer architecture has greatly altered the paradigm of the NLP field. The Transformer was first introduced in a paper titled “Attention Is All You Need” in 2017 [14]. The traditional recurrent neural network (RNN) sequentially processes the elements of an input sequence, while long short-term memory (LSTM) is a variant of RNN designed to address long-term dependency issues. However, unlike these traditional RNN and LSTM architectures, the Transformer introduced the self-attention mechanism, allowing the processing of the entire sentence simultaneously without sequential information processing. This enables a more effective understanding of the relationships among words located at various positions within the text. The structure of the Transformer has had a significant impact on two major areas of NLP: natural language understanding (NLU) and natural language generation (NLG) [15], [16], [17]. NLU focuses on the process of computers understanding and interpreting human language; recognizing entities, relationships, and intents within sentences; understanding context; and analyzing meaning. Advanced models based on encoding layers are primarily used in this process, with the BERT model being exemplary. BERT has demonstrated outstanding performance in various NLP tasks by leveraging the advantages of the Transformer architecture to the fullest extent [18]. Meanwhile, NLG focuses on the process of computers generating human-readable natural language text. This involves utilizing machine learning algorithms, template-based systems, etc. to convert data or information into natural text. Advanced models based on decoding layers are primarily utilized in this field, with the GPT model being exemplary. 
GPT, which is also based on the Transformer architecture, exhibits excellent performance in text generation [3], [19].

B. NLP-Based Recommendation

The evolution of recommendation systems has seen a shift from basic collaborative filtering and cold-start solution strategies [20], [21], [22], [23] to exploiting complex semantic patterns in text data for fine-grained recommendations [5], [8], [9], [24]. BERT4Rec and similar architectures have shown promise in capturing dynamic user preferences through the analysis of historical behavior; informed by the Transformer’s strengths, these models outperform traditional sequential neural networks [8]. The incorporation of semi-automatic annotation for sentiment analysis [25] and adapted feature selection algorithms [26] further improves a system’s ability to provide personalized, accurate recommendations. HybridBERT4Rec utilizes BERT to extract the characteristics of user interactions with purchased items and provides recommendations on this basis; the model sequences users’ historical interactions and is designed to better reflect users’ changing interests [24]. However, most of these studies treated the entire product name as a single unit without tokenization. Although there have been attempts to learn product names as tokens [27], they suffer from the complexity of a dual structure that combines a Transformer with Word2Vec. In this paper, we propose a new approach to recommendation systems that fully leverages the strengths of the entire Transformer architecture.
We explored a method for learning significant characteristics and patterns of product names using an encoder and generating new product names at the token level using a decoder. This approach learns the semantic information of product names associated with users’ past purchase patterns and, based on this, predicts the product names that users are likely to purchase in the future.

C. Service Generation

Service generation is the process of designing or innovating new services. Although this approach can vary depending on the industry, field, or company, the overarching goal is to develop services that meet or exceed user expectations [28], [29]. In relation to this, NLP and recommendation systems have been established as key elements in enhancing user experience and innovating service delivery. Specifically, text data generated by users across various digital platforms contain valuable information regarding user preferences, behaviors, and expectations. Companies that deeply understand the significance of these data are exploring strategies for offering personalized services by integrating NLP and recommendation systems. This approach goes beyond merely analyzing user text data to provide tailored services, playing a crucial role in enhancing service quality, efficiency, and diversity.

Services leveraging NLP have contributed significantly to the provision of more precise and personalized services by utilizing users’ text data. Various studies have explored the possibilities and effects of service generation, showcasing the potential in this domain [30], [31], [32], [33]. Traditional research has focused on analyzing text data such as user reviews, feedback, and interactions to understand user preferences and behavior patterns, primarily to provide personalized recommendations. In addition, active research is being conducted to measure user similarities and content-based recommendations using user text data. However, service providers and platforms must customize their models for each specific task. For efficient model development, a strategy for generating diverse services using a single model is required. For instance, large language models like GPT provide various services, including sentence summarization, translation, and document generation [19], [34].

Prior research has highlighted several strategies for service generation using recommendation systems. First, personalized marketing campaigns using recommendation systems delve deeply into users’ past purchase histories and search patterns to provide personalized marketing messages or discount coupons, combining purchase histories and product preferences [35]. Target marketing is the dual of product recommendation: it targets users who are expected to purchase certain products [36]. Second, there has been significant interest in product bundle suggestions based on users’ purchase histories. For example, after a user purchases a digital camera, a recommendation system can suggest related products, such as memory cards or camera cases [37], [38]. Moreover, recommendation systems can bundle products that frequently appear together [39]. Our study extends the insights of these earlier studies and introduces innovative service generation methods centered on NLP, aiming to explore new service generation strategies by maximizing the advantages of NLP-based recommendation systems.

SECTION III.

NLP-Based Recommendation

In this section, we propose an NLP-based recommendation that utilizes the Transformer to learn product names at the token level, and we validate its performance using a dataset where product name verification is possible.

A. Framework

A distinctive feature of this study is the tokenization of product names into individual tokens for the recommendation system, and the framework for the NLP-based recommendation is depicted in Figure 2. Initially, when a list of product names purchased by a user is input, the names are tokenized based on the space within them. For instance, a purchased product name ‘Chocolate Milk’ is tokenized into ‘Chocolate’ and ‘Milk’ based on the space. These tokenized product names are then trained as token sequences using the Transformer, which subsequently predicts the product names to purchase in tokens. As an example, when product names like ‘Chocolate’, ‘Milk’, ‘Fruit’, ‘Snack’, ‘Chocolate’, ‘Chip’, ‘Cookie’ are input, the Transformer’s training results in predictions such as ‘Strawberry’ and ‘Milk’. These tokenized predictions are then detokenized using spaces, converting ‘Strawberry’ and ‘Milk’ back into ‘Strawberry Milk’. During the training and prediction of product names in tokens, it is possible to generate product names that are not actually sold. As shown in Figure 3, using the Transformer could lead to the generation of a product name like ‘Coffee Chocolate Milk’, which is not available in actual stores. Because this impedes the recommendation system’s ability to suggest actual products, we check against a list of product names to verify the existence of the product and replace non-existent product names using similarity to find the most similar existing product name. To ensure the efficacy of our recommendation system in suggesting actual products, we chose the Jaccard similarity metric for its simplicity and computational efficiency. This contrasts with vector space models that require extensive computation; Jaccard similarity directly compares sets by quantifying the overlap between tokenized product names, which is particularly suitable for our use case. 
The complexity of product names varies greatly, and our focus is on the presence or absence of shared tokens rather than their frequency or order. The Jaccard similarity, calculated as the proportion of shared tokens to the total unique tokens in both product names, allows for accurate suggestions of existing product names when non-existent ones are generated.

FIGURE 2. Framework of NLP-based recommendation.

FIGURE 3. NLP-based recommendation process using product names with tokenization.

The formula used for Jaccard similarity is as follows:

Jaccard(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| - |A ∩ B|). (1)
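The verification-and-replacement step can be sketched as below: Eq. (1) is computed over the sets of space-separated tokens of two product names, and a generated name that is absent from the catalog is replaced by the most similar existing one. The catalog names here are illustrative stand-ins, not the paper's data.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between two product names, per Eq. (1):
    |A ∩ B| / |A ∪ B| over their sets of space-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def nearest_existing(generated: str, catalog: list[str]) -> str:
    """Replace a generated (possibly non-existent) product name with
    the most similar name that actually exists in the catalog."""
    if generated in catalog:
        return generated
    return max(catalog, key=lambda name: jaccard(generated, name))

catalog = ["Chocolate Milk", "Strawberry Milk"]
print(nearest_existing("Coffee Chocolate Milk", catalog))
# 'Chocolate Milk' (shares 2 of 3 unique tokens)
```

Because only the presence or absence of shared tokens matters, not their frequency or order, this set-based comparison matches the rationale given above for preferring Jaccard similarity over vector space models.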

The output from the Transformer selects the token with the highest softmax probability as the first in the sequence and subsequently generates the next tokens in an autoregressive manner to form the product name. For instance, if the Transformer’s decoder first predicts ‘Chocolate’, it will then predict the ‘Milk’ token to complete the product name. The next prediction, excluding the highest-probability ‘Chocolate’, selects the next most probable token ‘Vanilla’, followed by ‘Almond’ and ‘Breeze’, thus forming a product name from the tokens ‘Vanilla’, ‘Almond’, and ‘Breeze’; the top-k product names are predicted in this manner. Through these steps, the final k recommended product names, trained and predicted as tokens, are derived.
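The top-k decoding described above can be illustrated with a toy example. The probability table and continuation table below are hypothetical stand-ins for the decoder's softmax output and its autoregressive completions; they are not the paper's model.

```python
# Hypothetical decoder state: softmax probabilities over candidate first
# tokens, and the tokens the decoder would autoregressively append to each.
first_token_probs = {"Chocolate": 0.52, "Vanilla": 0.31, "Strawberry": 0.17}
continuations = {
    "Chocolate": ["Milk"],
    "Vanilla": ["Almond", "Breeze"],
    "Strawberry": ["Milk"],
}

def top_k_product_names(probs: dict, continuations: dict, k: int = 2) -> list[str]:
    """Rank candidate first tokens by softmax probability, complete each
    one autoregressively, then detokenize with spaces."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return [" ".join([t] + continuations[t]) for t in ranked[:k]]

print(top_k_product_names(first_token_probs, continuations))
# ['Chocolate Milk', 'Vanilla Almond Breeze']
```

Each recommended name starts from a distinct first token, mirroring the procedure in the text where the second recommendation excludes the first token already used.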

B. Datasets

For our NLP-based recommendation experiments, we employed two datasets with distinguishable product names:

  • UK e-Commerce: Sourced from a UK-based online retail platform. This dataset aggregates product names and purchase dates documented in user invoices. The platform predominantly features food items, daily essentials, and electronic appliances.

  • Instacart: This dataset contains the transactional records of grocery orders in the US, noting the specific week, time, and individual products ordered.

A detailed breakdown of these raw datasets is presented in Table 1.

TABLE 1 Summary of the Raw Datasets

After examining the raw data, pre-processing was performed. First, we removed errors, such as missing values, from the dataset and chronologically listed the product names purchased by each user. Then, as shown in Figure 4, we grouped the product names into sets of five to form a row, using four product names for training and one product name as the label for each row. To address the issue of repeated product names in our expansive datasets, we utilized a data cleansing technique based on previous work [9]. To obtain a more comprehensive dataset, we curated an additional collection of products by transferring two pairs of products from every five transactions.
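The row construction described above can be sketched as follows. This is a minimal version assuming non-overlapping windows of five chronologically ordered purchases per user; the additional pair-transfer augmentation mentioned above is omitted.

```python
def make_rows(purchases: list[str]) -> list[tuple[list[str], str]]:
    """Group a user's chronologically ordered purchases into rows of five:
    the first four product names are the training input, the fifth is the
    label, as illustrated in Figure 4."""
    rows = []
    for i in range(0, len(purchases) - 4, 5):
        window = purchases[i:i + 5]
        rows.append((window[:4], window[4]))
    return rows

print(make_rows(["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]))
# [(['A', 'B', 'C', 'D'], 'E'), (['F', 'G', 'H', 'I'], 'J')]
```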

FIGURE 4. Example of dataset preprocessing.

Owing to computational constraints and to maintain coherence in data volume when juxtaposed with the UK e-Commerce dataset, we opted to use only 1% of the Instacart dataset. The descriptions of the two datasets processed at the row level, as previously mentioned, are provided in Table 2. Because this study employs NLP-based recommendation, product names were tokenized for use. Four product names were allocated to Train and one to Label, and their average token counts are listed in the table. In our experiment, we allocated 80% of the rows for training, 10% for validation, and the remaining 10% for testing.

TABLE 2 Summary of the Datasets After Preprocessing

C. Evaluation Metrics

To evaluate the proposed model, we drew inspiration from established studies on recommendation system evaluation metrics [40], [41], [42]. These studies validated our choice of metrics and highlighted their significance in the current context. The following two metrics were employed to assess performance:

  • Hit-Rate: This metric is prevalent in recommendation systems. It gauges whether the top-K product names suggested to each user are aligned with the product name of their most recent purchase. A match within the top-K recommendations is considered a hit. The Hit-Rate is the ratio of users with hits to the total number of users. The corresponding formula is as follows:

    Hit-Rate = (# Hit Users) / (# Users) (2)

  • Mean Reciprocal Rank (MRR): MRR quantifies the ranking of the last purchased product name within the top-K recommended list. A superior MRR indicates the success of the system in recommending relevant product names at the top positions. The formula for MRR is as follows:

    MRR_K = (1/K) Σ_{i=1}^{K} (1 / rank_i) (3)

For a comprehensive evaluation, we present our results for both Hit-Rate and MRR at multiple K-values, specifically K = 3, 5, 10, 15, and 20. This choice helps assess the robustness of the model across various recommendation list lengths. Higher metric values signify enhanced performance.
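The two metrics can be computed together in a single pass over the per-user top-K lists. This sketch follows the common convention of contributing a reciprocal rank of 0 when the last purchase is absent from the list (Eq. (3) above is stated in slightly different notation); the recommendation lists in the example are illustrative.

```python
def hit_rate_and_mrr(recommendations: list[list[str]], labels: list[str],
                     k: int = 10) -> tuple[float, float]:
    """Hit-Rate (Eq. 2): fraction of users whose most recent purchase
    appears in their top-k list. MRR: mean reciprocal rank of that
    purchase within the list, counting 0 when it is absent."""
    hits, rr_sum = 0, 0.0
    for top_k, label in zip(recommendations, labels):
        top_k = top_k[:k]
        if label in top_k:
            hits += 1
            rr_sum += 1.0 / (top_k.index(label) + 1)  # rank is 1-based
    n = len(labels)
    return hits / n, rr_sum / n

recs = [["A", "B", "C"], ["X", "Y", "Z"]]
labels = ["B", "Q"]  # user 1: hit at rank 2; user 2: miss
print(hit_rate_and_mrr(recs, labels, k=3))
# (0.5, 0.25)
```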

D. Comparison Model and Parameter Settings

To verify the superiority of NLP-based recommendation using product names at the token level, we set up the following comparison models:

  • Random: This approach recommends K products randomly selected from all products. Measurements were performed 100 times and then averaged.

  • NLP-based Recommendation with Tokenization: This model is the primary focus of our study, where product names are tokenized into individual tokens for analysis. The tokenization process involves breaking down complex product names into simpler, more manageable components, thereby allowing the system to learn and predict user preferences more accurately. The model utilizes the same hyperparameters as the non-tokenized version for a controlled comparison.

  • NLP-based Recommendation without Tokenization: As a contrast to our primary model, this version of the recommendation system processes product names as whole entities, without breaking them down into tokens. This model serves as a direct comparison to evaluate the added value of tokenization in improving recommendation accuracy.

  • NLP-based Recommendation using N-grams: In addition to the above models, we introduced a variant that utilizes n-gram tokenization. This model was tested with unigrams, bigrams, and trigrams to assess how different levels of token granularity impact the system’s performance. Similar to other models, this approach also uses the same set of hyperparameters to ensure consistency in the comparative analysis.

The selection of these specific models was driven by the need to comprehensively evaluate the effectiveness of our tokenization approach against varying baselines. The Random model provides a baseline to contrast the predictive power of our NLP models against random chance. The NLP-based Recommendation without Tokenization model serves to directly assess the incremental benefit provided by tokenization. The inclusion of N-grams further allows us to explore the depth of tokenization granularity and its impact on recommendation accuracy. These comparative models were chosen to provide a holistic understanding of our system’s performance across different levels of text processing complexity.

We now elaborate on the parameter settings used in our experiments to ensure transparency and reproducibility. The selection and tuning of these parameters were guided by preliminary tests and literature benchmarks to optimize the performance of our NLP-based recommendation system.

  • Transformer Model Configuration: Our model utilized a Transformer architecture configured with a single layer (num_layers = 1), which was chosen to maintain a balance between complexity and computational efficiency. This layer count was found to be sufficient for capturing the nuances of our dataset while ensuring manageable training times.

  • Input and Output Space Dimensionality: We set the dimensionality of the input and output space to 128 (d_model = 128). This dimension was selected as it provides a good trade-off between model expressiveness and overfitting risk, considering the size and complexity of our datasets.

  • Attention Mechanism: The number of attention heads in the multi-head attention mechanism was set to 4 (num_heads = 4). This number allows the model to focus on different parts of the input sequence, improving its ability to learn from various patterns within the data.

  • Inner-Layer Dimensionality: The inner-layer dimensionality was set to 256 (units = 256), which determines the capacity of the feed-forward networks within the Transformer. This setting was chosen to provide sufficient model complexity for learning intricate relationships in the data.

  • Dropout Rate: To mitigate the risk of overfitting, we employed a dropout rate of 0.2 (dropout = 0.2) during training. This rate was optimized through cross-validation to ensure the model generalizes well to unseen data.

  • Training Epochs: The model was trained across 100 epochs (epochs = 100), with early stopping applied to cease training upon convergence. This approach ensures that the model is adequately trained without overfitting to the training data.

These parameters were meticulously selected and adjusted to align with the specific needs and characteristics of our datasets, ensuring that our model delivers robust and reliable recommendations.
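The hyperparameters listed above can be collected into a single configuration object. This is an illustrative sketch (the class itself is not from the paper); the field names mirror the parameter names given in the bullets, and the derived `head_dim` check reflects the standard multi-head attention requirement that `d_model` be divisible by `num_heads`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerConfig:
    """Hyperparameters of the experiment, as listed above."""
    num_layers: int = 1     # single Transformer layer
    d_model: int = 128      # input/output space dimensionality
    num_heads: int = 4      # multi-head attention heads
    units: int = 256        # feed-forward inner-layer dimensionality
    dropout: float = 0.2    # dropout rate during training
    epochs: int = 100       # training epochs, with early stopping

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across heads.
        assert self.d_model % self.num_heads == 0, "d_model must divide evenly"
        return self.d_model // self.num_heads

cfg = TransformerConfig()
print(cfg.head_dim)  # 32 (= 128 / 4)
```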

E. Experimental Results

1) Comparison Between With and Without Tokenization

Our first focus was to assess the impact of token-level learning of product names on the overall efficiency of the recommendation system. To ensure that our results were accurate and reliable, we utilized two key metrics: Hit-Rate and MRR. From Table 3, it is evident that adding tokenization to our recommendation system provided a notable improvement. Taking the UK e-Commerce dataset as an example, the tokenized version exhibited a notable surge in performance over its non-tokenized counterpart, with an enhancement rate peaking at 65.9%. Similarly, experiments using the Instacart dataset yielded positive results: employing tokenized product names increased recommendation efficiency by as much as 23.3%. To offer a holistic view, we performed an in-depth examination of the Hit-Rate metric on the UK e-Commerce and Instacart datasets; the results are visualized in Figure 5.

TABLE 3 Hit-Rate Comparison for NLP-Based Recommendation Without and With Tokenization

FIGURE 5. Hit-Rate comparison with and without tokenization.

Moreover, MRR emerged as another pivotal metric in our study. This metric serves as a litmus test for recommendation systems, evaluating their performance based on the placement of the first relevant recommendation. Specifically, as shown in Table 4, models trained on tokenized product names outperformed their peers on the UK e-Commerce dataset, with an improvement rate of up to 30.0%. On the Instacart dataset, the performance enhancement of the tokenized model was more tempered, albeit still notable, at up to 5.8%. For a side-by-side comparison, both datasets and their respective performances are illustrated in Figure 6. Considering both the Hit-Rate and MRR metrics, it was evident that harnessing tokenized product names had an unmistakable edge on the UK e-Commerce dataset but was less effective on Instacart. The latter’s more reserved performance is possibly attributable to its inherently vast diversity in both users and products, which introduces additional complexity into the recommendation system’s learning. Consolidating our findings, NLP-based recommendation systems enhanced by product name tokenization provide compelling capability and efficiency: the strategic use of tokenized product names yields significant improvements in system performance, and this favorable outcome remained consistent regardless of the dataset or evaluation metric applied.

TABLE 4 Mean Reciprocal Rank Comparison for NLP-Based Recommendation Without and With Tokenization
FIGURE 6. MRR comparison with and without tokenization.

2) Comparison of Tokenization Across N-Grams

The second focal point of our experimentation was an in-depth analysis of how different n-grams affect the performance of NLP-based recommendation systems. We compared unigrams, bigrams, and trigrams for tokenizing product names to determine the most effective n-gram level, again using Hit-Rate and MRR as accuracy metrics. The Hit-Rate results in Table 5 show that both the UK e-Commerce and Instacart datasets achieved the best performance with unigram tokenization. The UK e-Commerce dataset exhibited a more pronounced variance across the n-gram levels, whereas the Instacart dataset varied less but likewise confirmed the superiority of unigrams. This pattern is shown graphically in Figure 7, where unigrams lead, followed by bigrams and then trigrams. The MRR results mirrored these findings, with unigrams again performing best on both datasets, as corroborated by Table 6 and Figure 8. Across all our tests, unigrams consistently outperformed bigrams and trigrams. Figure 9 elucidates this phenomenon by illustrating the breadth of interrelationships formed through unigram tokenization. Building on the premise that tokenization enables more detailed learning of product names, the n-gram experiment further confirmed that finer-grained tokenization facilitates more precise learning: by splitting product names into single-word units, unigrams create an extensive network of interrelationships that supports more comprehensive pattern learning. The approach of tokenizing product names is thus robustly validated by the n-gram experiment.
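The three tokenization granularities compared above can be sketched with a single helper; the function name is our own, and the sketch assumes word-level n-grams over whitespace-split product names:

```python
def ngram_tokens(product_name, n=1):
    """Split a product name into word-level n-grams."""
    words = product_name.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

name = "Coffee Chocolate Milk"
print(ngram_tokens(name, 1))  # ['Coffee', 'Chocolate', 'Milk']
print(ngram_tokens(name, 2))  # ['Coffee Chocolate', 'Chocolate Milk']
print(ngram_tokens(name, 3))  # ['Coffee Chocolate Milk']
```

The example makes the unigram advantage concrete: 'Coffee Chocolate Milk' shares the unigram 'Milk' with 'Chocolate Milk', but its bigrams and trigram are rarer, so higher-order tokens create fewer shared links between products.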

TABLE 5 Hit-Rate Comparison by N-Grams
TABLE 6 MRR Comparison by N-Grams
FIGURE 7. Hit-Rate comparison by n-grams.

FIGURE 8. MRR comparison by n-grams.

FIGURE 9. Interactions among product names by n-grams.

SECTION IV.

Service Generation

The previously introduced NLP-based recommendation system can derive product name tokens by learning them at the token level. As shown in Figure 10, this feature allows us to propose two services: New Product Brainstorming utilizing non-existent product names and Keyword Trend Forecasting using frequency analysis of product name tokens.

FIGURE 10. Service generation with NLP-based recommendation.

A. New Product Brainstorming

In initiating our discussion on the New Product Brainstorming service, we wish to underscore the exploratory and illustrative purpose of this application. The service is presented not as an endpoint, but as a demonstration of the NLP-based system’s potential for broad service generation. It is a proof of concept that invites stakeholders to visualize the possibilities of NLP in product innovation. This aligns with our wider objective of highlighting the system’s versatility and serves as a catalyst for further creative exploration and development. While product names that do not exist in reality might be seen as mere mistakes or errors, they can be utilized to generate novel product name ideas through “Serendipity,” which refers to valuable discoveries or inventions made unintentionally through coincidences [43]. Particularly in the field of scientific research, there are many instances in which significant findings arise from experimental failure. A classic example is the discovery of penicillin by Fleming [44]. By mistakenly mixing in blue mold during a culture experiment, Fleming discovered an antimicrobial substance that was effective against infections. Other examples include the discovery of microwaves from melted chocolate and the birth of Post-it notes from a failed strong adhesive [45]. Based on these scientific cases, we aim to propose product names that do not exist in reality but are derived through the NLP-based recommendation system, as a foundation for brainstorming new product ideas.

1) UK E-Commerce Data

From a test set of 7,121 rows, we generated the top-20 product names for each row, resulting in a total of 142,420 product names. Comparing the generated names with the product names in the dataset, we found that 202 (0.15%) were not present in the dataset. Examples of these non-existing product names are displayed on the left side of Figure 11; on the right-hand side, the most similar existing product names, derived using Jaccard similarity, are listed. Some product names had inaccurately generated tokens related to size or color, resulting in outputs like ‘Red Spotty Luggage Tag’ instead of ‘Pink Spotty Luggage Tag’ and ‘Green Owl Soft Toy’ instead of ‘Pink Owl Soft Toy’. Although ‘Red Spotty Luggage Tag’ and ‘Green Owl Soft Toy’ are not present in the dataset, they are color variants that customers might desire. There were also cases where product name tokens were combined in unexpected ways or entirely new product names emerged, such as ‘Tube Red Spotty Paper Plates’, ‘Black Greeting Card Holder’, and ‘Tutti Frutti Notebook Box’. Even if similar products are available in other stores, they can be considered novel product ideas based on the current store’s data. To better understand unfamiliar names like ‘Tube Red Spotty Paper Plates’, we used DALL·E 3 to generate images of these product names, as depicted in Figure 12. In the future, by expanding the “new product brainstorming” service, we aim to offer new product development ideas by presenting product names and images together.
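The Jaccard matching of generated names to existing catalog names can be sketched as follows; a minimal illustration over word-token sets, with function names and toy catalog data of our own:

```python
def jaccard_similarity(name_a, name_b):
    """Jaccard similarity between the word-token sets of two product names."""
    a, b = set(name_a.lower().split()), set(name_b.lower().split())
    return len(a & b) / len(a | b)

def most_similar(generated_name, catalog):
    """Existing catalog name closest to a generated product name."""
    return max(catalog, key=lambda name: jaccard_similarity(generated_name, name))

catalog = ["Pink Spotty Luggage Tag", "Pink Owl Soft Toy"]
# Shares {'spotty', 'luggage', 'tag'} (3 tokens) out of 5 total: 0.6.
print(most_similar("Red Spotty Luggage Tag", catalog))  # Pink Spotty Luggage Tag
```

Because the similarity is computed over token sets, a generated name that swaps a single attribute token (here, a color) still maps cleanly to its nearest existing product.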

FIGURE 11. New product brainstorming based on the UK e-Commerce dataset.

FIGURE 12. Expected images of new product brainstorming based on the UK e-Commerce dataset: (a) Tube Red Spotty Paper Plates, (b) Black Greeting Card Holder, (c) Tutti Frutti Notebook Box.

2) Instacart Dataset

From a test set of 16,644 rows, we generated the top-20 product names for each row, resulting in a total of 332,880 product names. Comparing the generated names with the product names in the dataset, we found that 1,020 (0.31%) were not present in the dataset. Examples of these non-existing product names are displayed on the left side of Figure 13; on the right-hand side, the most similar existing product names, derived using Jaccard similarity, are listed. Some product names had inaccurately generated tokens related to flavor or ingredients, resulting in outputs like ‘Coffee Chocolate Milk’ instead of ‘Chocolate Milk’ and ‘Coconut Chocolate Chip Cookies’ instead of ‘Chocolate Chip Cookies’. Although ‘Coffee Chocolate Milk’ and ‘Coconut Chocolate Chip Cookies’ are not present in the dataset, they might represent flavors or variations that appeal to customers. There were also instances in which product name tokens were combined differently or entirely unique product names surfaced, such as ‘Vegetable Beef Franks’. Even if such products exist in other markets or stores, they can be viewed as fresh product concepts based on the current store’s data. For a clearer visualization of these potential products, we used DALL·E 3 to generate images corresponding to the product names, as shown in Figure 14. Product names that do not exist in reality and are derived at the token level can thus be used for new product brainstorming, serving as a support service for product development departments and companies.

FIGURE 13. New product brainstorming based on the Instacart dataset.

FIGURE 14. Expected images of new product brainstorming based on the Instacart dataset: (a) Coffee Chocolate Milk, (b) Coconut Chocolate Chip Cookies, (c) Vegetable Beef Franks.

B. Keyword Trend Forecasting

Several studies have analyzed trends through keywords, either by measuring word frequencies in text to identify research trends or by using product-frequency analysis to predict sales and consumer purchase trends [46], [47]. The output of a recommendation system can be viewed as data predicting what users will purchase in the future based on their past purchases. In particular, the proposed NLP-based recommendation method outputs product names in token units; these tokens can therefore be treated as keywords describing product elements. Accordingly, we propose a keyword trend forecasting service based on a frequency analysis of product keywords.
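The frequency analysis underlying this service can be sketched minimally; the function name and toy data are illustrative assumptions, not the paper's code:

```python
from collections import Counter

def keyword_frequencies(generated_names):
    """Token-frequency table over a list of generated product names."""
    counts = Counter()
    for name in generated_names:
        counts.update(name.lower().split())
    return counts

names = ["White Hanging Heart T-light Holder", "Red Heart Set", "Heart Set"]
print(keyword_frequencies(names).most_common(2))  # [('heart', 3), ('set', 2)]
```

The most frequent tokens across all generated recommendations then serve as candidate trend keywords, and the same counts can feed a word cloud for visualization.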

1) UK E-Commerce Data

Based on the UK e-Commerce data, an experiment was conducted using 7,121 test entries to generate the top-20 product names per entry, producing a total of 142,420 product names. These names were tokenized, and a frequency analysis was conducted on the tokens. The results are presented in Table 7, with top keywords such as ‘Heart’, ‘Red’, and ‘Set’. Each entry in the test set consists of four data points representing a user’s purchase history and one label indicating the product the user is expected to purchase next. By analyzing the frequency of all products corresponding to this label, we can anticipate which products will be popular in the future; the results are shown in Table 8, where the ‘White Hanging Heart T-light Holder’ appeared with the highest frequency. Combining the results in Tables 7 and 8, we observed that products containing specific keywords, such as ‘Heart’, ‘White’, and ‘Hanging’, show significant sales volumes. This suggests that keyword-based trend forecasting can meaningfully inform product sales strategies. Figure 15 visualizes the top-50 keywords as a word cloud.

TABLE 7 Top-15 Keywords From NLP-Based Recommendation on the UK e-Commerce Dataset
TABLE 8 Top-15 Products From Label in the UK e-Commerce Dataset
FIGURE 15. Word cloud of the top-50 keywords from NLP-based recommendation on the UK e-Commerce dataset.

2) Instacart Data

Based on the Instacart data, a test was conducted using 16,644 rows to generate the top-20 product names per row, producing a total of 332,880 product names. The generated names were tokenized, and a frequency analysis was conducted on the tokens. The results are presented in Table 9, where top entries such as ‘Banana’, ‘Bag of Organic Bananas’, and ‘Organic Strawberries’ were identified. Each row of the test set consists of four data points representing a user’s purchase history and one label indicating the product the user is expected to purchase next. Analyzing the frequency of all products corresponding to this label allows us to anticipate which products may be popular in the future; the results are shown in Table 10, where the keyword ‘organic’ appeared with the highest frequency. Combining the results in Tables 9 and 10, we observed that products containing specific keywords, such as ‘organic’, ‘milk’, and ‘whole’, showed significant sales volumes. This suggests that keyword-based trend forecasting can be instrumental in product sales strategies. Figure 16 visualizes the top-50 keywords as a word cloud.

TABLE 9 Top-15 Keywords From NLP-Based Recommendation on the Instacart Dataset
TABLE 10 Top-15 Products From Label in the Instacart Dataset
FIGURE 16. Word cloud of the top-50 keywords from NLP-based recommendation on the Instacart dataset.

SECTION V.

Conclusion

The main finding of this study is the demonstrated efficacy of token-level analysis in NLP-based recommendation systems, which led to significant performance improvements in the e-commerce context. Our exploration of n-grams, particularly the superior performance of unigram tokenization, further reinforces the effectiveness of fine-grained token analysis in improving recommendation accuracy. The comparative analysis of unigrams, bigrams, and trigrams provided crucial insights into how different granularities of tokenization affect the predictive accuracy and utility of the system, contributing to a deeper understanding of natural language processing for e-commerce applications. In the modern e-commerce landscape, understanding consumer behavior requires analyzing vast amounts of data, with particular emphasis on natural language data. We adopted a distinctive approach of utilizing product names at the token level and experimentally validated its efficacy: measured on the UK e-Commerce and Instacart datasets, the token-level NLP-based recommendation algorithm performed strongly across all metrics. This underscores the potential of NLP technologies to move beyond surface-level data analysis toward deeper insights and higher-quality services. Furthermore, this study highlighted the significance of building NLP-based services that do not rely on personal data. By learning solely from product name tokens rather than personal user data, our approach offers effective services while preserving user privacy and mitigating concerns related to the use of personal information.
The innovative services we introduced, specifically product brainstorming and keyword trend forecasting, stem directly from our findings that token-level analysis of product names can uncover latent consumer preferences and market trends. These services hold the potential to revolutionize business strategies by providing insights into untapped product opportunities and emerging market demands. Future research should explore the scalability of token-level analysis across larger and more diverse datasets, and examine the integration of evolving NLP technologies to maintain the relevance and effectiveness of recommendation systems.

From a theoretical perspective, this study makes two significant contributions. First, it substantially improves the performance of recommendation algorithms. By advancing existing NLP-based recommendation methods and incorporating tokenization of product names, our approach demonstrated superior performance, as evidenced by improved Hit-Rate and mean reciprocal rank relative to non-tokenized models. Second, notable progress was made in the extensibility of the recommendation model. NLP-based recommendation models can be flexibly applied across diverse languages and domains, enabling services to be offered globally, unrestricted by specific languages or regions. This study also has two important practical implications. First, processing product names in natural language eliminates the need for personal information, significantly enhancing the method's potential for industrial application. It can be integral in industries such as e-commerce and digital marketing, where it can enhance user experience through personalized recommendations without compromising individual privacy, thereby addressing challenges related to the protection of personal information. Second, this study explored the potential of diverse services. The development of services such as brainstorming new product ideas and predicting keyword trends offers businesses a sustainable path for utilization, equipping companies to respond swiftly to market shifts and evolving consumer demand.

Despite offering significant insights, this study has certain limitations that merit consideration. First, it was confined to the UK e-Commerce and Instacart datasets. Future research should consider applying our token-level analysis approach to a wider range of datasets, such as social media trends, customer reviews, and global marketplaces, to validate the applicability and reliability of our findings across various consumer contexts and cultural backgrounds. Second, the NLP technologies employed reflect the current state-of-the-art. As technology evolves, newer techniques may emerge, potentially affecting the performance and outcomes of recommendation algorithms. Furthermore, a notable advantage of this study lies in its approach to using product names directly and analyzing them at the token level. However, this methodology is predominantly suitable for products with intuitive and straightforward names. For more abstract product names, the application could become challenging. For instance, for a product name like “Taste of Magic,” it is not immediately clear which food it represents. Thus, NLP methodologies may struggle to provide accurate recommendations based solely on such abstract names. Future research should focus on expanding the datasets tested, incorporating newer NLP techniques, and devising strategies to effectively handle products with abstract or non-descriptive names.