Article Spinning with Python

Introduction

Article spinner algorithms take an original article and try to capture what each word means in context. More advanced spinners do not treat a sentence as just a list of words; they model how the words relate to and interact with each other. Modern spinners use machine learning and natural language processing (NLP) to rewrite sentences automatically, producing one or more variations by replacing specific words, phrases, or sentences with substitute variants. The spinner built in this post is much simpler: it swaps single words using trigram statistics.

Prerequisites

To implement this solution, readers should have a working knowledge of:

  • Python
  • Natural Language Processing

What topics will we cover?

  • A Brief History
  • Data Briefing
  • Developing Article Spinner
  • Inspirations
  • Criticisms
  • Different applications in the market

A Brief History

Article spinning is one of the most talked-about and least-understood technologies. To the average person, it is a mythical technology that makes it possible to create unlimited original content and to easily defeat plagiarism detection.

Article spinning is a technique for generating seemingly original content from existing content by replacing words or phrases with synonyms. It is mainly used in search engine optimization (SEO) and other applications to create what appears to be new content from what already exists. Today, article spinning algorithms are refined with advanced machine learning and natural language processing techniques, so their output can read as natural, human-written text that appears original.

Here’s a very basic example of article spinning. Take the sentence “Hello my name is Jaiprakash”.

The algorithm spins it as {Hi|Hello|Hey} my name is Jaiprakash, which produces the following content:

  1. Hi my name is Jaiprakash
  2. Hey my name is Jaiprakash
  3. Hello my name is Jaiprakash
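The {Hi|Hello|Hey} notation is commonly called spintax. As a minimal sketch, assuming flat (non-nested) groups whose options are separated by |, the following Python expands a template into every variant. `expand_spintax` and `SPIN_GROUP` are hypothetical names used for illustration, not part of any library.

```python
import itertools
import re

# Matches one flat spintax group such as {Hi|Hello|Hey} (hypothetical helper pattern)
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_spintax(template):
    """Return every variant of a flat (non-nested) spintax template."""
    # With a capturing group, re.split alternates: text, group, text, group, ...
    parts = SPIN_GROUP.split(template)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold the captured group contents, e.g. "Hi|Hello|Hey"
            choices.append(part.split("|"))
        else:
            # Even positions are literal text, kept as a single option
            choices.append([part])
    # The Cartesian product of all choices yields every variant
    return ["".join(combo) for combo in itertools.product(*choices)]


for variant in expand_spintax("{Hi|Hello|Hey} my name is Jaiprakash"):
    print(variant)
```

Enumerating the full product is fine for a sentence, but it grows quickly, which is why real spinners usually sample one option per group instead.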

The power of article spinning is that, across an article of a hundred to a thousand words, an algorithm can create thousands or even millions of permutations, each at least slightly different from the others.
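The growth is multiplicative: a template with k independent groups of sizes n1, n2, ..., nk yields n1 × n2 × ... × nk variants. A small sketch under the same flat-spintax assumption, where `count_variants` is a hypothetical helper, counts variants without generating them:

```python
import math
import re

# Matches one flat spintax group such as {a|b|c} (hypothetical helper pattern)
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def count_variants(template):
    """Count the variants of a flat spintax template without generating them."""
    # Each group multiplies the total by its number of options;
    # math.prod of an empty list is 1, i.e. a template with no groups has one variant
    option_counts = [len(group.split("|")) for group in SPIN_GROUP.findall(template)]
    return math.prod(option_counts)


# One group with 3 options
print(count_variants("{Hi|Hello|Hey} my name is Jaiprakash"))  # 3

# Ten independent groups with 3 options each: 3 ** 10 variants
template = " ".join(["{a|b|c}"] * 10)
print(count_variants(template))  # 59049
```

Ten three-way choices already give 59,049 variants, so a longer article with more substitution points easily reaches millions.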

The Data

Our data comes from the Multi-Domain Sentiment Dataset, which contains product reviews taken from Amazon.com. We use the positive reviews of electronics products, which are stored as text in a pseudo-XML format. The file contains 1,000 positive reviews.
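Because the file is only pseudo-XML, a strict XML parser may reject it, which is why the code below reads it with BeautifulSoup. As a dependency-free alternative sketch, a non-greedy regular expression can pull out the <review_text> contents. The two-review `SAMPLE` string and the `extract_review_texts` helper are hypothetical; the real file holds many more reviews and may contain other fields.

```python
import re

# Hypothetical two-review sample in the same pseudo-XML style
SAMPLE = """
<review>
<review_text>
Great value for your money.
</review_text>
</review>
<review>
<review_text>
I had no problems with this card.
</review_text>
</review>
"""


def extract_review_texts(raw):
    """Return the stripped contents of every <review_text> element."""
    # DOTALL lets '.' cross newlines; the non-greedy '.*?' stops at the first closing tag
    pattern = re.compile(r"<review_text>(.*?)</review_text>", re.DOTALL)
    return [match.strip() for match in pattern.findall(raw)]


print(extract_review_texts(SAMPLE))
```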

Load the data

# Loading required Python packages
import random

import nltk
# BeautifulSoup is used for reading data in the pseudo-XML format
from bs4 import BeautifulSoup

# word_tokenize (used below) needs NLTK's Punkt models; if they are missing,
# run nltk.download('punkt') once ('punkt_tab' on recent NLTK versions)

# Data URL (electronics/positive.review):
# http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html
# Reading the data
with open('positive.review') as f:
    positive_reviews = BeautifulSoup(f.read(), 'html.parser')
positive_reviews = positive_reviews.find_all('review_text')

Exploration: Getting a Feel for Our Data

> print(positive_reviews[1:5])
[<review_text>
This product was perfect. it’s a sturdy case that holds my large collection of cd’s and dvd’s.My other cases were flimsy and wore
out fast. This one has a hard outside shell and the inside holds my dvd’s nicely.So i have no complaints. and at the price i paid i’d say this was a great deal
</review_text>,
<review_text>
Great value for your money. Kingston has always been a reliable memory chip company and the same goes for this product.
</review_text>,
<review_text>
I like it very much, I use it in an outdoor trail camera for wild game and it hold several pictures and has been in the camera in all kinds of weather. I would buy Kingston again
</review_text>,
<review_text>
I had no problems with this card and the delivery was prompt…Thank You Muc
</review_text>,
<review_text>
After going through the reviews, I bought this CF card for my Canon Digital Rebel. So far it has worked fine, though I don’t pretend to be an expert digital photographer. When the card is empty, it shows 127 shots available. It seems to read reasonably fast, though it takes a bit longer than I was used to with my point-and-shoot Olympus and its SmartMedia card.
</review_text>

Developing Article Spinner

# A Python dictionary stores data as key: value pairs (insertion-ordered
# since Python 3.7). Here each key is a (previous word, next word) pair
# and each value is the list of middle words seen between them.

# Initialize an empty dictionary
trigrams = {}

# Extract trigrams from positive_reviews and insert them into the dictionary:
# (w1, w3) is the key, [w2] are the values
for review in positive_reviews:
    s = review.text.lower()
    tokens = nltk.tokenize.word_tokenize(s)
    for i in range(len(tokens) - 2):
        k = (tokens[i], tokens[i + 2])
        if k not in trigrams:
            trigrams[k] = []
        trigrams[k].append(tokens[i + 1])

# Turn each list of middle words into a probability vector
trigram_probabilities = {}
for k, words in trigrams.items():
    # Only do this when there are different possibilities for the middle word
    if len(set(words)) > 1:
        # Create a dictionary of word -> count
        d = {}
        n = 0
        for w in words:
            if w not in d:
                d[w] = 0
            d[w] += 1
            n += 1
        # Convert counts into probabilities
        for w, c in d.items():
            d[w] = float(c) / n
        trigram_probabilities[k] = d


# Choose a random sample from a dictionary whose values are probabilities
def random_sample(d):
    r = random.random()  # uniform random number in [0, 1)
    cumulative = 0
    for w, p in d.items():
        cumulative += p
        if r < cumulative:
            return w
    # Floating-point rounding can leave cumulative just below r; fall back to the last word
    return w


def test_spinner():
    review = random.choice(positive_reviews)
    s = review.text.lower()
    print("Original Text:", s)
    tokens = nltk.tokenize.word_tokenize(s)
    for i in range(len(tokens) - 2):
        if random.random() < 0.2:  # 20% chance of replacement
            k = (tokens[i], tokens[i + 2])
            if k in trigram_probabilities:
                w = random_sample(trigram_probabilities[k])
                tokens[i + 1] = w
    print("Spin Text:")
    # Join the tokens and remove the spaces that tokenization put before punctuation
    spun = " ".join(tokens)
    for before, after in [(" .", "."), (" '", "'"), (" ,", ","), ("$ ", "$"), (" !", "!")]:
        spun = spun.replace(before, after)
    print(spun)
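To see what the two dictionaries hold without downloading the dataset, here is a hedged sketch of the same (w1, w3) → w2 idea on a hypothetical three-sentence corpus. It swaps the manual counting loop for `collections.Counter` and splits on whitespace instead of calling `word_tokenize`; the corpus and variable names are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, already lowercased; whitespace splitting stands in for word_tokenize
corpus = [
    "this card is great",
    "this card is good",
    "this camera is great",
]

# (w1, w3) -> Counter of the middle words w2 seen between them
middle_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i in range(len(tokens) - 2):
        middle_counts[(tokens[i], tokens[i + 2])][tokens[i + 1]] += 1

# Normalize counts into probabilities, keeping only keys with more than one candidate
probabilities = {}
for key, counts in middle_counts.items():
    if len(counts) > 1:
        total = sum(counts.values())
        probabilities[key] = {word: c / total for word, c in counts.items()}

print(probabilities)
```

Only the key ('this', 'is') survives the filter, with "card" at probability 2/3 and "camera" at 1/3; every other context has a single possible middle word, so the spinner would never change it.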

Generate Spin Text

if __name__ == '__main__':
    test_spinner()

# ----- Output ----- #
Original Text:
as soon as i bought this from amazon, the price went up from $21 to $ 37. i was going to buy another one, but the price is too high.overall, this is a great product it has everything you need. however, i wouldn't buy it at the current price, i'm waiting for the price to go down before i buy another one
Spin Text:
as soon as i purchase this from amazon during the price went up from $21 to $37. i was going to get another one, but the price is too high.however, this is a great product it has everything you need. however, i will not touch it at the current price, i'm waiting for the price to go down before you new purchase
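Because `test_spinner` prints its result and relies on the global random state, runs are hard to reproduce or test. As a hedged alternative sketch, the functions below return the spun text and take an optional seed for a local `random.Random` generator. `sample_word`, `spin`, `detokenize`, and the toy probability table are hypothetical names, not part of the original code; `random.choices` replaces the manual cumulative-probability loop of `random_sample`.

```python
import random
import re


def sample_word(distribution, rng):
    """Draw one word from a {word: probability} dict."""
    words = list(distribution)
    weights = list(distribution.values())
    # choices() performs the cumulative-weight sampling internally
    return rng.choices(words, weights=weights, k=1)[0]


def spin(tokens, trigram_probabilities, replace_prob=0.2, seed=None):
    """Return a spun copy of tokens without modifying the input list."""
    rng = random.Random(seed)  # a local generator makes runs reproducible
    out = list(tokens)
    for i in range(len(out) - 2):
        key = (out[i], out[i + 2])
        if rng.random() < replace_prob and key in trigram_probabilities:
            out[i + 1] = sample_word(trigram_probabilities[key], rng)
    return out


def detokenize(tokens):
    """Join tokens and undo the spacing tokenization adds around punctuation."""
    text = " ".join(tokens)
    text = re.sub(r" ([.,!?])", r"\1", text)  # "card ." -> "card."
    text = text.replace(" '", "'")  # "i 'm" -> "i'm"
    return text.replace("$ ", "$")  # "$ 37" -> "$37"


# Hypothetical toy table: between "i" and "this", pick a verb
probs = {("i", "this"): {"bought": 0.5, "purchased": 0.5}}
tokens = ["i", "bought", "this", "card", "."]
print(detokenize(spin(tokens, probs, replace_prob=1.0, seed=0)))
```

Passing the same seed reproduces the same spin, and `replace_prob=0.0` returns the original sentence unchanged, which makes the behavior easy to check.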

Inspirations

  • Saves you time in creating content from scratch
  • Saves you money since you may no longer have to hire a writer
  • Creates lots of articles with a click of a button
  • Helps you build website content quickly
  • A very useful tool for web advertisers
  • Lets you publish something comparable when you find an outstanding article on the web

Criticisms

  • It can be viewed as unethical, especially when it paraphrases copyrighted material
  • Possible penalties from search engines once the generic content is detected
  • Awkward phrases and sentences are sometimes substituted

A Few Applications on the Market Based on Article Spinning

  • Ant Spinner
  • Caligonia
  • Plagiarisma
  • Spinbot
  • Content Professor

Conclusion

Thanks to its word-substitution and rewriting features, built on machine learning and natural language processing, article spinning is becoming more popular every day. This post shows how to build a basic article spinning algorithm from scratch.

Wrapping Up

The quality of this model isn’t great, and I don’t expect it to be: it only swaps single words based on trigram statistics. Despite this, it shows how to quickly build a working article spinner from scratch with Python.

References

  1. https://lazyprogrammer.me/
  2. https://www.emeraldcityjournal.com/2015/09/article-spinning-a-plagiarism-technique-for-the-21st-century/
  3. https://www.plagiarismtoday.com/2018/03/08/a-brief-history-of-article-spinning/
  4. https://github.com/lazyprogrammer
  5. https://www.linkedin.com/pulse/article-spinning-sujith-kumar/
  6. https://en.wikipedia.org/wiki/Article_spinning


Senior Data Scientist @Datamatics Global Services Pvt Ltd