Plagiarism a rising issue
By NICK JOHNSON
Plagiarism is one of those terms that gets tossed around a lot without people knowing exactly what it means. It actually has a specific legal definition, though it overlaps with copyright violation and other forms of content copying.
In the realm of content marketing, you have to consider a few factors when thinking about plagiarism.
First of all, you have to consider how people check for plagiarism. That is, people use online plagiarism checkers such as Copyscape and Grammarly, or tools that integrate them, like SmallSEOTools. These tools either maintain their own index of information online, or they use extensive and creative Google searches to look for relatively unique phrases from a submitted piece and match them against existing content.
No tool has an index as large as Google’s, though some may index tertiary sources of content Google might miss. My own check for plagiarism involves running a Copyscape scan and a few Google searches for full sentences.
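To make the phrase-matching idea concrete, here is a minimal sketch in Python. It is not how Copyscape or Grammarly actually work internally (neither publishes its method); it simply assumes a checker splits text into overlapping five-word runs and counts how many runs of the submitted piece also appear in an existing one. The function names and the run length are my own illustrative choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(submitted, existing, size=5):
    """Fraction of the submitted text's shingles that also occur in existing."""
    submitted_shingles = shingles(submitted, size)
    if not submitted_shingles:
        # Text shorter than one shingle cannot be matched at all.
        return 0.0
    shared = submitted_shingles & shingles(existing, size)
    return len(shared) / len(submitted_shingles)


if __name__ == "__main__":
    # Hypothetical texts, for illustration only.
    indexed = "Plagiarism usually occurs when a writer fails to cite quotes."
    draft = "Experts agree that plagiarism usually occurs when a writer fails to cite."
    print(f"{overlap_ratio(draft, indexed):.0%} of the draft's phrases match")
```

Notice that nothing in this comparison knows whether the matching phrases were quoted or attributed. It only sees that the words line up.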
Here’s the issue: these checks do not account for references and quotations. They look for duplications in text, but they don’t look for attributions. Here’s an example. I can quote this block:
“Plagiarism usually occurs when a writer fails to:
- Cite quotes or ideas written by another author;
- Enclose direct text in quotes; or
- Put summaries and/or paraphrases in his or her own words.”
And that would technically come up as an instance of plagiarism in this very blog post. I obviously quoted it, but what’s the source? As of yet, I haven’t told you, which makes this a technical theft of content. Then again, maybe all the way at the bottom of this post, I have a footnote with the source. Copyscape and Grammarly aren’t going to be smart enough to identify that.
Of course, I don’t have such a footnote. What I do have is this link right here. That quote comes from an old LegalZoom blog post on the subject of plagiarism. And now, with this paragraph, I’ve made my quote completely legitimate.
If you’re writing a blog post that makes heavy reference to several other sources of information, or even one that refutes a single source line by line, you’re going to get a high percentage of plagiarism back from your checks. You need to use your own judgment and analyze the context of the copied content to see if it counts.
You also have the issue of common sentences. Certain sentences become truisms within your industry, repeated as catchphrases, as ironic sources of humor, or simply as frequently cited facts. If they’re sufficiently long or unique, they can trigger plagiarism scans even when there’s no clear original source, or when the sentence has basically become an industry meme.
All of this comes together to show you that a lot of blog posts are going to show at least some level of plagiarism when scanned by any automated process. It might be a meager 2-3%, or it might be higher. Ironically, the more time you spend gathering references and quoting sources, the more likely your post is to be flagged.
This isn’t to say you should stop quoting or citing sources; they add value that can be very worthwhile to your post. No, what you should do is recognize that sites like Copyscape and Grammarly are not the arbiters of plagiarism.
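If you want a rough feel for how much of a flagged percentage comes from legitimate quotes, you can compare the score with and without the quoted passages. The sketch below is an assumption-laden illustration, not a feature of any real checker: it treats anything between double quotation marks (straight or curly) as a quotation, and it cannot tell whether the quote was ever attributed. That last part is still your judgment call.

```python
import re

# Assumption: a "quotation" is any passage wrapped in straight or curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def strip_quotes(text):
    """Replace every quoted passage with a space."""
    return QUOTED.sub(" ", text)


def word_runs(text, size=4):
    """Return the set of overlapping runs of `size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def flagged_percent(post, source, size=4, ignore_quoted=False):
    """Percentage of the post's word runs that also appear in the source."""
    if ignore_quoted:
        post = strip_quotes(post)
    runs = word_runs(post, size)
    if not runs:
        return 0.0
    return 100.0 * len(runs & word_runs(source, size)) / len(runs)


if __name__ == "__main__":
    # Hypothetical texts, for illustration only.
    source = "Plagiarism usually occurs when a writer fails to cite quotes."
    post = ('As the source puts it, "plagiarism usually occurs when a writer '
            'fails to cite quotes" and I agree with that entirely.')
    print("raw score:       ", round(flagged_percent(post, source), 1))
    print("quotes set aside:", round(flagged_percent(post, source, ignore_quoted=True), 1))
```

A large gap between the two numbers suggests the scan is mostly reacting to your quotations rather than to copied prose.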
When does plagiarism matter? After all, we’re on the internet, where there exist people who have made a living solely off stolen content, copied wholesale and infused with links or ads.
There are essentially two possible penalties for plagiarism, though a third is implicit.
The first penalty is rare, but very damaging. This is the penalty wherein the original creator of the content discovers that you plagiarized their content and sues you. You will be forced to prove that you created the content, or else admit that you stole the content. This will be a violation of copyright and can come with a wide array of penalties, ranging from taking down the copied content to monetary damages.
This is rare because most websites don’t have the inclination or the funds to take every offender to court. Copied content is a very prevalent problem, and most of the time the thieves are difficult to identify or bring to trial. Many of them use false information or simply reside in a country where legal action is difficult to pursue. If someone stole one of your blog posts, would you consider taking them to court? My guess is probably not.
The second penalty is the primary one, and it’s the Google search penalty. Around 2011, when Google first rolled out the Panda algorithm update, they began taking copied content very seriously.
Google is a lot more sophisticated than sites like Copyscape and Grammarly. They’re able to use context and differentiate between content that is actually stolen and content that is not. Here are some things that might trip up Copyscape but won’t trip up Google:
- Content posted on two versions of a website, for example a standard and a mobile site.
- Content posted on a blog and duplicated on a printer-only version of the page.
- Content in a store that has dynamic URLs, showing the same content on multiple different pages.
- Content published with attribution on multiple URLs, as with syndication.
Most such issues can be solved with proper use of the rel="canonical" tag, which you add to each version of a piece of content. This is for full pages, though. What about longer quote blocks?
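Before getting to quote blocks, here is a quick way to check that your own page variants really do point at the same canonical URL. This is a minimal sketch using Python’s standard html.parser module; the sample markup and the example.com addresses are hypothetical, and a real audit would fetch your live pages instead of using inline strings.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the markup, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup for a mobile copy and a printer-only copy of one post.
mobile_page = '<html><head><link rel="canonical" href="https://example.com/post"></head></html>'
print_page = '<html><head><link rel="canonical" href="https://example.com/post"/></head></html>'

if __name__ == "__main__":
    same = canonical_url(mobile_page) == canonical_url(print_page)
    print("variants share one canonical URL:", same)
```

That handles whole pages that exist at more than one address. A quote sitting inside an otherwise original post is a separate case.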
This, again, is Google’s sophistication. Google is able to read context and can identify when a piece of content is quoted in part but surrounded by unique content. It’s easy to tell that my quote block above is just a quote and not a full stolen blog post.
Partial copied content that is used without attribution or passed off as original by the thief is harder to detect, but it can run afoul of Google’s push to maintain a unique selection of search results. If one page was published in January, and another with largely the same content – and nothing new of value – is published in March, Google is more likely to use the earlier one. Duplicating content mostly just means your post won’t rank in comparison to the original.
The third, implicit penalty follows when your site is discovered and penalized for plagiarism, or has its pages removed from search results through DMCA legal notices: damage to your reputation. You become a known content thief and can have all your guest posts, all your links, and all your references disappear overnight. Legitimate marketers don’t want to work with spammers.