Artificial Intelligence in Penetration Testing and URL Injection

When artificial intelligence is mentioned, thoughts of robots becoming human-like and taking over the world are prevalent. (Mutlu, 2012) This is largely because of misinformation passed along in the media. In reality, artificial intelligence is used in many everyday items and machines, and it is also key in the video game world. Most people do not think of these more practical applications, which makes artificial intelligence a controversial topic and the center of many debates about machines taking over the world and threatening humanity. Computers can execute complex algorithms in mere seconds when it would take a human hours to complete the calculations by hand. (Luger, 2009, p. 1) This leads to a fear of the unknown and causes research into artificial intelligence to be approached with caution.

Artificial intelligence development and research is important because it helps streamline tasks and solve complex problems quickly. It is now used in household appliances and security systems. These appliances can connect with smartphones and other devices to send alerts and can run automatically when there is an issue. Security systems can detect the smallest movement and use algorithms to determine whether an alert needs to be set off. This allows homeowners to set their own parameters in the device so that pets and smaller movements do not trigger alarms. All of these systems used in conjunction give a homeowner peace of mind when they are out of the house.

The most important area of focus in artificial intelligence should be the security of the internet. Every day firewalls are breached and data is compromised. Complex, intelligent systems can detect potential threats and see that they are thwarted before they can begin. Some of this is already at work, unknown to most of the public. Antivirus software running on most machines can analyze any file or piece of software and determine whether it is a threat. Antivirus software needs this intelligence to combat harmful files, because identifying the wrong file and deleting it could be disastrous.

There are also other checks in place that use artificial intelligence within the system itself. If a piece of malware tries to take over the system, a popup box appears alerting the owner that an unknown file is trying to run. The same alert appears if a file tries to bypass the firewall or delete files it should not touch. These security checks, which use a form of artificial intelligence, help users prevent intrusion and file corruption. The filters and security alerts constantly have to be updated: as humans create more complex and unique ways to bypass security, the security checks have to keep pace.

Another example of this is the spam folder used in email clients. When an email arrives at an address, it is scanned intelligently for common keywords and characteristics of harmful or deceitful messages. This helps protect unsuspecting users from damaging their systems. A variety of emails are delivered every day that use tricks to get users to download files or click on links. The files install malware or viruses on the user's system with the intent to cause harm, while the links are used to steal information or gain access to the user's funds. Some emails ask people to wire money in exchange for services. A lot of the time, users assume an email is legitimate if it does not get caught by the spam filter. This is why artificial intelligence is so important to the future of security.

Technical Discussion

The internet continues to grow in the amount of available data as devices become cheaper for consumers. Cell phones now connect directly to the internet, adding the convenience of searching the internet wherever a user is. It is now essential for businesses to have some semblance of a web presence. Any time someone is curious about a business, there is a good chance they are going to search for that business online, and if the business does not have a web presence, that potential customer will be lost. The easiest way to get a profile is by creating a social media business page. These are convenient for business owners who are not technologically savvy, but they do not offer the customization that a privately owned website would, and exposure is limited by the size of the user base of that particular social media outlet. This means that multiple profiles across multiple platforms are needed to increase exposure. Sometimes the positives gained by having a profile on these pages are outweighed by the negatives of having to maintain and self-promote the business. There can be a flood of information across social media, so users may not see even the most creative campaigns by a business.

The alternative approach taken by a lot of businesses is to create their own website. Much of the time, businesses treat their website as an afterthought and do not take the right approach when discussing and planning it. The most common error is not including a programmer or server administrator in planning or development. A website is designed first, either on paper or using a graphics program, and the design is then handed to a contractor or web development group to build.

When small website contracts are sold, the website is forced to fit the design that was created by the company. When these designs are created by the company itself, they often do not lend themselves well to the way a browser will display the website. The result is a website that is full of errors and does not display correctly for users. These problems are compounded if a business contracts out parts of a website to multiple companies that work without contacting each other. Concessions are then made to get all of the parts working together, which is not good for visitors of the website.

It takes a lot of time and planning to create a website that is secure and renders correctly. The organization that creates standards for the internet is the World Wide Web Consortium (W3C). It works directly with the programming languages used on the web to help standardize how pages are displayed, and it even offers developers a free tool at https://validator.w3.org to validate the code of websites. The problem behind much of the internet is that most websites are not validated correctly. This is in direct contrast to using a compiler to build a native program: the compiler will not produce a software package if errors are found throughout the program. That same level of validation is not in use on the internet.

If some of the most prominent websites in the world are run through the validator, a plethora of errors will be reported. This is alarming because the validator is a free tool that gives a breakdown of every error found on a website, yet many companies still do not take the time to make sure their website is properly validated. These are basic coding errors that are reported. If companies are not taking the time to validate their websites, imagine the number of errors that are occurring underneath the cosmetic layer.

Each website is hosted on a server that is publicly reachable on the internet. These servers can be hosted by large companies in server farms, or they can even be self-hosted in the basement of a person's house. From a security standpoint, the operating system is the start of a secure system. The operating system for a server hosting a website needs to be kept constantly up-to-date, and file permissions need to be maintained. Using artificial intelligence, anyone can search through these servers looking for vulnerabilities.

This is where artificial intelligence is important to the future of the internet. Its importance does not lie in creating tougher enemies in video games; it is important for testing for vulnerabilities in websites and servers connected to the internet. If a company chooses to collect information from its user base, it is up to that company to secure the information. It is appalling how many websites do not follow protocols to protect user information. There are data breaches every day and identity theft has skyrocketed, due to users trusting companies to protect their information or not securing their own information.

There are quite a few tools available to search for files and folders on web servers. Depending on the operating system in use, wget and curl are usually built in. These allow webpages and files to be downloaded from a server without using a browser. These tools are great for what they do, but they do not give the same amount of control that a user-created program can. Python is a great programming language for web penetration testing because it is installed on most web servers connected to the internet. It is also a high-level language that is easy to use and has an enormous amount of documentation available to help learn it. The great thing about penetration testing on the web is that any programming language can be used.
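
To illustrate the difference, a page fetch like the one wget or curl performs can be written in a few lines of Python using only the standard library. This is a minimal sketch; the URL is a placeholder, not a real target.

import urllib.request

# Minimal sketch: fetch a single page the way wget or curl would, but from
# inside a script where the response can be inspected and acted on.
url = "http://example.com/"  # placeholder URL, not a real target
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")
    print(response.status, "-", len(html), "characters of HTML")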

When starting to penetration test a website, it is best to understand the tool that is in use. Make sure each tool or programming language is understood so that things can be adjusted as needed. It is also important to understand the basics of how a web server is designed. Using artificial intelligence to look through a web server speeds up the process, but understanding the layout of a server is still essential to cut down on the search time of the program in use. There are also tools available to determine what system the website under test is running on. It is important to gain as much knowledge about a web server as possible before running tests.

This is important because of the file structure of the server. If the artificial intelligence algorithm searches using the file structure of a Unix system when the website is actually run on a Windows server, the search is not going to return much useful information. The root of a Windows server is C: while a Unix system uses a root of /. This matters because a successful script can determine the file structure and search correctly for the files in question. There was also a difference in the folder separator, as Windows used \ while Unix was built on / to denote folder structure. Windows now accepts both separators, so this should not be a problem going forward.
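
A quick way to see the difference is Python's pathlib module, which models both conventions. The paths below are made-up examples, purely for illustration.

from pathlib import PureWindowsPath, PurePosixPath

# Illustration only: the same kind of location expressed in both path styles
# a scanning script may need to account for.
win_path = PureWindowsPath(r"C:\inetpub\wwwroot\uploads\report.pdf")
nix_path = PurePosixPath("/var/www/html/uploads/report.pdf")
print(win_path.root, win_path.parts[0])  # prints: \ C:\
print(nix_path.root, nix_path.parts[0])  # prints: / /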

To focus on penetration testing and artificial intelligence, depth-first search is a great tool. This is because of the way a website is hosted on a server. The web server serves content from a main folder, with more files and folders under that root to separate information. Every / in a web address tells the server to look into another folder for the file being requested. A depth-first search will begin at the root and dig to the last file at each folder level, going deeper and deeper into the server looking for files. (Luger, 2009, p. 99) This is quicker than a breadth-first search when searching a web server because most web servers have more depth than they have breadth. Using this search intelligently, one can quickly move about a server looking for unrestricted files.

Breadth-first search can be used on a web crawl much like it could be utilized for a search on a distributed system. (Buluç, 2011, p. 2) If the date of a file is known but the domain is unknown, a breadth-first search would actually be preferred. This is especially true if there are a multitude of subdomains that are hosted on a server. A breadth-first search can go through all of the subdomains before moving down into the depths of the content folder on the server.
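
A minimal breadth-first sketch of that idea is shown below. The subdomain list, the candidate folders, and the example.edu domain are all assumptions made for illustration; a real run would enumerate subdomains first.

from collections import deque
import urllib.request

# Hypothetical subdomains and top-level folders of a made-up example.edu
# domain; a real scan would enumerate these first.
SUBDOMAINS = ["www", "news", "lee.ces", "library"]
FOLDERS = ["wp-content/uploads/", "files/", "docs/"]

def responds(url):
    """Return True if the URL answers with a 200 status (simplified check)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as r:
            return r.status == 200
    except Exception:
        return False

# Breadth-first: check every subdomain root before descending one level to
# the candidate folders on each of them.
queue = deque(("https://" + s + ".example.edu/", 0) for s in SUBDOMAINS)
while queue:
    url, depth = queue.popleft()
    if not responds(url):
        continue
    print("reachable:", url)
    if depth == 0:
        for folder in FOLDERS:
            queue.append((url + folder, depth + 1))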

Artificial intelligence is important to penetration testing because it will save time using smart tactics when searching for vulnerabilities. The first aspect of the penetration testing would be to search for files on a server. The easiest way to search for files is to crawl an entire domain and look for files that are of importance. Now using a web crawler with an artificial intelligence depth-first search, all files that are found that match parameters can be downloaded to a host computer. To test this scenario, take a look at North Carolina State University. Their website is built using WordPress. The server information doesn’t really matter in this example, but a look at the WordPress structure does.

WordPress uses a file structure of three main folders in the root of a server. The root of a website is usually stored in a public_html folder or a www folder. Since we are intelligently searching for files, we are not trying to download an entire website; that would be the job of wget or curl, which were mentioned earlier. We are searching for files that may be of importance or stored as a reference on the server. The three main folders for WordPress are wp-includes, wp-content, and wp-admin. The wp-includes and wp-admin folders store the core WordPress files and are not important to our search. The wp-content folder is the most important: this is where the theme and all uploaded files are stored. Using North Carolina State University as an example, the root of the uploads is going to be https://www.ncsu.edu/wp-content/uploads. This folder should be set up to deny access, which is the correct way to set its permissions, and trying to load the folder from that link returns a page not found message.

This is the correct way a website should serve an admin-only folder. The folder is actually in that location on the server; by default, WordPress blocks access to the folder but not to its contents. Using a crawler in Python, a depth-first search can crawl through a wp-content folder looking for files. The same approach works for other content management systems such as Drupal; Drupal just uses a different file structure, so the file path would need to be changed, as documents in Drupal live in a sites folder on the server. With large companies, there may be millions of files in the wp-content folder, so limits need to be placed on the search. Images are not relevant, so only files that use the .pdf extension will be targeted. These files are prevalent on websites because they cannot be edited, which is how companies upload important files to the server to share with employees.
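
A rough sketch of that kind of crawl is below. It assumes the uploads folder exposes simple directory listings (many servers disable this), and the start_url and link pattern are placeholders for illustration rather than a guaranteed working scan.

import re
import urllib.request
from urllib.parse import urljoin

start_url = "https://www.example.edu/wp-content/uploads/"  # placeholder entry point
link_pattern = re.compile(r'href="([^"]+)"')

stack = [start_url]          # depth-first: the most recently found folder is searched next
seen, pdf_files = set(), []

while stack:
    url = stack.pop()
    if url in seen:
        continue
    seen.add(url)
    try:
        with urllib.request.urlopen(url, timeout=5) as r:
            page = r.read().decode("utf-8", errors="replace")
    except Exception:
        continue
    for href in link_pattern.findall(page):
        target = urljoin(url, href)
        if target.lower().endswith(".pdf"):
            pdf_files.append(target)       # queue the document for download
        elif target.startswith(url) and target.endswith("/"):
            stack.append(target)           # dig deeper into the subfolder

print(len(pdf_files), "PDF files found")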

Running the search on the North Carolina State University website produces a folder of downloaded files. Once downloaded, it is easier to copy all of the files into a new folder. When they are downloaded, their file structure is retained from the server: WordPress places all files in folders grouped by year and then month. The problem when scanning is that some of these folders are empty, and a smart depth-first search can ignore them. It is important to keep the file structure the same, because these files can then be accessed using their real link on the server.
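
A short sketch of that filtering step, assuming the files were mirrored into a local folder named uploads:

import os

# Walk the local mirror of wp-content/uploads and skip the year/month folders
# that contain no files. "uploads" is the assumed local folder name.
for dirpath, dirnames, filenames in os.walk("uploads"):
    if not filenames:
        continue  # empty folder, nothing to copy or index
    for name in filenames:
        print(os.path.join(dirpath, name))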

Searching through the university's domain and subdomains, some files of importance are found. There is a quiz with an answer key at https://lee.ces.ncsu.edu/wp-content/uploads/2012/08/SS6E_Quiz_8.pdf. That file does not belong to the University. There is also a similar diagnostic test at https://lee.ces.ncsu.edu/wp-content/uploads/2012/08/Diagnostic-Test-With-Answer-Key.pdf. Also found were full copies of textbooks and scholarly publications. These are usually strictly licensed, so hosting them on a public web server is probably a breach of contract. With such a large number of files available, it is very important to protect those that could be damaging to a company.

The file download was a trivial demonstration of the power of depth-first search and artificial intelligence to penetration test a website. There are more powerful examples that drive home the importance of taking every precaution when creating a website. Depth-first search can also be used to try to access content or log into a website or user account. The next example will look at a professional newspaper website that uses a plugin that takes payments from readers to access content. This type of setup is popping up in more and more online publications.

The idea behind this type of website is that the newspaper can host their own content. Their content is saved to their server using their own content management system. This gives their developers ease of access and customizability that they would not have if they outsourced their hosting needs. This is a great setup for free publications because the content can be served freely to anyone who is browsing the internet. They can even take advantage of shared hosting by any of the number of hosting giants that serve up web domains.

Larger newspapers like to offer subscription-based services much like the print subscriptions they have made their income from in the past. The expertise of the newspaper staff is in the design and layout of their written content. They may have a designer or two on staff, but in most cases they have a joint web design and management position that also handles hosting their websites. This means they do not have the infrastructure in place to also manage payment solutions, which makes newspapers an easy target for penetration testing to gain access to their content.

Newspapers that host their own content but do not take payments on their own site use a plugin to connect to an outside payment solution. This outside payment solution takes a key or access code to validate the content. If this key is passed through the url, it can be inserted by a script. This script would allow a bypass of the outside plugin so that the content that is hosted on the server is accessed without ever going through the plugin. Once the url key has been found, any article hosted on the web server can be accessed using that same url key.
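
As a sketch of the idea, once a validating query string is known it can be appended to any article address by a script. The key and article URL below are placeholders, not real values.

# Illustration only: both values below are placeholders.
known_key = "nav=000"
article = "http://www.example.com/page/content.detail/id/123/story.html"

bypass_url = article + "?" + known_key
print(bypass_url)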

The Altoona Mirror is a newspaper located in central Pennsylvania. It serves news to the entire Blair County area as well as the surrounding counties. There is a semipro baseball team in the area, along with some Division II college athletic teams, and it is a fairly popular newspaper for people in central Pennsylvania. The paper hosts its website at http://www.altoonamirror.com, which serves as an online copy of the newspaper. The homepage lists all of the stories and is open to the public, but clicking on an article only opens an excerpt, and a popup blocks the rest of the content.

This popup has two tabs, one for new users and one to login to an existing account. It is a pay for access portal that blocks users from accessing content. The plugin is loaded over an HTTP connection even though there is a message that says, “This form is secure.” The form itself is just a billing box that asks for personal information and a credit card. There is also a select menu that allows you to select the access level. If the box is closed out, the user is redirected to the homepage of the newspaper. At the bottom of the popup, there is a message that says, “Brought to you by: MediaPass™.”

This MediaPass popup is being served over an HTTP connection and collecting sensitive information. This information can be sniffed over a wireless network because of the way it is being transmitted. That does not matter in this instance because a closer look at the url string gives some important details. Each story has an extra string in the url. This string is ?nav=742 or some other number. This string is only added on pages where the MediaPass popup is loaded. It is appended to the page by the MediaPass popup. Deleting the string and trying to reload the page only reloads the popup. This string is the key to gaining access to the content of the page.

Going through a number of stories, the string changes but always looks similar. Since there is no successful login case to start from, there is no known value for what the correct string will be. This means the first thing for the script to try is 0. Python will load the web page using urllib.request to open the website. (Python) The correct protocol is to check the response to make sure that the page opens.

Once the page has been loaded, the next step is to check the contents of the HTML. This can be done by dumping the page source to a file with urllib.request, while urllib.parse can be used to build each candidate URL. Of course, this could get heavy on data if the program has to keep looking through these dumps to find the one that does not contain the MediaPass popup. This is where an artificial intelligence search algorithm comes in.
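
A sketch of that step with the standard urllib module is below; the candidate URL is a placeholder and the output file name is arbitrary.

import urllib.request

# Fetch one candidate URL, confirm it loaded, and dump the HTML to a file
# for later inspection. The URL is a placeholder.
url = "http://www.example.com/story.html?nav=000"
with urllib.request.urlopen(url) as response:
    if response.status == 200:
        html = response.read().decode("utf-8", errors="replace")
        with open("page_dump.html", "w", encoding="utf-8") as out:
            out.write(html)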

The goal is to use brute force to search until the string validates and eliminates the MediaPass popup from the page. To find the element in question, the Firefox inspect element tool is perfect; it shows the code that is loaded on each page. Once a page with MediaPass has been loaded, open the inspect element tool and use its element picker. Highlight the MediaPass popup and click on it. This selects the code in use and loads it at the bottom of the page in two columns: the left side is the actual HTML code for the page, and the right is the CSS used to style the elements.

CSS controls how elements are styled, so this is where the MediaPass popup's styling will appear. Since the element picker selected the box, only the elements relevant to the goal should be displayed on the right. There are a few elements that are not important, but the one that is important is called .mp-inner-page-box. This box loads the MediaPass popup inside of each page.

To test and make sure that this is the correct element, the code needs to be tweaked inside of the inspect element tool to modify the page. This will modify the page in browser and not the actual website itself. This means that if the page is refreshed, all changes will be lost. To test and make sure the correct box is selected, code can be added inside the inspect element. The actual line for .mp-inner-page-box has two {} brackets. Inside of these brackets, add the following code: display:none;. This will hide the box on the page. If done correctly, the MediaPass box will disappear from the page.

This means that we have the correct area and content selected on the page. This is the code that our Python script has to look for in the page. When reading the content of a certain url, it needs to look for the .mp-inner-page-box element. If that box is displayed, the MediaPass popup is still blocking the content from being displayed on that page. Once the MediaPass popup has been hidden by CSS, the actual content of the page is still not displayed. This means that our code has to be a bit smarter to get past the MediaPass object blocking the content.

While researching this topic, another newspaper that uses a different subscription service responded differently to the CSS change. The Tribune Democrat is a newspaper in central Pennsylvania that serves the Cambria County area. It uses a NewsMemory application that was developed by Community Newspaper Holdings, Inc.; this information was gathered from the Tribune Democrat website. The website is located at http://www.tribdem.com and is also served directly over HTTP.

Running the same artificial intelligence test to try to bypass their insecure content provider revealed another bug. When using inspect element to select their popup blocker, the content actually displays once the popup blocker is hidden. The popup element is .ta_popup, and it creates a semi-transparent gray background over the content. Once the element is given the display:none; property, the entire content blocker disappears. This allows access to the content without having to perform any url injection, which means a simple filter plugin could be created to hide that content-blocking element any time the page loads.

Since that does not work on the Altoona Mirror website, a url injection is needed. A url injection occurs when a script or code is inserted in a url that changes how a page is displayed. Any server running PHP as a scripting language for its website is potentially vulnerable to a url injection attack, and these url strings can query a SQL database and select content that was otherwise unintended for viewing. Our script is going to start with ?nav=0 and work up from there. Each time the search finds the .mp-inner-page-box element, it goes back to the url request, increments the number in the ?nav= string, and sends another request for the new page.

None of the pages viewed used anything other than numbers, so to make the script a little more intelligent, letters and symbols will not be used in the query. Each of the observed numbers was three digits, so instead of starting right at ?nav=0 the script will start at ?nav=000. These are just the beginning parameters, and they speed up the search by limiting the number of url addresses requested.

The basis of the program is to call a url for an article on the Altoona Mirror website. The url for the article will be changed to end with ?nav=000 instead of the starting number. Python will call the url and search for the MediaPass element. If the MediaPass element is found, the url will be incremented and the search will occur again. If the MediaPass element is not found, the search will be stopped and the url will be displayed to test. This is a simple Python program that is very powerful.
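
A minimal sketch of that loop is shown below. The article URL is a placeholder, and the sketch assumes the MediaPass marker string appears verbatim in the HTML whenever the paywall loads.

import urllib.request

base_url = "http://www.example.com/page/content.detail/id/123/story.html?nav="  # placeholder
marker = "mp-inner-page-box"   # element present only while the paywall is loaded

for count in range(1000):
    candidate = base_url + str(count).zfill(3)     # 000, 001, ... 999
    try:
        with urllib.request.urlopen(candidate, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except Exception:
        continue
    if marker not in html:
        print("paywall element missing at:", candidate)
        break
else:
    print("no working query string found in 000-999")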

I ran a few tests in Python in the terminal, and after a few minutes the search finally stopped. The number in the url that it stopped at was 760. If the string at the end of an article url is changed to ?nav=760, the full content of the article displays and the MediaPass popup is never called. This worked for every article on the website that was tested: once the query string at the end of the url was changed, the MediaPass box did not display. This is a huge security hole, just like the CSS display bug on the Tribune Democrat website.

Combined with looking for files on a server, this can be a powerful tool for finding exploits. Chaining a number of attacks together into one program can be a simple way to test for exploits. The only negative to this approach is that it can take quite a while for both scripts to run.

Conclusions and Areas of Possible Future Work

Using artificial intelligence to look for vulnerabilities is a great way to extend the skills of a system administrator or a web programmer. Looking for vulnerabilities while browsing the internet can create different avenues of thought and ways to exploit one's own website. A programmer should always be looking for ways to improve their work, and looking at the work of someone else is a great way to do that. In a lot of cases, it is easy to miss simple mistakes when one is heavily invested in a project; it is always good to have a second set of eyes.

Artificial intelligence can give an unbiased second set of eyes when looking for vulnerabilities. A key when using or creating a tool like this is to think outside of the box. Someone who is trying to penetrate a system is not going to go straight ahead with an exploit. Try different alleys and paths to files, or use common tactics, especially when a content management system is involved. The most important rule when using a content management system is to be careful when adding external plugins or themes on top of that system.

Plugins that are not official have not been tested by the creators of the content management system in most cases. This means that they are not tested against the actual backbone code of a system. There may be incompatibilities inside the core code that the plugin developers did not anticipate or know about. Always test the plugins for any scenario.

Handling money or payments on a website opens up a whole other set of problems. If there is any semblance of payment on a website, it should be required to use an HTTPS connection. An SSL certificate to certify an HTTPS connection is not expensive. It is an extreme oversight in the world today to not give that protection to users. With the amount of identity theft that happens and the ease of gaining access to this information, this level of protection is sorely needed.

Not securing payment or user information should freely open up websites to litigation. It is not up to the user to understand the intricacies of the internet when they are simply trying to read the local newspaper online. It should be the responsibility of the company to protect its customers, and if it cannot protect them, it should give access to its content for free.

There are a few artificial intelligence searches that might be more powerful and faster, but depth-first search works well enough for the purposes of penetration testing. Files and folders are stored in a fairly linear way on the server rather than in complex trees or other structures. A depth-first search can crawl through a server much like it works through a stack of information: as it looks at a file, it either queues it for download or ignores it if the extension does not match, and once the file has been downloaded it can be popped off of the stack. (Russell, 2010, p. 85)

There are other search algorithms that could be tied into the search used in this example to make the solution more powerful. Looking at the Tribune Democrat example, there was more than one element created by their content access plugin. These would all need to be filtered out in case they display on a different type of content page. This could be done by using another search to append a display:none; rule to all of the elements that use the same prefix as the one identified earlier. Since it is a unique prefix of .ta, no other necessary element should be affected by adding the filter.

This would need to be tested more but could ultimately be added into a browser extension or plugin that looks for that same element on any webpage that loads. If another website would use the same plugin that the Tribune Democrat uses, it would be filtered out without the user even knowing as long as that extension is active.

This could be extended a bit further using the information we have already learned with the MediaPass plugin. If the same string works every time for all pages, a similar extension could be built that looks for that same string at the end of a url and changes the number that occurs at the end to 760. This would automatically reload the page and the user would not know why the url redirected. All they would see on their end is the correct page loading and displaying the full content. This is a way to exploit a website to show their content without them even knowing.
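
A sketch of that rewrite using the standard urllib.parse module is below; the input URL is a placeholder, and 760 is the value found in the earlier test.

from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def rewrite_nav(url, new_value="760"):
    """Replace the nav query value with the known working one (sketch)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    if "nav" in query:
        query["nav"] = [new_value]
        parts = parts._replace(query=urlencode(query, doseq=True))
    return urlunparse(parts)

print(rewrite_nav("http://www.example.com/story.html?nav=742"))  # placeholder input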

That is why it is important to take all of the necessary precautions when developing a website. Security should not be an afterthought. It is important to get a professional developer on staff to test the website and make sure that all functions are working. There are ways to hide the url string in the header. Since the exploit has already been identified, it might not matter as the code can already be placed in the header.

There might be some legal recourse against the company that develops subscription-based plugins when there are loopholes around them. If these loopholes are not fixed, there are ways to get around them that damage the company. The company makes its profits using the subscription system, and anyone who visits the site and views the content without an active subscription costs that company money.

Building on this for future work, the next step would be to automate these functions into a plugin or browser extension. This would allow anyone access to the content of these sites: they would download the plugin, install it, and activate it in their browser. It would be ideal to find more exploits for other news websites that use their own content blockers so the extension could be built using a more intelligent design.

This extension would be built much like the artificially intelligent move engines used to evaluate a board in a game of chess. When the browser loads a website, the extension would prepare itself by looking through a predetermined set of moves. This could be triggered by looking for an identifier somewhere on the website. The CSS is a good starting point because it is usually identical for anyone using the same plugin; unless a site goes through the code and renames the elements, which most do not, the same element names will appear on every website that uses the plugin.

This is similar to using the folder structure of a content management system. All default WordPress sites are going to have the same three top level folders of wp-content, wp-includes, and wp-admin. These only change if they are modified in the core of the code upon installation. This most likely does not happen as it may change the infrastructure of the content management system itself.

Once these identifiers have been added to the extension, it would run the exploit or move based on the identifier it found. Since these elements are uniquely named when the plugin is made available, there really should not be any compatibility issues that arise; if there are, the extension can simply offer a click-to-disable option in the menu bar. The problem with using a depth-first search algorithm for an extension like this would be speed.

When looking for the earlier exploits and unsecured data, speed did not matter; the goal was to have the script crawl through all elements to make sure none was missed. Some other algorithms take shortcuts through certain data to improve speed, so this would be a case where another algorithm could replace the depth-first search of the project. As it stands now, there are only two exploits identified, so the extension would not be resource heavy or take a lot of time to execute. It would simply look for the two elements and, when they are identified, disable them.

In conclusion, using artificial intelligence algorithms for penetration testing can be highly beneficial. It saves the time of having someone manually check to make sure there are no security holes. Similar features are starting to pop up in home security systems with phone applications that can check to make sure no appliances or lights are left on in a home. Before these technologies arrived, someone would have to walk around a house manually to make sure everything was turned off. Using an AI system gives peace of mind to homeowners.

This peace of mind is also available to website owners and server administrators. Another example of this would be a recursive function to check the permissions of all files on a server. This would prevent the web programmer from having to go back through the server and make sure that all permissions have been set correctly. Assuming a public_html folder as the root, the following commands would work:

# Directories get 755 so they can be traversed, but only the owner can write to them
find ./public_html -type d -exec chmod 755 {} \;
# Files get 644 so they are readable but not executable, and only the owner can edit them
find ./public_html -type f -exec chmod 644 {} \;

This crawls through both folders and files and sets them to different permissions. The main key is that files end up readable but not writable or executable by anyone other than the owner. This prevents them from being executed if there is harmful data included and also prevents them from being edited. If a file has incorrectly been set to be editable by anyone, someone can inject that file with malicious code. Once someone gets into the server through a terminal or command line, they can list all of the files in a directory along with their permissions, and it only takes one file with an incorrect set of permissions being executed to destroy a server.

This would be tough to maintain as an individual, so it is important to include automated tasks to ensure that all files and folders are set correctly. It would be ideal to have a cron job that runs every day to check and set permissions. This could be done for other exploits as well, but there are cases identified every day that would take more work. In the example described above, there is no simple solution: the way both plugins were written, they have been easy to exploit or work around completely.

Once both newspapers have been contacted to let them know that their services can be circumvented, it will take some work to get those loopholes closed. The most ideal way would be for the newspaper to redirect completely to an HTTPS server for the payment information. If a user has logged in and paid, a timed session cookie can be placed on the machine. After an hour or so it would redirect to the payment portal again to check that the account is still active. As long as the user stays logged in, they would not be aware that this is taking place; it would just take longer for the page to load.

Whatever solution is put in place still needs to be tested thoroughly to ensure that no other exploit is available to give access to the content. That should be the job of the plugin creator. If someone is paying for a service, it should be in the contract that all security needs are met. Until security gets a more serious look on the internet, identity theft and fraud will continue to be prevalent.

Python code examples and modules were pulled from the official Python documentation. (Python)

import urllib.request
import urllib.parse

# Encode a set of form values (placeholder data) and send them with the
# request, then print the decoded response. The "http://" target is left
# as a placeholder to be filled in with the page under test.
data = urllib.parse.urlencode({"nav": "000"}).encode('ascii')
with urllib.request.urlopen("http://", data) as f:
    print(f.read().decode('utf-8'))

# Brute-force the ?nav= query string: request each candidate page and stop
# when the MediaPass element is no longer found in the HTML.
import urllib.request

marker = "mp-inner-page-box"
url = "http://www.altoonamirror.com/page/content.detail/id/625523/House-to-consider-budget--tax-plans.html?nav="

count = 0
while count < 1000:
    candidate = url + str(count).zfill(3)
    with urllib.request.urlopen(candidate) as f:
        page = f.read().decode('utf-8', errors='replace')
    if marker in page:
        count += 1        # paywall element still present, try the next number
    else:
        print(candidate)  # working query string found
        break

# Depth-First Search (runnable sketch): the web server's contents are modeled
# as a nested dict of folder name -> subfolders, with files mapped to None.
# The search digs into the most recently found folder first and returns the
# chain of folders leading to the target file.

def search(web_server, target_file):
    stack = [("/", web_server)]      # start at the server root
    parents = {"/": None}            # remember how each folder was reached
    while stack:
        path, contents = stack.pop() # most recently found folder is searched next
        for name, entry in contents.items():
            if entry is None:        # a file, not a folder
                if name == target_file:
                    route = [path + name]
                    while path is not None:   # walk back up to rebuild the route
                        route.insert(0, path)
                        path = parents[path]
                    return route
            else:
                child = path + name + "/"
                if child not in parents:
                    parents[child] = path
                    stack.append((child, entry))
    return []

import urllib.request

# file_on_server is the URL of a file discovered by the search (placeholder here).
file_on_server = "https://www.example.edu/wp-content/uploads/report.pdf"
with urllib.request.urlopen(file_on_server) as f:
    print(f.read())

References

  • Buluç, A., & Madduri, K. (2011). Parallel breadth-first search on distributed memory systems. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11).
  • Luger, G. (2009). Artificial intelligence: Structures and strategies for complex problem solving (6th ed.). Boston: Pearson Addison-Wesley.
  • Mutlu, B., Kanda, T., Forlizzi, J., Hodgins, J., & Ishiguro, H. (2012). Conversational gaze mechanisms for humanlike robots. ACM Transactions on Interactive Intelligent Systems, 1–33.
  • Python 3.5.1 documentation. (n.d.). Retrieved December 2, 2015, from https://docs.python.org/3/
  • Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
  • W3C. (n.d.). Retrieved December 2, 2015, from http://www.w3.org/

Attachments

The actual Python script testing was completed on a Linux computer running Ubuntu 15.04 and Python 3.4.

First example of a screenshot showing an article with the paywall active
First example of a screenshot showing the paywall removed from the article
Second example of a screenshot showing an article with the paywall active
Second example of a screenshot showing the paywall removed from the article
