My Role Model — Andrew M. Teno Jr.

There are many extraordinary people who greatly influence my life, including my parents, grandparents, aunts, uncles, cousins, and friends. But one person whom I admire and love, and whose love and loyalty to his family and friends is so great, is my grandfather, Andrew M. Teno Jr., my role model.

My grandfather was born in Johnstown in 1935 and survived the 1936 Johnstown Flood as my great-grandmother carried him to safety. He attended school in Richland and at the age of eighteen went on to drive a tractor-trailer. He was an owner-operator and, through the Teamsters Union, hauled steel from the Johnstown plants to Chicago, Michigan, New York, and other states for forty years.

With the decline of the steel mills, he decided to buy a garage in Portage. He bought the garage on April 1, 1981, and was open for business on July 8, 1981. Not only did he receive his mechanic's inspection license in July, he also earned his license to sell vehicles in August. For many years he worked at his garage as a mechanic, building his customer base. His son started working for him in 1990 and is now part owner of the business.

In 1988, an opportunity arose for him to haul fuel for Refiners Transport, which later became Bulk Material Inc., out of Duncansville. He worked at BMI for seven years, and after the company went out of business he became semi-retired at the age of sixty. These extra years of work added to his pension in the International Brotherhood of Teamsters.

My grandfather will turn eighty years old this year, and he still goes to the garage every morning around 7:30 a.m. He faithfully has a coffee break with some of his friends who stop in every day. With the ever-changing auto industry, my grandfather had to learn about the new computerized automobiles and the new auto emissions testing program. He still helps around the business now and works on vehicles, mostly trucks and fire trucks. He has many of the same customers as he did when he first opened for business. My grandmother calls him the janitor because he is always cleaning up and doing all of the repairs and maintenance on the building.

My love and admiration for him comes from how he treats his family, friends, and customers. He always finds time to attend his grandchildren's sporting events and has sat through many in the rain and snow. He never forgets to give us a hug and tell us what a great job we did, or how to improve for the next game. I'll always remember how he raises his glass for a toast on Christmas Eve and says how thankful he is for his family, as his eyes well up with tears.

Projects

Heavy Metal Trivia

One of the first programming projects I ever completed was a Heavy Metal Trivia application for Android phones. The application displayed multiple-choice questions broken down by category and kept score based on how many questions were answered correctly in a series of ten. After the series was completed, the user was given the choice to continue playing for a chance to increase their score. The application charted as a Top 50 Music application in the United States in May 2012 and returned to the charts in Japan as a Top 500 Music application in January 2014. Unfortunately, the application is no longer available due to changes in the Android operating system.

Professional Careers in Education Panel

![](https://dev.leaheymedia.com/files/richie-leahey/professional-careers-in-education-01.jpg)

![](https://dev.leaheymedia.com/files/richie-leahey/professional-careers-in-education-02.jpg)

![](https://dev.leaheymedia.com/files/richie-leahey/professional-careers-in-education-03.jpg)

![](https://dev.leaheymedia.com/files/richie-leahey/professional-careers-in-education-04.jpg)

![](https://dev.leaheymedia.com/files/richie-leahey/professional-careers-in-education-05.jpg)




Three Springs Fire Company Computer Lab Build - 2014-10-20

![](/images/cv/service-three-springs-fire-company-computer-lab-build-01.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-02.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-03.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-04.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-05.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-06.jpg)
![](/images/cv/service-three-springs-fire-company-computer-lab-build-07.jpg)

Day One Computer Lab Build - 2015-01-24

![](/images/cv/service-day-one-computer-lab-build-01.jpg)
![](/images/cv/service-day-one-computer-lab-build-02.jpg)
![](/images/cv/service-day-one-computer-lab-build-03.jpg)

West Loop Missionary Church Computer Lab Build - 2015-02-07

![](/images/cv/service-west-loop-missionary-church-computer-lab-build-01.jpg)
![](/images/cv/service-west-loop-missionary-church-computer-lab-build-02.jpg)

Huntingdon Salvation Army Computer Lab Build - 2015-05-02

![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-01.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-02.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-03.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-04.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-05.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-06.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-07.jpg)
![](/images/cv/service-huntingdon-salvation-army-computer-lab-build-08.jpg)

URL Injection to Circumvent Web Security

I have worked as a web developer full-time for the past three years. Before that, I worked as a freelance web developer, gaining experience and becoming proficient as a programmer. Over time, I saw common themes among the websites I was hired to improve. Most websites are not designed with security in mind. In fact, most websites are created with only one goal in mind — to have a presence on the web as cheaply as possible. This creates major security complications.

Most of the time, especially with smaller businesses, I have found that the company website was created by an employee already on the payroll, someone with only a little programming or web design experience. This was done because the company did not want to hire another employee to handle the website or consult with a dedicated web design company. Websites created this way are thrown together and do not work as intended, and features are added on request without security in mind.

Poorly produced websites do not pass even the most basic validation tests. Sometimes these error-filled websites are produced by larger companies that are prominent in certain areas. Thousands of people look at these websites every day without realizing that their information is at risk. It really bothers me when websites that were not designed with security in mind start to take payments for services. I was appalled, while working freelance, to find a number of high-profile clients that were not updating the payment services on their websites because updates would break features they had hacked into the core.

To me this is unacceptable. When clients put faith in a company and blindly pay over the internet, the company should be required to protect their information. There are too many cases of identity theft in the world today not to hold companies liable for identity and data theft.

An example of a company not updating its paid plugins is the Altoona Mirror newspaper website. The Altoona Mirror is the official newspaper of Blair County in Pennsylvania. This hits close to home for me because I have a lot of family in the area who read the Altoona Mirror. The Altoona Mirror website is located at http://altoonamirror.com and does not transmit data over an HTTPS connection. The website uses a plugin called MediaPass to collect payment from readers for access to full articles. Each article that includes the plugin transmits data over an HTTP connection even though there is a line on the form that says, “This form is secure.”

This is simply a poor implementation of a plugin. The goal of using MediaPass is to get website visitors to pay to continue reading the articles on the site. Many newspapers on the internet use a similar, if not the exact same, plugin. The problem is that there is no security in place to prevent someone from circumventing it.

This starts with the website using an HTTP connection. When someone visits the website, their data is sent to the server in the clear, which means the information can be read by someone on the same wireless network. Using a tool like Wireshark, an attacker can sniff the traffic being sent to the server and see what the user is sending. Since this is an open connection, that information is not secure. (Pedersen, p. 268) Someone can perform a man-in-the-middle attack and collect all of the information being sent.

Since the plugin is included on a page that is not secure, it does not matter that it displays a message saying it is secure. It is not. This could trick users into believing their information is safe when filling out the form. The form itself has fields for all credit card information, meaning everything is submitted through this form. Since it travels over an insecure connection, there is a way to circumvent this pay-to-read service.

Since there is no validation of the data, one can brute-force a way into the website. The approach is to trick the server into believing that the payment service has been validated on the other end. Once again, because the connection is not secure, there is really nothing stopping this from validating correctly.

Making matters worse, the website takes information to load the page through the url. There is a string attached to the end of the url that reads ?nav=742. The number is not the same for all articles and appears to be random. This is the key to validating the MediaPass plugin. Since it is part of the url, there are no measures in place to stop it from being changed. To exploit the insecure transmission, the correct key has to be passed through the url. The ?nav= part of the string has to stay, since it is the parameter that directs the request to the correct location.

From here, we are looking at the number in the url. Since this number changes from page to page, there is no sure way to know what the key will be. From a social engineering standpoint, one could watch someone who does have access log in to view an article and observe how the url string changes to see what was permitting access. Unfortunately, we do not know anyone who has access, so it will take a brute-force algorithm to gain access to the page. Our algorithm will be a depth-first search through the page to find the missing content, which suits a website because of how pages are laid out and stored using tags. (Luger, 2009, p. 100)

By observing the pages, one can see that the number on each page is always three digits. This will be the starting point of the attack. Plugging in random numbers by hand to see if any of them grant access would take far too long, so writing a script to step through every three-digit combination is the better approach.

I used Python to create a script that requests a url from the internet, pulls the page, and looks through its content. The request itself takes a single call to urllib, which is included in Python's standard library. (Python) Using an if statement, we can check whether the MediaPass element is being loaded. If the MediaPass element is actively blocking content on the page, it will be loaded and visible to the Python program.

The element uses a class with the prefix mp-. This prefix only occurs on the page if the MediaPass plugin is blocking content. Analyzing the page in a browser, it can be tested whether modifying the CSS will reveal the content; hiding the element does not, because the full article is only returned when the correct url query is supplied. Using this as a starting point, if the mp- prefix is found when Python loads the website, the script increments the number in the string that was added to the url.

The code starts with a string value of 000, chosen because the keys on other pages are all three digits. The url is loaded in Python and checked for the mp- prefix. If the prefix is found, the page content is discarded and the string value is incremented by one, so 001 is tried next, and so on. Once a page loads without the mp- prefix in its content, the search stops and the url string value is returned for manual testing.
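
That loop is small enough to sketch with only the standard urllib module. The article address below is the one reused in the appendix, and the substring test on mp- mirrors the check just described; error handling and politeness delays are left out, so treat it as a sketch rather than a finished tool:

import urllib.request

# Article url from the appendix; the trailing ?nav= takes the three-digit key.
base = ("http://www.altoonamirror.com/page/content.detail/id/625523/"
        "House-to-consider-budget--tax-plans.html?nav=")

for count in range(1000):                       # try 000 through 999
    key = str(count).zfill(3)                   # keep the three-digit format
    with urllib.request.urlopen(base + key) as response:
        page = response.read().decode("utf-8", errors="ignore")
    if "mp-" not in page:                       # MediaPass marker is gone
        print("Candidate key for manual testing:", key)
        break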

Using this idea, I was able to crawl through the pages and reach a value where no mp- prefix was found: the number 760. When 760 is the value passed through the url, the MediaPass plugin does not load; it validates the user and the complete article is loaded. The program took a few minutes to test and try values, but it was successful. I tested the 760 value on a few other articles and it worked for every one I tried. I did this over a period of a few days and it still works.

My original thought was that every article would not load with the same string, so this worked out better than expected. Since the other pages had different initial values, the assumption was that each would have its own unique key. With that not being an issue, a browser extension could be created to change the value automatically, granting anyone access. This was a penetration test done to circumvent security protocols, and it only works because the server does not use HTTPS when transmitting data. I found a similar site that used the same MediaPass plugin but was not able to penetrate it because it was using an HTTPS connection. An HTTPS connection is easy to set up and something a website owner cannot ignore.

References

  • Luger, G. (2009). Artificial intelligence: Structures and strategies for complex problem solving (6th ed.). Boston: Pearson Addison-Wesley.
  • Pedersen, T. (n.d.). HTTPS, Secure HTTPS. Encyclopedia of Cryptography and Security, 268–269.
  • Python 3.5.1 documentation. (n.d.). Retrieved December 2, 2015, from https://docs.python.org/3/

Attachments

The actual Python script testing was completed on a Linux computer running Ubuntu 15.04 and Python 3.4.

First example of a screenshot showing an article with the paywall active
First example of a screenshot showing the paywall removed from the article
Second example of a screenshot showing an article with the paywall active
Second example of a screenshot showing the paywall removed from the article

Artificial Intelligence in Penetration Testing and URL Injection

When artificial intelligence is mentioned, thoughts of robots becoming human-like and taking over the world are prevalent. (Mutlu, 2012) This is because of the misinformation passed along in the media. Artificial intelligence is used in a lot of everyday items and machines, and it is also key in the video game world. Most people do not think of the more practical applications and uses of artificial intelligence, which makes it a controversial topic. It is the center of many debates on how machines are going to take over the world and threaten humanity. Computers can run complex algorithms in mere seconds when it would take a human hours to complete the calculations by hand. (Luger, 2009, p. 1) This leads to a fear of the unknown and causes research into artificial intelligence to be approached with caution.

Artificial intelligence development and research is important because it helps streamline tasks and solve complex problems quickly. It is now used in household appliances and security systems. These appliances can connect with smartphones and other devices for alerts and run automatically when there is an issue. Security systems can detect the smallest movement and use algorithms to determine whether an alert needs to be set off. Homeowners can set their own parameters so that pets and smaller movements do not trigger alarms. All of these systems used together give a homeowner peace of mind when they are out of the house.

The most important area of focus in artificial intelligence should be the security of the internet. Every day firewalls are breached and data is compromised. Intelligent systems can detect potential threats and see that they are thwarted before they begin. There are already aspects of this at work, unknown to most of the public. Antivirus software running on most machines can analyze any file or piece of software and determine whether it is a threat. Antivirus software needs this intelligence to combat harmful files, because identifying the wrong file and deleting it could be disastrous.

There are also other checks in place that use artificial intelligence within the system itself. If a piece of malware tries to take over the system, a popup box appears alerting the owner that an unknown file is trying to run. The same alert also appears if a file tries to go around the firewall or delete other files. These security checks, using a form of artificial intelligence, were necessary to help users prevent intrusion and file corruption. These filters and security alerts constantly have to be updated; as humans create more complex and unique ways to bypass security, the security checks have to keep up.

Another example of this is the spam folder used in email clients. Once an email arrives at an address, it is scanned intelligently for common keywords and characteristics of harmful or deceitful messages. This helps protect unsuspecting users from damaging their systems. A variety of emails delivered every day use tricks to get users to download files or click on links. The files install malware or viruses on the user's system with the intent to cause harm, and the links are used to steal information or gain access to the user's funds. Some of the emails ask people to wire money in exchange for services. A lot of the time, users assume the emails are legitimate if they do not get caught by the spam filter. This is why artificial intelligence is so important to the future of security.

Technical Discussion

The internet continues to grow in the amount of available data as devices become cheaper for consumers. Cell phones now connect directly to the internet, adding the convenience of searching the web wherever a user is. It is now essential for businesses to have some semblance of a web presence. Any time someone is curious about a business, there is a good chance they are going to search for that business online, and if the business does not have a web presence, that potential customer is lost. The easiest way to get a profile is by creating a social media business page. These are convenient for business owners who are not technologically savvy, but they do not offer the customization that a privately owned website would, and the exposure is limited by the size of that social media outlet's user base. This means that multiple profiles across multiple platforms are needed to increase exposure. Sometimes the positives gained by having a profile on these pages are outweighed by the negatives of having to maintain and self-promote the business. There can be a flood of information across social media, so users may not see even the most creative campaigns by a business.

The alternative approach taken by a lot of businesses is to create their own website. A lot of the time, businesses treat their website as an afterthought and do not take the right approach when discussing and planning it. The most common error is not including a programmer or server administrator in planning or developing the website. A website is designed first, either on paper or using a graphics program, and the job is then sold to a contractor or web development group to create the website.

When small website contracts are sold, the website is forced to fit the design that was created by the company. When these designs are created by the company itself, they often do not lend themselves well to the way a browser will display the website. The result is a website full of errors that does not display correctly for users. These problems are compounded if a business contracts parts of the website out to multiple companies that work without contacting each other. Concessions are then made to get all of the parts working together, which is not good for visitors to the website.

It takes a lot of time and planning to build a website that is secure and renders correctly. The organization that creates standards for the web is the World Wide Web Consortium (W3C). It works directly with the languages used on the web to help standardize how pages are displayed, and it even offers developers a free tool at https://validator.w3.org to validate the code of websites. The problem across most of the internet is that most websites do not validate correctly. This is in direct contrast to using a compiler to create an operating system program: the compiler will not produce a software package if errors are found throughout the program. That same level of validation is not in use on the internet.

If some of the most prominent websites in the world are run through the validator, a plethora of errors will be reported. This is alarming because the validator is a free tool that gives a breakdown of every error found on a website, which makes those errors easy to fix. The issue is that many companies do not take the time to make sure their website validates. These are basic coding errors being reported; if companies are not taking the time to validate their websites, imagine the number of errors occurring underneath the cosmetic layer.

Each website is hosted on a server that is publicly reachable on the internet. These servers can be run by large companies in server farms, or they can be self-hosted in the basement of a person's house. From a security standpoint, the operating system is the start of a secure system. The operating system of a server hosting a website needs to be kept constantly up to date, and file permissions need to be maintained. Using artificial intelligence, anyone can search through these servers looking for vulnerabilities.

This is where artificial intelligence is important to the future of the internet. Its importance does not lie in creating tougher enemies in video games; it is important for testing for vulnerabilities in websites and servers connected to the internet. If a company chooses to collect information from its user base, it is up to that company to secure the information. It is appalling how many websites do not follow protocols to protect user information. There are data breaches every day and identity theft has exploded. That is due to users trusting companies to protect their information, or not securing their own information.

There are quite a few tools available to search for files and folders on web servers. Depending on the operating system in use, wget and curl are usually built in. These allow webpages and files to be downloaded from a server without using a browser. They are great for what they do, but they do not give the amount of control that a user-created program can. Python is a great language for web penetration testing because it is installed on many of the web servers connected to the internet. It is also a high-level language that is easy to use and has an enormous amount of documentation available to help learn it. The great thing about penetration testing on the web is that any programming language can be used.

When starting to penetration test a website, it is best to understand the tool in use. Make sure each tool or programming language is understood so that things can be adjusted as needed. It is also important to understand the basics of how a web server is designed. Using artificial intelligence to look through a web server speeds up the process, but understanding the layout of a server is still very important for cutting down the program's search time. There are also tools available to determine what system the website under test is running on. It is important to gain as much knowledge about a web server as possible before running tests.

This matters because of the file structure of the server. If the search algorithm assumes the file structure of a unix system when the website actually runs on a Windows server, it is not going to gain much useful information. The root of a Windows server is C: while a unix system uses a root of /. A successful script can determine the file structure and search correctly for the files in question. There was also a difference in path notation: Windows traditionally used \ while unix uses / to separate folders. Windows has been adapted to accept both separators, so this should not be a problem going forward.

For penetration testing with artificial intelligence, depth-first search is a great tool because of the way a website is hosted on a server. The web server keeps its content in a main folder and uses more files and folders under that root to separate information. Every time a / is entered in a web address, the server is being told to look in another folder for the requested file. A depth-first search begins at the root and digs down to the last file at each folder level, going deeper and deeper into the server looking for files. (Luger, 2009, p. 99) This is quicker than a breadth-first search on a web server because most web servers have more depth than breadth. Using this search intelligently, one can quickly move about a server looking for unrestricted files, as the small sketch below illustrates.
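
As a concrete illustration of that ordering, the sketch below walks a small made-up folder tree depth-first with an explicit stack. The tree itself is invented for the example, but the push and pop pattern is the same one a server crawl would follow:

# A tiny made-up directory tree: folder path -> entries inside it.
tree = {
    "/": ["wp-admin/", "wp-content/", "index.php"],
    "/wp-admin/": ["admin.php"],
    "/wp-content/": ["uploads/", "themes/"],
    "/wp-content/uploads/": ["2012/"],
    "/wp-content/uploads/2012/": ["report.pdf"],
    "/wp-content/themes/": ["style.css"],
}

def depth_first_paths(tree, root="/"):
    stack, visited = [root], []
    while stack:
        path = stack.pop()                      # most recently found folder first
        visited.append(path)
        for entry in tree.get(path, []):
            if entry.endswith("/"):
                stack.append(path + entry)      # folders are explored later, deepest first
            else:
                visited.append(path + entry)    # files are simply recorded
    return visited

print(depth_first_paths(tree))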

Breadth-first search can be used on a web crawl much like it can be used for a search on a distributed system. (Buluç, 2011, p. 2) If the date of a file is known but the domain is unknown, a breadth-first search would actually be preferred. This is especially true if a multitude of subdomains are hosted on a server. A breadth-first search can go through all of the subdomains before moving down into the depths of each content folder on the server.
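
A breadth-first variant only swaps the stack for a queue, so everything at one level, such as a list of subdomains, is exhausted before the crawl moves deeper. A minimal sketch, where the starting urls are hypothetical and get_links is an assumed helper that returns the links found on a page, might look like this:

from collections import deque

# Hypothetical starting points; a real crawl would discover these links itself.
start_urls = ["https://www.ncsu.edu/", "https://lee.ces.ncsu.edu/"]

def breadth_first(urls, get_links, max_pages=100):
    # get_links(url) is an assumed helper that returns the links on a page.
    queue, seen = deque(urls), set(urls)
    while queue and len(seen) < max_pages:
        url = queue.popleft()                    # oldest url first: level by level
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen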

Artificial intelligence is important to penetration testing because it saves time by using smart tactics when searching for vulnerabilities. The first aspect of the penetration test is to search for files on a server. The easiest way to do that is to crawl an entire domain and look for files of importance. Using a web crawler driven by a depth-first search, every file that matches the search parameters can be downloaded to a host computer. To test this scenario, take a look at North Carolina State University. Their website is built using WordPress. The server information does not really matter in this example, but a look at the WordPress structure does.

WordPress uses a file structure of three main folders in the root of a server. The root of a website is usually stored in a public_html folder or a www folder. Since we are intelligently searching for files, we are not trying to download an entire website. That would be the basis behind wget or curl, which were mentioned earlier. We are searching for files that may be of importance or stored as a reference on the server. The three main folders for WordPress are wp-includes, wp-content, and wp-admin. The wp-includes and wp-admin folders are used to store the core WordPress files. These folders are not important to our search. The wp-content folder is the most important. This is the folder where the theme and all uploaded files are stored. Using North Carolina State University as an example, the root of uploads is going to be https://www.ncsu.edu/wp-content/uploads. This folder should be set up to deny access. That is the correct way to set up permissions for the folder. When trying to load the folder from that link, a page not found message will be delivered.

This is the correct way a website should serve an admin-only folder. The folder actually exists at that location on the server; by default, WordPress blocks access to the folder but not to the content inside it. Using a crawler in Python, a depth-first search can crawl through a wp-content folder looking for files. The same idea applies to other content management systems such as Drupal, which simply uses a different file structure, so the file path would need to be changed; documents in Drupal live in a sites folder on the server. With large companies, there may be millions of files in the wp-content folder, so limits need to be placed on the search. Images are not relevant, so only files with the .pdf extension will be kept. These files are prevalent on websites because they cannot be easily edited, and they are how companies upload important files to the server to share with employees.
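
Assuming the uploads subfolders return plain directory listings, which some of the subdomains discussed here appear to do, a depth-first crawl that keeps only .pdf links can be sketched with the standard library alone. The regex link extraction is a simplification, and real pages may need a proper HTML parser:

import re
import urllib.parse
import urllib.request

def crawl_pdfs(start_url):
    # Depth-first walk of directory-listing pages, collecting .pdf links.
    stack, seen, pdfs = [start_url], set(), []
    while stack:
        url = stack.pop()                        # most recently found folder first
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url) as response:
                page = response.read().decode("utf-8", errors="ignore")
        except OSError:
            continue                             # folder refused to list; skip it
        for href in re.findall(r'href="([^"]+)"', page):
            full = urllib.parse.urljoin(url, href)
            if full.endswith(".pdf"):
                pdfs.append(full)                # queue the document for download
            elif full.endswith("/") and full.startswith(start_url):
                stack.append(full)               # stay underneath the uploads folder
    return pdfs

# Example starting point taken from the discussion above.
print(crawl_pdfs("https://lee.ces.ncsu.edu/wp-content/uploads/"))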

Running the search against the North Carolina State University website downloads a folder of files. Once downloaded, it is easier to copy all of the files into a new folder. When they are downloaded, their file structure is retained from the server; WordPress places uploads in folders grouped by year and then month. The catch when scanning is that some of these folders are empty, and a smart depth-first search can ignore them. It is important to keep the file structure the same, because the files can then be accessed using their real links on the server.

Searching through the domain and its subdomains, some files of importance are found. There is a quiz with an answer key at https://lee.ces.ncsu.edu/wp-content/uploads/2012/08/SS6E_Quiz_8.pdf. That file does not belong to the University. There is also a similar diagnostic test at https://lee.ces.ncsu.edu/wp-content/uploads/2012/08/Diagnostic-Test-With-Answer-Key.pdf. Also found were full copies of textbooks and scholarly publications. These are usually strictly licensed, so hosting them on a public web server is probably a breach of that license. With such a large number of files available, it is very important to protect the ones that could be damaging to a company.

The file download was a trivial demonstration of the power of depth-first search and artificial intelligence to penetration test a website. There are more powerful examples that drive home the importance of taking every precaution when creating a website. Depth-first search can also be used to try to access content or log into a website or user account. The next example will look at a professional newspaper website that uses a plugin that takes payments from readers to access content. This type of setup is popping up in more and more online publications.

The idea behind this type of website is that the newspaper can host its own content. The content is saved to its server using its own content management system, which gives the developers ease of access and customizability they would not have if hosting were outsourced. This is a great setup for free publications because the content can be served freely to anyone browsing the internet. They can even take advantage of shared hosting from any number of hosting giants that serve up web domains.

Larger newspapers like to offer subscription-based services, much like the print subscriptions they have made their income from in the past. The expertise of a newspaper staff is in the design and layout of written content. They may have a designer or two on staff, but in most cases there is a joint web design and management position that also hosts the website. This means they do not have the infrastructure in place to also manage payment solutions, which makes newspapers an easy target for penetration testing to gain access to their content.

Newspapers that host their own content but do not take payments on their own site use a plugin to connect to an outside payment solution. This outside payment solution takes a key or access code to validate the content. If this key is passed through the url, it can be inserted by a script. This script would allow a bypass of the outside plugin so that the content that is hosted on the server is accessed without ever going through the plugin. Once the url key has been found, any article hosted on the web server can be accessed using that same url key.

The Altoona Mirror is a newspaper located in central Pennsylvania. It serves news to the entire Blair County area as well as the surrounding counties. There is a semipro baseball team in the area, along with some Division II college athletic teams. It is a fairly popular newspaper for people in central Pennsylvania. They host their website at http://www.altoonamirror.com, which serves as an online copy of their newspaper. The homepage lists all of their stories and is open to the public. When you click on one of the articles, only the excerpt opens and a popup blocks the rest of the content.

The popup has two tabs, one for new users and one to log in to an existing account. It is a pay-for-access portal that blocks users from the content. The plugin is loaded over an HTTP connection even though there is a message that says, “This form is secure.” The form itself is just a billing box that asks for personal information and a credit card, and there is also a select menu for choosing the access level. If the box is closed, the user is redirected to the newspaper's homepage. At the bottom of the popup there is a message that says, “Brought to you by: MediaPass™.”

This MediaPass popup is being served over an HTTP connection and collecting sensitive information. This information can be sniffed over a wireless network because of the way it is being transmitted. That does not matter in this instance because a closer look at the url string gives some important details. Each story has an extra string in the url. This string is ?nav=742 or some other number. This string is only added on pages where the MediaPass popup is loaded. It is appended to the page by the MediaPass popup. Deleting the string and trying to reload the page only reloads the popup. This string is the key to gaining access to the content of the page.

Going through a number of stories, the string changes but always stays similar in form. Since there is no successful login to start from, there is no known target for what the correct string will be. This means the first thing for the script to try is 0. Python will load the web page using urllib.request to open the website. (Python) The correct protocol is to check the response to make sure the page opens.

Once the page has been loaded, the next step is to check the contents of the html. This can be done by dumping the response body to a file with urllib.request. Of course, this could get heavy on data if the program has to keep looking through these dumps to find the one that does not contain the MediaPass popup. This is where an artificial intelligence search algorithm comes in.
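
A sketch of that dump step, fetching one blocked article with urllib.request and writing the raw html to a local file for inspection; the ?nav=742 address is the one mentioned earlier, the output file name is arbitrary, and the class checked at the end is the MediaPass element identified below:

import urllib.request

# The blocked article mentioned earlier; any MediaPass article url can be substituted.
article = ("http://www.altoonamirror.com/page/content.detail/id/625523/"
           "House-to-consider-budget--tax-plans.html?nav=742")

with urllib.request.urlopen(article) as response:
    html = response.read().decode("utf-8", errors="ignore")

# Dump the response body so it can be searched for the MediaPass markup.
with open("dump.html", "w", encoding="utf-8") as out:
    out.write(html)

print("mp-inner-page-box" in html)   # True while the popup still blocks the content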

The goal is to brute-force the string until it validates and eliminates the MediaPass popup from the page. To find the element in question, Firefox's Inspect Element tool is perfect; it shows the code loaded on each page. Once a page with MediaPass has been loaded, open the Inspect Element tool, use its element picker, highlight the MediaPass popup, and click on it. This selects the code in use and loads it at the bottom of the page in two columns: the left side is the actual HTML for the page and the right is the CSS used to style the elements.

CSS is used for all styled elements, so this is where the MediaPass popup's styling appears. Since the picker selected the box, the elements displayed on the right should mostly be the ones that matter for this goal. A few are not important, but the one that is is called .mp-inner-page-box. This box loads the MediaPass popup inside of each page.

To confirm that this is the correct element, the code can be tweaked inside the Inspect Element tool. This modifies the page in the browser only, not the actual website, so refreshing the page discards all changes. The line for .mp-inner-page-box ends in a pair of {} brackets; inside those brackets, add the following code: display:none;. This hides the box on the page. If done correctly, the MediaPass box will disappear.

This means that we have the correct area and content selected on the page. This is the code that our Python script has to look for in the page. When reading the content of a certain url, it needs to look for the .mp-inner-page-box element. If that box is displayed, the MediaPass popup is still blocking the content from being displayed on that page. Once the MediaPass popup has been hidden by CSS, the actual content of the page is still not displayed. This means that our code has to be a bit smarter to get past the MediaPass object blocking the content.

While researching this topic, another newspaper that uses a different subscription service responded differently to the CSS change. The Tribune Democrat is a newspaper in central Pennsylvania that serves the Cambria County area. The Tribune Democrat uses a NewsMemory application developed by Community Newspaper Holdings, Inc.; this information was gathered from the Tribune Democrat website. Their website is located at http://www.tribdem.com and it is also served directly over HTTP.

Running the same test to try to bypass their insecure content blocker found another bug. When the Inspect Element tool is used to select their popup blocker, the content actually displays once the blocker is hidden. The popup element is .ta_popup, and it creates a semi-transparent gray background over the content. Once the element is given the display:none; property, the entire content blocker disappears. This allows access to the content without performing any url injection, which means a simple filter plugin could be created to hide that content-blocking element any time the page loads.

Since that does not work on the Altoona Mirror website, a url injection is needed. A url injection occurs when a script or value inserted in a url changes how a page is displayed. Any server running PHP as the scripting language for its website is potentially vulnerable to url injection, because these url strings can query a SQL database and select content that was otherwise not intended for viewing. Our script is going to start with ?nav=0 and work up from there. Each time the search finds .mp-inner-page-box, it calls back to the url request, increments the number in the ?nav= parameter, and sends another request for the new page.

None of the pages viewed used anything other than numbers, so to make the script a little more intelligent, letters and symbols are left out of the query. Each of the observed numbers had three digits, so instead of starting right at ?nav=0 the script starts at ?nav=000. These are just the beginning parameters; they speed up the search by limiting the number of url addresses requested.

The basis of the program is to call a url for an article on the Altoona Mirror website. The url for the article will be changed to end with ?nav=000 instead of the starting number. Python will call the url and search for the MediaPass element. If the MediaPass element is found, the url will be incremented and the search will occur again. If the MediaPass element is not found, the search will be stopped and the url will be displayed to test. This is a simple Python program that is very powerful.

I ran a few tests with Python in the terminal, and after a few minutes the search finally stopped. The number in the url it stopped at was 760. If the string at the end of an article url is changed to ?nav=760, the full content of the article displays and the MediaPass popup is not called. This worked for every article on the site that was tested: once the query string at the end of the url was changed, the MediaPass box did not display. This is a huge security hole, just like the CSS display bug on the Tribune Democrat website.

Combined with looking for files on a server, this can be a powerful way to look for exploits. Chaining several attacks together into one program is a simple way to test for them; the only negative is that it can take quite a while for both scripts to run.

Conclusions and Areas of Possible Future Work

Using artificial intelligence to look for vulnerabilities is a great way to extend the skills of a system administrator or a web programmer. Looking for vulnerabilities while browsing the internet can suggest different avenues of thought and ways to exploit one's own website. A programmer should always be looking for ways to improve their work, and looking at the work of someone else is a great way to do that. In a lot of cases, it is easy to miss simple mistakes when one is heavily invested in a project. It is always great to have a second set of eyes.

Artificial intelligence can provide an unbiased second set of eyes when looking for vulnerabilities. A key when using or creating a tool like this is to think outside the box. Someone trying to penetrate a system is not going to go straight at an exploit; try different paths to files, and try common tactics, especially when a content management system is in use. The most important rule with a content management system is to be careful when adding external plugins or themes on top of that system.

Plugins that are not official have not been tested by the creators of the content management system in most cases. This means that they are not tested against the actual backbone code of a system. There may be incompatibilities inside the core code that the plugin developers did not anticipate or know about. Always test the plugins for any scenario.

Handling money or payments on a website opens up a whole other set of problems. If there is any semblance of payment on a website, it should be required to use an HTTPS connection. An SSL certificate to certify an HTTPS connection is not expensive. It is an extreme oversight in the world today to not give that protection to users. With the amount of identity theft that happens and the ease of gaining access to this information, this level of protection is sorely needed.

Not securing payment or user information should open websites up to litigation. It is not up to the user to understand the intricacies of the internet when trying to read the local newspaper online; it should be the responsibility of the company to protect its customers. If it cannot protect those customers, it should give access to its content for free.

There are a few artificial intelligence searches that might be more powerful and faster, but depth-first search works well enough for the purposes of penetration testing. Files and folders on a server form a straightforward directory tree rather than some more complex structure, and a depth-first search can crawl through a server much like it works through a stack of information. As it looks at a file, it either queues it for download or ignores it if the extension does not match; once the file is actually downloaded, it can be popped off of the stack. (Russell, 2010, p. 85)

There are other search algorithms that can be tied into the search used in this example to make the solution more powerful. Looking at the Tribune Democrat example, there was more than one element created by their content access plugin. These would all need to be filtered out in case they display on a different type of content page. This could be done with another search that appends a display:none; rule to all of the elements that use the same prefix as the one identified earlier. Since the prefix .ta is unique, no other necessary element should be affected by adding the filter.

This would need to be tested more but could ultimately be added into a browser extension or plugin that looks for that same element on any webpage that loads. If another website would use the same plugin that the Tribune Democrat uses, it would be filtered out without the user even knowing as long as that extension is active.

This could be extended a bit further using the information we have already learned with the MediaPass plugin. If the same string works every time for all pages, a similar extension could be built that looks for that same string at the end of a url and changes the number that occurs at the end to 760. This would automatically reload the page and the user would not know why the url redirected. All they would see on their end is the correct page loading and displaying the full content. This is a way to exploit a website to show their content without them even knowing.

That is why it is important to take all of the necessary precautions when developing a website. Security should not be an afterthought. It is important to have a professional developer on staff to test the website and make sure that all functions are working. There are ways to hide the url string in the header, but since the exploit has already been identified, that might not matter, as the code can already be placed in the header.

There might be some legal recourse against a company that develops subscription-based plugins when there are loopholes around them. If the loopholes are not fixed, there are ways around the plugin that damage the company. The company makes its profit from the subscription system, and anyone who visits the site and views the content without an active subscription costs that company money.

Building on this for future work, the next step would be to automate these functions into a plugin or browser extension. This would allow anyone access to the content of these sites; they would download the plugin, install it, and activate it in their browser. It would be ideal to find more exploits for other news websites that use their own content blockers, so the extension could be built using a more intelligent design.

The extension would be built much like the artificially intelligent move engines used to evaluate a board in a game of chess. When the browser loads a website, the extension would prepare itself by looking through a predetermined set of moves, triggered by an identifier somewhere on the website. The CSS is a good starting point because it is usually standard for anyone using the same plugin. Unless a site goes through the code and changes the element names, which most do not, the same elements appear on all of the websites using that plugin.

This is similar to using the folder structure of a content management system. All default WordPress sites are going to have the same three top level folders of wp-content, wp-includes, and wp-admin. These only change if they are modified in the core of the code upon installation. This most likely does not happen as it may change the infrastructure of the content management system itself.

Once these identifiers have been added to the extension, the extension would run the exploit or move based on the identifier it found. Since these elements are created with unique names when the plugin is released, there really should not be any compatibility issues. If there are, the extension can simply offer a click-to-disable option in the menu bar. The problem with using a depth-first search algorithm in an extension like this would be speed.

When looking for the earlier exploits or unsecured data, speed did not matter; the goal was to have the script crawl through all elements, making sure none were missed. Some other algorithms take shortcuts through the data to improve speed, and this would be a place where another algorithm could replace the depth-first search used in the project. As it stands, only two exploits have been identified, so the extension would not be resource-heavy or take long to execute. It would simply look for the two elements and, when identified, disable them.

In conclusion, using artificial intelligence algorithms for penetration testing can be highly beneficial. It saves the time of having someone manually check that there are no security holes. Similar features are starting to pop up in home security systems, with phone applications that can check that no appliances or lights were left on in a home. Before these technologies arrived, someone would have to walk around the house manually to make sure everything was turned off. Using an AI system gives peace of mind to homeowners.

This peace of mind is also available to website owners and server administrators. Another example would be a recursive pass to check the permissions of all files on a server, which saves the web programmer from having to go back through the server and confirm that all permissions are set correctly. Assuming a public_html folder as the root, the following commands would work:

find ./public_html -type d -exec chmod 755 {} \;
find ./public_html -type f -exec chmod 644 {} \;

This crawls through both folders and files and sets them to different permissions. The main point is that files end up readable but not writable or executable by others. This prevents them from being executed if they contain harmful data and also prevents them from being edited. If a file has been incorrectly set so that anyone can edit it, someone can inject that file with malicious code. Once someone gets into the server through a terminal or command line, they can list all of the files in a directory along with their permissions. It only takes one file with an incorrect set of permissions being executed to destroy a server.

This would be tough to maintain by hand, so it is important to include automated tasks that ensure all files and folders stay set correctly. It would be ideal to have a cron job that runs every day to check and set permissions. The same could be done for other exploits as well, but new cases are identified every day that would take more work. In the example described above, there is no simple fix; the way both plugins were written, they have been easy to exploit or work around completely.
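
If the same check is wanted from a scheduled Python job instead of the find one-liners above, a minimal sketch that assumes the same public_html root could walk the tree and reset anything that drifts:

import os
import stat

ROOT = "./public_html"   # same web root assumed in the find commands above

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in dirnames:
        path = os.path.join(dirpath, name)
        if stat.S_IMODE(os.stat(path).st_mode) != 0o755:
            os.chmod(path, 0o755)            # folders: traversable, not world-writable
    for name in filenames:
        path = os.path.join(dirpath, name)
        if stat.S_IMODE(os.stat(path).st_mode) != 0o644:
            os.chmod(path, 0o644)            # files: readable, not executable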

Once both newspapers have been contacted to let them know that their services can be circumvented, it will take some work to get those loopholes closed. The most straightforward fix would be for the newspaper to redirect completely to an HTTPS server for the payment information. If a user has logged in and paid, a timed session cookie can be placed on the machine; after an hour or so, it would redirect to the payment portal again to check that the account is still active. As long as the user stays logged in, they would not be aware this is happening; the page would just take longer to load.

Whatever solution is put in place still needs to be tested thoroughly to ensure that no other exploit gives access to the content. That should be the job of the plugin creator. If someone is paying for a service, the contract should ensure that all security needs are met. Until security gets a more serious look on the internet, identity theft and fraud will continue to be prevalent.

Python code examples and modules were pulled from the official Python documentation. (Python)

import urllib.request
import urllib.parse

# Placeholder form fields; data is only needed when the request must POST something.
data = urllib.parse.urlencode({"example": "value"}).encode('ascii')
# "http://" is a placeholder; substitute the address being tested.
with urllib.request.urlopen("http://", data) as f:
    print(f.read().decode('utf-8'))

count = 0
url = "http://www.altoonamirror.com/page/content.detail/id/625523/House-to-consider-budget--tax-plans.html?nav="
while count < 1000:
    # Keep the key zero-padded to three digits to match the observed format.
    print(url + str(count).zfill(3))
    count += 1

marker = "mp-inner-page-box"
# page is assumed to hold the decoded html returned by urlopen for the current url.
if page.find(marker) != -1:
    count += 1                          # popup still present: try the next key
else:
    print(url + str(count).zfill(3))    # popup gone: report the working url

# Depth-First Search

def search(start_url, list_links, keep_file):
    # Depth-first crawl of a web server using an explicit stack.
    # list_links(url) and keep_file(url) are assumed helpers: the first returns
    # the links found on a page, the second decides whether a file matters.
    stack, visited, files_on_server = [start_url], set(), []
    while stack:
        url = stack.pop()                 # most recently discovered url first
        if url in visited:
            continue
        visited.add(url)
        if keep_file(url):
            files_on_server.append(url)   # queue the file for download
        stack.extend(list_links(url))     # dig deeper before moving sideways
    return files_on_server
import urllib
f = urllib.urlopen(file_on_server)
print f.read()

References

  • Buluç, A., & Madduri, K. (n.d.). Parallel breadth-first search on distributed memory systems. Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11).
  • Luger, G. (2009). Artificial intelligence: Structures and strategies for complex problem solving (6th ed.). Boston: Pearson Addison-Wesley.
  • Mutlu, B., Kanda, T., Forlizzi, J., Hodgins, J., & Ishiguro, H. (2012). Conversational gaze mechanisms for humanlike robots. ACM Transactions on Interactive Intelligent Systems (TiiS), 1–33.
  • Python 3.5.1 documentation. (n.d.). Retrieved December 2, 2015, from https://docs.python.org/3/
  • Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
  • W3C. (n.d.). Retrieved December 2, 2015, from http://www.w3.org/

Attachments

The actual Python script testing was completed on a Linux computer running Ubuntu 15.04 and Python 3.4.

First example of a screenshot showing an article with the paywall active
First example of a screenshot showing the paywall removed from the article
Second example of a screenshot showing an article with the paywall active
Second example of a screenshot showing the paywall removed from the article

Using the Naive Bayes Classifier for Statistical Analysis

The Naive Bayes Classifier looks at a set of information, analyzes it, and gives an estimate of how to classify it. The paper looks at using this classification technique to determine whether characteristics in the content of an email mark it as spam. The classifier itself is based on Bayes' Theorem from statistics. On the statistics side, the theorem combines prior knowledge with a set of observed indicators to estimate the probability of an outcome. I have seen it used mostly in medicine, where descriptors such as family history, height, weight, and age are put together to estimate the chance of a person having a certain disease.

This is the same type of prediction that will occur with spam filtering. There are certain parts of an email that will be present every time. These include the subject line and the cc, bcc, and sender and recipient addresses that make up the header of the email. There will be a body which contains the actual message of the email itself. There can also be an attachment that comes through with the email and can be checked. Some mail clients scan attachments for viruses before they are delivered. Gmail is one of the email clients that scans attachments before they are received; if an attachment appears to be harmful, the user is notified and then has the option to receive it. Microsoft clients like Outlook automatically block suspicious attachments as part of their system, and they have to be explicitly allowed through. This includes images that appear in the body of the email.

The problem with an approach like this is that all elements of the email need to be put together before they can be analyzed. This requires a sanitization process that cleans up each part of the email so it can be analyzed correctly. Sometimes the subject alone is a dead giveaway that an email is spam, so the sanitization step can apply extra weight to the parts of an email that usually contain spam, including the attachments. The first thing the analyzer needs to do is go through each of the elements.

Each element will have a list of the words that are commonly used in spam emails. Everyone has received some type of spam email that asks for money. This is usually a dead giveaway that the email is spam. Another thing that has cropped up over the years is the email asking people to wire money. Wiring money is a quick way for attackers to receive funds untraced; there is no information tying them to the wired account. Using this well-known example, one could say that any email asking for wired funds is spam. The word “wire” by itself cannot be added to the list, though; the entry has to be more specific, or a batch of emails about wiring a house would be marked as spam. It would have to be analyzed as “wire money” or “wire funds”.

A phrase such as the one above can now be added to the sanitization list. This list can be built up from phrases as well as individual words. If an email contains a high number of words from either list, it can be flagged as spam. The Naive Bayes Classifier does not produce a hard yes or no; it works toward a score representing the chance that the email is spam. This takes some training so the system can analyze emails based on their actual scores. Getting feedback from users can also help adjust the scores so that good emails are not filtered out. The user can give feedback on emails they find in the spam folder, and the system can then revisit the Naive Bayes score: if it was borderline, the score can be adjusted, and if it falls way outside the range of a typical spam score, it can be thrown out as an outlier.

Consider the wiring example again. If someone received a personal email from a friend asking them to wire money, that might not be spam, but in most cases it would still need to be listed as spam. One person who receives a legitimate wiring email does not outweigh the hundreds who receive fake ones.

This classifier becomes much more powerful if it sits on a server in the middle and analyzes all emails going through to each account. The paper calls this an indiscriminate attack because the email is not focused on any particular user, so it is best detected by analyzing traffic. A common email like this is an alert telling a user that their password has expired; it may come from a phishing site pretending to be a legitimate one. Its score can be compared against the baseline of other large newsletter-style mailings. The sender account would have to be analyzed too, so that legitimate bulk emails are not filtered out.

An example of this would be emails sent from an .edu, .gov, .mil, or similar address. These come from verified schools or government agencies that may send large email blasts at one time. If the Naive Bayes Classifier only looks at the large volume, it will place a large number of good emails into the spam filter. Learning can also be applied here so that the classifier takes well-known company addresses into account when deciding whether to filter them out. This could also be done by listing certain accounts as verified accounts, which is something Twitter does to show that an account is run by a real person.

By having a real verified account, these emails would not be placed on the spam list. The sender would need to have turned in more than just the email address, because another company could spoof that address, get listed as a real account, and then use it to send spam, which would render the classifier useless. The way around this is to collect more information from the verified account, including the server address and name that is sending the emails along with packet information. These could then be checked in depth to make sure that they are real. This would be a much better implementation, as it would be a lot harder for spammers to spoof.

The last attack is the hardest to protect against, because this type of attack pretends to be a legitimate company in its email. This was mentioned briefly above and would take extra analysis. Like the keyword list applied to the email body, this check would take well-known accounts and compare certain details against a known legitimate email. This is where the verified-email system set up above can be adapted again: the email would be compared against additional information before a score is granted.

When calculating the score, each matched word from the list adds to the score, and the resulting vector is run against the baseline established for those words. This is where the training set is important. The baseline score sets the standard that the spam filter works from going forward. A point value is awarded to each word so that its occurrence raises the score, while safe words do not raise the score of the email. There is a threshold on the final score that determines whether the email is flagged as spam. The paper uses a Boolean system to count the points of the email, meaning it uses a yes-or-no test to award each point.

When using a system like this, all words are treated as equal. Since we are using scores, it does not matter whether the exact phrase matches when calculating the points. For example, I previously mentioned the key phrase “wire money” and concluded that it would immediately flag an email as spam, because the phrase is very specific to money scams. With our system, though, we are looking at both words independently. Since we are using a yes-or-no test, each time one of the words appears in the mail it is assigned a score of one. Since we have two words, the points for these words would be two.
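
As a small illustration of that yes-or-no counting, with a made-up keyword list and email body:

# Hypothetical keyword list and email text, for illustration only
spam_keywords = {"wire", "money", "funds"}
email_body = "please wire the money to this account today"

# Boolean scoring: each keyword that appears in the email adds one point
points = sum(1 for word in spam_keywords if word in email_body.split())
print(points)   # "wire" and "money" both appear, so the score is 2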

This would make it seem that the score should be higher when an email contains both of these words, and that is exactly what the algorithm accounts for. The algorithm looks at the points accumulated by the email and compares them against the total words used. There is a count of both the words in the email and the flagged words, which gives a high or low score for the email overall. In other words, the weight of the email in terms of spam determines whether it is flagged. The lower the score, the lower the chance the email is spam; if the score is high, there is a high chance the email contains spam.

Now looking at the math side of the classifier, we are working with a pair of variables. For the spam filter, we have the set of spam keywords and we also have safe words. The user should be able to add safe words to the list in case a name or a topic they commonly message about shows up as a keyword. We then look at the joint distribution of the pair through the conditional distribution of the second value. This means that the more frequently words from the list show up, the higher the conditional distribution will be.

From a graph standpoint, think of creating a very dense distribution. A dense distribution means the probability mass is concentrated in that area, so if the spam words have a very dense distribution, there is a high probability associated with that region of the graph. From this we are using a pair of:

X | Y = Value ~ Probability(Value).

Looking at the class-conditional distribution of the value lets us turn the observed features back into a probability for each class. So from here we can take the classifier and split it up based on the good keywords and the spam keywords, which gives us:

Probability(Y = Value | X = Value).

In the program, each value will have a weight and an associated set of counts. If the value is high for good keywords and very low for spam, there is a very low chance of spam. If the opposite is true and there is a high density of spam words and a low density of good words, then the email will probably count as spam. This can also be user-sensitive, as the user can be given the option to add words to the safe and spam lists. If the user always gets email about a certain topic they have no interest in, they can add it to the list. This will increase the accuracy of the training list.
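
As a toy sketch with invented counts, this is how the keyword densities and class priors combine into comparable spam and not-spam scores, mirroring the log-based calculation used in the program later in this section:

import math

# Invented training counts, for illustration only: how often each word was
# seen in spam and in good email, plus how many emails of each kind were seen.
spam_word_counts = {"wire": 40, "money": 35, "meeting": 2}
good_word_counts = {"wire": 3, "money": 5, "meeting": 50}
spam_emails, good_emails = 100, 100

def class_score(word_counts, class_count, words):
    # Log of the class prior plus the log likelihood of each known word
    score = math.log(class_count / (spam_emails + good_emails))
    total = sum(word_counts.values())
    for word in words:
        if word_counts.get(word, 0) > 0:
            score += math.log(word_counts[word] / total)
    return score

email_words = ["wire", "money"]
# The class with the higher (less negative) score wins
print("spam score:", class_score(spam_word_counts, spam_emails, email_words))
print("good score:", class_score(good_word_counts, good_emails, email_words))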

The purpose of more keywords is to help lower the weight of any single word in the calculation. This lowers the significance of individual values and creates fewer false flags. The training set determines the baseline for both values, and the added keywords help reduce miscalculations. There will be some crossover where words appear on both lists. This could happen with the word wire: it could be on the spam list for its association with wiring money, while the user adds it to the safe list because they are an electrician. The problem is that it will then register as true on both lists.

This initially does no harm, but as the lists increase in size, the chance of double words also increases. What happens next is the weight of the classifier increases. When the weight increases, the potential value of both the good and bad keywords also increases. To combat this the training set will have to be calculated again to once again increase the baseline values for the calculation.

The test from the paper was carried out on thousands of emails. These emails were preprocessed beforehand and only the content of the email was considered. This means the classifier itself could be improved by the header checks mentioned earlier, which would help increase the accuracy of the results. There were an equal number of good and bad emails in the training set. The results of the training showed that the Naive Bayes Classifier was able to label the emails correctly.

Increasing the intensity of the attack does not negatively affect the performance of the classifier, especially because the weights can be adjusted after the training is set. The standard Naive Bayes with no training comes close on ordinary spam, but does not perform as well on the targeted attacks. With the training set in place, the success rate is much higher. This is due to the active keyword base being improved by user feedback and input.

The conclusion is that even if the attacker knows the word base being used to determine the spam, the weight adjustments can be made almost immediately on the end of the user to improve the classifier. When the degree of the attack is more specific and less random, the classifier performs quite well. It is when the unknown attack is thrown into the mix that the results may not end up as well as expected.


The Naive Bayes classifier has use as a spam predictor because it gives an unbiased score. This is also useful for projections about people, because the baseline just looks at the facts. It is mostly used for medical diagnosis but can also have a more novel use. Since graduation is upcoming, one might want to look at all of the aspects of a location when deciding where to move.

The United States is such a large country that each region has different attributes. This makes it tough for someone looking for a job, who also has to research whether they will fit into the area they are moving to. Just like the spam filter, one can build a list of attributes that assigns points to each region. Each region is then scored on whether those activities or attributes can be found in that area. This can be broken down further by state or by city.

It can be broken down further still into the regions and cities of each state, with attributes assigned to each of those regions, which can lead to a very complex system. When the spam keywords are scored, they are only looked at from one perspective: a yes-or-no test of whether an email is spam, which is very straightforward. That same straightforward approach can be used to tackle this more complex problem.

Since the United States is such a complex and large country, it does not matter if multiple areas receive high scores from the user. The idea behind the program is that the user inputs a list of items that are important to them, and this list of keywords is run against different regions of the United States. For example, if a person really likes the ocean, they will put ocean on their list. Only a limited number of states actually touch the ocean and offer that as an attraction, so those states get a true value on the Boolean scale, which results in a score of one. The states that do not touch the ocean receive a score of zero based on their assignment of false.
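
A small sketch of that true-or-false scoring, using a made-up attribute table for a handful of states:

# Made-up attribute table: True means the state offers that attribute
states = {
    "North Carolina": {"ocean": True,  "mountains": True},
    "Colorado":       {"ocean": False, "mountains": True},
    "Kansas":         {"ocean": False, "mountains": False},
}
wish_list = ["ocean"]

for state, attributes in states.items():
    # Boolean scale: a matching attribute scores one, a missing one scores zero
    score = sum(1 for item in wish_list if attributes.get(item, False))
    print(state, score)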

Since state regions are similar in certain parts of the country, similar regions will also receive similar scores. This is expected, and it gives the user more options to look at. It helps them search for jobs because it checks off all of the attributes the user is looking for. If someone is looking for a warm location near the ocean, states in the Southeast along with those on the West Coast can offer that, which eliminates the states in the North or the Midwest.

Now we can remove some regions from the states that received high scores. A southern state like South Carolina, which is warm and has an ocean, also has a mountain region a few hours from the coast that should be removed from the score. Otherwise the user would be put off from using the system by seeing low-relevance matches.

The accuracy comes from the list of the regions of each state. This is something that would have to be developed over time. The attributes need to be added to each of the regions of the state. For this to work, we need to build out sets of data for each state region.

The easiest way to do this in Python is to just import .txt files. Python has some built in libraries that make text and string manipulation easy. What we are trying to do is build a word count base to compare the input list against.

To test the program, I decided to focus on one state. Since I currently live in North Carolina, I built my test around it. North Carolina is split into three distinct regions. The coastal region connects to the ocean, is flat, and offers most coastal options. The second region is the piedmont, which hosts fertile ground with much of the agriculture and large business. The third region is the mountains, which offers outdoor options like skiing and hiking.

The one downside I could find with this approach is that every variation of a word has to be included in the list. So, for example, if a user types in skiing instead of ski, it will not be counted. I built out my lists as follows.

Mountain (NC Mountains):

kayaking rivers mountain cabin spa whitewater rapids seasons vacation mountains fall colors color snow snowy skiing ski resort spring crisp air outdoors nature kayaking hiking trails camping zip lining country snowboard peaks rocks fly fishing trail beauty

Coastal (NC Coast and Beaches):

beautiful nature beaches beach island ocean seashores small towns pristine coast relaxation lighthouses beaches rivers sounds tee golf courses course seafood exquisite beauty wild horses native wildlife picturesque history civil war sand

Piedmont (NC Piedmont):

music greats cities cosmopolitan feel charm arts nightlife dining wine beer exotic animals natural habitats zoo handcrafted pottery upwind spas outdoors forest suburbs corporations business technology highway colleges young

Since this is only a test, there does not need to be a lot of extra detail put into these lists for the demonstration. For actual application, the list will need to be more detailed as this is the basis for a complete breakdown of all regions. Now the Naive Bayes classifier is going to complete the calculations to give the probability points for each region.

The program takes input from the user after a quick introduction. It creates data structures for each region and for the user list, using the basic Python data structures recommended in the documentation. There is also a NaiveBayes package on the Python Package Index, which I used as a guide to build out my calculations. (NaiveBayes) The results come out as decimal values where the higher amount signifies the best chance of a match.

These datasets are calculated against one another using the Bayes formula found in the Python package. That formula was isolated on its own and combined with the basic Python word counting found in the documentation. The data is stored in a data folder, with each region kept in its own subfolder, so the basic structure is data > nc-coastal, nc-mountain, nc-piedmont. Since each United States state has a two-letter code, I set that as the prefix. The prefix lets me order regions so that they appear alphabetically, which makes the program easily scalable; more features can be added simply by adding more folders.

Once new folders are added, the math needs to be run for each section again. I did not have time to make it more robust, but the calculations could be improved so that they loop over every folder automatically. That way, extra lines would not have to be added to the program; all that would need to happen is for folders to be added to the data folder.
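
A rough sketch of that improvement, assuming the same data folder layout, where the region names are discovered from whatever subfolders exist instead of being hard-coded:

import os

data_dir = "data"
# Discover the regions from the subfolder names instead of hard-coding them
regions = sorted(
    name for name in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, name))
)
word_counts = {region: {} for region in regions}
list_search = {region: 0.0 for region in regions}
print(regions)   # e.g. ['nc-coastal', 'nc-mountain', 'nc-piedmont']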

In conclusion, the classifier has a lot of room for improvement, especially in the code and its reusability. It is scalable and accommodates adding more features. The classifier does identify the keywords correctly and might work even better with additional filters and string manipulation beyond the regular-expression word split that is used. The all-lowercase filter might also need to be removed as more specific words show up, but for this example it works quite well.
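
For example, a very simple suffix-stripping step, which is not part of the original program, could fold a variation like skiing down toward ski before the words are counted:

def normalize(word):
    # Crude suffix stripping for illustration; a real stemmer would do better
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

print(normalize("skiing"), normalize("beaches"), normalize("mountains"))
# ski beach mountain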

With a larger word list, some of the weights will come down; the focused run on the coastal region reached a score as high as 15, when a 0–1 scale would be a much better fit for the user. These scores could also be converted from their decimal form into a percentage, giving the user a matched percentage of compatibility for the region search.
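
One way to do that conversion, shown here as a sketch with hypothetical raw scores standing in for the values the program prints at the end:

# Hypothetical raw scores as produced by math.exp(...) at the end of the program
scores = {"nc-coastal": 15.2, "nc-mountain": 0.8, "nc-piedmont": 1.5}

total = sum(scores.values())
for region, value in scores.items():
    # Normalize each region's score to a share of the total, shown as a percentage
    print(region, round(100 * value / total, 1), "%")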

Screenshot of the program selecting the coastal region
Screenshot of the program selecting the piedmont region
Screenshot of the program selecting the mountain region

Python Code Built from NaiveBayes and Python Docs:

import re
import math
import sys
import sh    # external package that wraps shell commands such as find

def list_setup(attrib):
    # Lowercase the text and split it into words on non-word characters
    attrib = attrib.lower()
    return re.split(r"\W+", attrib)

def wordcount(text):
    # Count how many times each word appears in the list of words
    totalcount = {}
    for word in text:
        totalcount[word] = totalcount.get(word, 0.0) + 1.0
    return totalcount

# Word counts across every region file combined
words_from_text = {}

# Word counts kept separately for each region
word_counts = {
    "nc-coastal": {},
    "nc-mountain": {},
    "nc-piedmont": {}
}

# Number of attribute files found for each region (used for the priors)
list_search = {
    "nc-coastal": 0.,
    "nc-mountain": 0.,
    "nc-piedmont": 0.
}

files = []

# Walk the data folder and build the word counts for each region's .txt files
for attributes in sh.find("data"):
    attributes = attributes.strip()
    if not attributes.endswith(".txt"):
        continue
    elif "nc-coastal" in attributes:
        region = "nc-coastal"
    elif "nc-mountain" in attributes:
        region = "nc-mountain"
    elif "nc-piedmont" in attributes:
        region = "nc-piedmont"
    else:
        sys.exit(0)
    files.append((region, attributes))
    list_search[region] += 1
    attrib = open(attributes).read()
    text = list_setup(attrib)
    counts = wordcount(text)
    for word, count in list(counts.items()):
        if word not in words_from_text:
            words_from_text[word] = 0.0
        if word not in word_counts[region]:
            word_counts[region][word] = 0.0
        words_from_text[word] += count
        word_counts[region][word] += count

print ("Welcome to the United States Region Finder")
print ("Please enter a list of your favorite things separated by a space:")

user_attributes = input("")
text = list_setup(user_attributes)
counts = wordcount(text)

# Prior probability of each region, based on how many files it contributed
piedmont_count = (list_search["nc-piedmont"] / sum(list_search.values()))
mountain_count = (list_search["nc-mountain"] / sum(list_search.values()))
coastal_count = (list_search["nc-coastal"] / sum(list_search.values()))

coastal_bayes = 0.0
mountain_bayes = 0.0
piedmont_bayes = 0.0

for x, y in list(counts.items()):
    # Skip words the training data has never seen
    if x not in words_from_text:
        continue

    # Overall frequency of the word and its frequency within each region
    selected_word = words_from_text[x] / sum(words_from_text.values())
    selected_word_piedmont = word_counts["nc-piedmont"].get(x, 0.0) / sum(word_counts["nc-piedmont"].values())
    selected_word_mountain = word_counts["nc-mountain"].get(x, 0.0) / sum(word_counts["nc-mountain"].values())
    selected_word_coastal = word_counts["nc-coastal"].get(x, 0.0) / sum(word_counts["nc-coastal"].values())

    # Accumulate log scores so small probabilities do not underflow
    if selected_word_piedmont > 0:
        piedmont_bayes += math.log(y * selected_word_piedmont / selected_word)
    if selected_word_mountain > 0:
        mountain_bayes += math.log(y * selected_word_mountain / selected_word)
    if selected_word_coastal > 0:
        coastal_bayes += math.log(y * selected_word_coastal / selected_word)

print("North Carolina - Coastal  Region :", math.exp(coastal_bayes + math.log(coastal_count)))
print("North Carolina - Mountain Region :", math.exp(mountain_bayes + math.log(mountain_count)))
print("North Carolina - Piedmont Region :", math.exp(piedmont_bayes + math.log(piedmont_count)))

Resources

  • NC Coast and Beaches — North Carolina Travel & Tourism. (n.d.). Retrieved December 14, 2015, from http://www.visitnc.com/coast
  • NC Mountains — North Carolina Travel & Tourism. (n.d.). Retrieved December 14, 2015, from http://www.visitnc.com/mountains
  • NC Piedmont — North Carolina Travel & Tourism. (n.d.). Retrieved December 14, 2015, from http://www.visitnc.com/piedmont
  • NaiveBayes 1.0.0 : Python Package Index. (n.d.). Retrieved December 14, 2015, from https://pypi.python.org/pypi/NaiveBayes
  • Peng, J., & Chan, P. (n.d.). Revised Naive Bayes classifier for combating the focus attack in spam filtering. 2013 International Conference on Machine Learning and Cybernetics.
  • Python 3.5.1 documentation. (n.d.). Retrieved December 14, 2015, from https://docs.python.org/3/

A-Star Algorithms for Guided Search

A Comparative Study of A-Star Algorithms for Search and Rescue in a Perfect Maze focuses on using robots to find and rescue people in dangerous situations. There are many situations where a robot may be the best answer for completing a search and rescue mission. There could be a catastrophic earthquake or tornado with debris everywhere, or flooding and hurricane damage that forces rescue missions by boat, like the recent flooding in South Carolina. The best way to put the ideas in this paper to use would be in a mining accident. Using an A* algorithm, a robot can be programmed to search a disaster area to find victims who may be trapped within.

In a mining accident, there are many paths underground that could be blocked or inaccessible, so an underground maze is created. There are also time limits due to the amount of oxygen available to the mine workers. Growing up in a coal mining town in rural Pennsylvania, I have a lot of friends and family members who work in the mines. Even with the technology available today, it is still a very dangerous job that requires human workers to complete the actual mining. Until a fully functional robot is available for use, mining will still be completed by humans.

A few years ago, a large mining disaster occurred in West Virginia. There was a large explosion that closed off tunnels and trapped workers. As stated in the paper, a robot or even a human-controlled rescue vehicle can run through an algorithm to create a plan to search the tunnels for an accessible route to the victims. A robot itself can have the algorithm embedded and be set loose throughout the tunnels until it encounters the victims. Some parts of the mine may only have one entranceway into the farthest depths of the tunnel, while there can be many different pathways to reach that entrance. In the case of an explosion, multiple blockages throughout the tunnel can create a maze-like environment underground.

A robot can be sent in first. It can travel through the mine using the algorithm as it traverses the tunnel and backtrack if it encounters blockages. This robot can be sent immediately after an explosion or accident is reported. Even though there are always rescue teams on site, they still take time to respond to an incident. Think of the response time much like the response time of an ambulance or fire truck. There are still preparations and precautions that need to take place before a rescue team can take over. This is the perfect scenario for a robot.

Since the robot was sent in immediately, it will already have checked parts of the mine to make sure that access is available to the rescue squad. This robot can also report back any situations via pictures or video that may be unexpected for the team. An example of this would be tunnel flooding. By sending a robot first, the flooding from the example can be identified and alerted to the team so that they can be equipped to handle the water.

The paper focuses on building the algorithm around the robot itself. The algorithm should have the robot search through the parts of the mine as fast as possible. It can even be given a likely ending point based on a signal; the paper gives the example of miners tapping on a pipe to signal their location. This information can be added into the algorithm so the robot can change its search accordingly.

The robot would never be executing a blind search as there would always be a reason to send the robot in. The signal could come from actual miners or from electronic warning signals such as fume levels in the tunnel. A robot could then be equipped with special testing materials, like an air tester to test for chemicals in the tunnel itself.

Since the actual destination of the victim may vary based on movement and the accuracy of the signal, there will be deviation in the A* algorithm to make the changes to the final destination. This gives the robot a better chance at finding the victim in a search and rescue mission as there will always be issues that are unaccounted for. The main purpose of the algorithm is to find the shortest path to the victim and leave the disaster area.

The A* algorithm discussed in the paper for search and rescue missions is a search algorithm. To start, the algorithm takes the current position of the robot as input, and this position changes as the robot moves. This is similar to many other search algorithms: there is a starting point and some type of endpoint, and the endpoint is the destination where the search will lead. There has to be some type of goal in mind before a search can take place.

Once you have a start point and an end point, the search can begin. One well-known search algorithm is Dijkstra's algorithm. It is used a lot in networking when looking for the fastest connection between a server and a computer. The algorithm finds the shortest path by repeatedly expanding the unvisited node with the lowest accumulated cost and updating the distances to its neighbors. This type of search is guaranteed to find the shortest path because it is exhaustive: it considers all possible paths while making its decisions. That is not optimal in a search and rescue, because there might not be enough time to work through all of the possible routes while moving through a maze.
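
As a minimal sketch of how that works, assuming a made-up network graph and Python's heapq module (this is not code from the paper):

import heapq

def dijkstra(graph, start, goal):
    # Shortest path by cumulative cost; graph maps node -> {neighbor: cost}
    frontier = [(0, start, [start])]          # (cost so far, node, path taken)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, step in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

# Tiny made-up network: node -> {neighbor: link cost}
network = {
    "server": {"router": 2, "switch": 5},
    "router": {"switch": 1, "computer": 7},
    "switch": {"computer": 3},
    "computer": {},
}
print(dijkstra(network, "server", "computer"))   # (6, ['server', 'router', 'switch', 'computer'])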

A way to increase the success of a potential search and rescue is to have the search behave more like a human. This can be accomplished by adding features and controls to the robot. A sonar device, like the echolocation a bat uses, could detect distance and cut down on time spent searching for walls or roadblocks. This can be combined with a video camera and a flashlight to help detect features along the robot's course. The more sensors the robot has, the easier it will be to detect hazards along its path. It could also include safety features such as onboard carbon monoxide or other harmful gas detectors. If it detects hazardous materials, it can relay a message back to the home base so the team is aware of the risks of sending humans in to search.

Ideally the robot should behave rationally. If it senses harmful gases in one location or path, it needs to avoid that path during the rescue phase of its mission. This is important because if there is a successful rescue, the victim of the disaster will want to follow the robot out of the disaster area. The victim does not want to follow the robot into a potentially dangerous area that could pose another potential threat.

A maze is better modeled as a graph than as a tree. There are many different directions one can travel instead of just moving down into the next level. The robot will have to go through a maze of different routes and move around certain hazards. Knowing this, we want the robot to perform a goal-driven search. The goal is the area where the disaster occurred, and the robot searches along the path until it reaches that goal. Think of it like a mouse in a maze looking for cheese: the mouse moves along the course until it finds the cheese. That is the goal of the robot in a search and rescue mission.

We learned about two different types of searching, breadth-first and depth-first. Since this is a goal-driven search where we are trying to reach the goal in the shortest time possible, we would want to use a depth-first search. In a breadth-first search, the robot would sweep left and right across a level before moving down to the next one. That might work for a recovery robot in a non-life-threatening event where recovery is the most important thing. Think of a search where someone is looking for a body: a robot performing a breadth-first sweep back and forth across a field would be an appropriate choice. Since I have been using a coal mining example, we would want to go with a depth-first search. We have a general idea of where the disaster occurred, so we want the robot to get as deep into the mine as quickly as possible.
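
A small sketch showing how the same search loop becomes depth-first or breadth-first depending only on whether the frontier behaves as a stack or a queue; the maze graph and node names are made up for illustration:

from collections import deque

def graph_search(graph, start, goal, depth_first=True):
    # A stack frontier gives depth-first search, a queue gives breadth-first
    frontier = [start] if depth_first else deque([start])
    parents = {start: None}
    while frontier:
        node = frontier.pop() if depth_first else frontier.popleft()
        if node == goal:
            # Rebuild the path by following parent links back to the start
            path = []
            while node is not None:
                path.insert(0, node)
                node = parents[node]
            return path
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                frontier.append(neighbor)
    return None

maze = {"entrance": ["left", "right"], "left": ["alcove"], "right": [], "alcove": []}
print(graph_search(maze, "entrance", "alcove", depth_first=True))   # ['entrance', 'left', 'alcove']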

Now depending on the scenario, this could end up being a blind search. If it is a focused robot, one owned by a mining company, they might be able to program a map of their mine into the robot’s algorithm. This would increase the speed of the search as the robot would have an idea of the possible direction that it would need to take. Think of using a GPS system in your car. The GPS system already has all of the roads programmed into it. If you hit traffic or construction, the GPS system is able to reroute itself around the hazard to reach its destination. This is the same behavior we need from the robot.

Now if there is a large area that needs to be searched, it may be beneficial to reduce that area into smaller chunks for the robot to encounter. This would be beneficial in the case of a very traumatic event. Let us say an earthquake or explosion caves in multiple areas of a large mine. We might receive signals from multiple areas inside of the mine but a majority of them are blocked off. It might then be beneficial to take a heuristic approach by tackling small parts of the mine at a time. Since the robot may have to perform backtracking more frequently in this scenario, it would be better to have small checkpoints along the way, sort of like nodes along a network path.

The paper builds off of these points and uses a heuristic, depth-first approach to programming a search and rescue robot. From the class slides, the “A* algorithm uses actual and heuristic values to compute an optimal solution.” This is ideal because the heuristic never overestimates the distance to the goal, which makes it admissible, and with an admissible heuristic the solution found will be optimal. There are a few disadvantages to this approach, as a large search can take quite a bit of processing power and memory. If a search and rescue robot also has a lot of bells and whistles built in, that leaves less processing power and memory for the search. This might not be a problem when there is a known list of routes, but it can be when there are a lot of unknowns in the search itself.

To make things seem even worse, the performance of the search is tied to the estimates being made. If an explosion is detected on one route of a mine, it can be programmed into the algorithm. But what if this explosion creates three other disconnected blockages? The robot might plan an ideal route only to have an unknown blockage derail its search. Then it has to re-plan the route, which brings up the memory and processing power issue again.

Once the robot is on the trail, it will have even more limited resources at its disposal. Since the robot is being used to travel through disasters, it will have to run on battery power. Large processing power can eat through a battery very quickly. Try streaming a movie to your phone and compare the power usage of that to using it to send text messages. The streaming will eat through the battery much quicker as it is using the graphics processing power.

The basis of the search itself is f = g + h, where g is the cost already traveled from the start point and h is the heuristic estimate of the remaining distance to the destination; f is then the estimated total cost of reaching the destination along that route. The paper adds a letter i to the search to make it a little more complex than the basic A* algorithm. The i is the current position of the robot. This is a key feature to add because the robot has to move throughout the maze in a disaster where unknown circumstances may force it to reroute its course.
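
A minimal sketch of the basic f = g + h version in Python, assuming a made-up tunnel map and heuristic values; it is not the paper's A* (2) variant, which also tracks the current position i and the father node j:

import heapq

def a_star(graph, start, goal, heuristic):
    # Expand the node with the lowest f = g + h; graph maps node -> {neighbor: step cost},
    # heuristic(node) estimates the remaining cost to the goal and must never overestimate it.
    frontier = [(heuristic(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for neighbor, step in graph[node].items():
            new_g = g + step
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + heuristic(neighbor), new_g, neighbor, path + [neighbor]))
    return None

# Tiny made-up tunnel map and admissible heuristic guesses toward the alcove
tunnels = {
    "entrance": {"left_shaft": 2, "right_shaft": 2},
    "left_shaft": {"alcove": 6},
    "right_shaft": {"alcove": 3},
    "alcove": {},
}
guess = {"entrance": 4, "left_shaft": 5, "right_shaft": 2, "alcove": 0}
print(a_star(tunnels, "entrance", "alcove", lambda n: guess[n]))   # ['entrance', 'right_shaft', 'alcove']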

The paper then breaks down three differently programmed algorithms. This is highly beneficial because it shows a wide range of different outcomes. The surprise here is that the A* (2) algorithm always performed better. Sometimes the outcomes were very close but other times there was quite a bit of difference in the actual performance. In conclusion, the algorithm that adds a father node helps decrease the actual search time of the algorithm. This is very important in a search and rescue algorithm as it helps decrease response time in an emergency. Sometimes this can have a limited overall effect in efficiency, but other times it can make a huge difference.

The A* algorithm looks at a map and then makes decisions based on the surroundings and the perceived notion of where the goal is. Let us continue with our coal mining example. Let us say that the entrance is in the front of the mine. From there, the mine splits off into two separate directions. There are a few walls and obstacles that are in the way of the mine itself. The major mining for the mine is occurring in the bottom right side of the mine. There is a little alcove there where the mining accident occurs.

There is a cave-in at the mine right in the center where the path that the miners usually travel to the alcove gets blocked. The robot then has to start its search to find the path to the trapped miners. The miners are able to send a signal out from their location so that the rescue team knows that they are located in the alcove.

By known association, the robot begins to search on the right side of the breakaway entrance of the mine. As it moves around the mine itself it notices that the cave-in is in the center of the normally traveled path. Knowing the other location on the side, it is able to see that by going the opposite direction, to the left and down, it will be able to find another way into the recovery zone at the alcove. There is another wall that creates a boundary on the left side but the robot is able to cut above it so that it can then go straight down into the alcove. There is a path to the right of the alcove but the robot does not even check there as there is no need to go around the long side to see if it can get through. There is already a path created.

There is another known path to the alcove, one where the robot could have gone straight down to the left and then over, but it already had an idea that the trapped miners were on the right side of the mine based on their alert. So as soon as the robot had an opening, it cut back to the right towards the location of the trapped miners. This is the brilliance of the A* algorithm: it knows the miners are on the right side of the mine, so it keeps working its way toward the right side, even though at the start the right side was blocked off from the robot.

I created a graphical representation of the mine itself. There are two variations. The first variation is when the mine is actually blocked. The robot itself would have to check a lot more of the area before finding access to the victims. The second variation is the actual mine without blockage in the area that the miners normally travel to.

The miners are working and hear the explosion. They walk up the right side of the alcove to check what has happened and see that their path is blocked. The mine is very dark and hard to see in, so they head back to their alcove to signal for help. The entrance of the mine is at the very top and is marked by a star. I used a green marker to represent the walls of the mine; the mine passages are actually very narrow, so the team does not have a lot of room to move around. I have marked their location as the end with another star, in the bottom right side of the drawing. I circled the blockage with a black pen and labeled it off of the map as blockage. The second map has the same layout without the blockage shown.

On both of the maps, I have shaded in the searched areas with a blue pen. The path that the robot would end up taking is colored in red pen. The closer the blue is to the red, the closer those nodes are to the actual path that was taken. The blue area also corresponds to the amount of time it takes the robot to perform the search: the more ground the robot covers, the more time the search takes. On the first map, the robot looks to the right since it knows the alcove is on the right. There is no open path, as that is where the blockage has occurred. The whole right side is shaded, since that is the direction the robot wanted to move; it knew it had to move right but could not find an opening.

Once the right side had been searched, the robot moves to the left side of the split trail. From there it also moves right and comes back in contact with its path from the previous search. So now, all areas around the path have been shaded blue. The robot now starts to work its way down as it moves to the right. The left sides are checked but the robot continues on its path down and to the right. There is an opening there so that is the direction in which the robot moves. The lower right side of the map would not be searched and that is ignored. That is because the robot worked up and under the left side of the alcove. Looking at the map, it would appear that over 75% of the mine has been checked by the robot.

On the next map, the blockage has not happened. The robot moves to the right as it tried to do before. It works its way to the right and starts to move down. There is no blockage so the robot continues to work down and to the right. As it comes down to the right side of the alcove it knows that there is no blockage and that the entire path is open. It continues straight down the right side of the alcove until it can turn up to where the victims would be located. The left side of the map is left unshaded as that side has not been checked by the robot. Looking at this map less than 50% of the map has been checked by the robot before it has reached its destination.

In conclusion, an efficient algorithm can be made inefficient by unforeseen circumstances that occur on a search and rescue mission. There are a variety of factors that determine whether a search and rescue is a success. Ideally, the less ground that has to be searched to find a route, the more efficient and successful the route will be. That is why Dijkstra's algorithm, or a similar exhaustive one, is not advocated in the paper.

Drawing of the mining example to show the blockage and correct route

The A* algorithm has far-reaching uses. The focus of the first paper I chose was using the A* algorithm for search and rescue missions. Chunbao Wang, Lin Wang, and Jian Qin from Sun Yat-sen University, along with Zhengzhi Wu, Lihong Duan, Zhongqiu Li, Mequn Cao, and Xicui Ou from the Shenzhen Institute of Geriatrics, focus their paper, Path Planning of Automated Guided Vehicles Based on Improved A-Star Algorithm, on using the algorithm to improve the routes of guided vehicles. The papers have the same type of premise, as both look to have guided machines use the A* algorithm to generate a route for a machine to take.

The improved A* algorithm proposed in the automated guided vehicles paper could very easily be applied to the rescue robot. The goal of the improved algorithm is to reduce turning and remove unnecessary edges from the route so that a more direct route is taken. This matters for the vehicle because the less turning a vehicle does, the smaller the chance of an accident, and likewise the longer the route a vehicle takes, the greater the chance of an accident. The longer a vehicle is moving, the more dangerous the route becomes. That is the purpose of finding the fastest route.

The purpose of finding the fastest route in a search and rescue mission is to save as many lives as possible. Looking at both papers, safety is the reason behind using the A* algorithm to guide the machines through a maze of directions. Both also have to take into account that the environment might end up changing after the initial route is chosen. There may be environmental issues such as blockages or hazards that interfere with the actual route. In the case of the robot, it could get stuck and not be able to complete its mission. For the vehicle, it could cause an accident and potential harm to people.

Time is also important in both papers. For search and rescue it is of utmost importance, because the longer it takes the robot to arrive at the destination, the greater the chance of increased loss of life. For a guided vehicle to be practical, it also has to make quick route decisions and arrive in a timely manner. This goes along with the safety of both the guided robot and the vehicle. Safety is important, but so is the amount of time it takes each to arrive at its destination. One could argue that time matters more to the robot and safety matters more to the vehicle, but in reality both are required by both.

The difference between the two papers is in the application of the actual A* algorithm. The goal of the guided vehicle is to decrease the number of turns it has to take to reach the destination. When comparing against the actual shortest path, the basic A* and the improved versions all get the same performance speed-wise. The three improved A* algorithms, Dijkstra's, and the basic A* all traveled a path length of 45 nodes. Dijkstra's, however, visited all 32 nodes to reach its route conclusion, which added extra time to its route configuration, while all of the A* variants searched roughly the same number of nodes. The improved A* routes did shave a few nodes off of the original A* algorithm.

These results look good, but they were only produced on one map. I liked the fact that the search and rescue paper included multiple mazes with speed breakdowns of each; it was clear from that paper that the A* (2) algorithm was the best algorithm for a search and rescue scenario. One could draw similar conclusions about the improved A* algorithms, but there is not a sufficient body of work to support them.

Another difference between the two situations is the collision feature. In a search and rescue, there will not be a bunch of other robots traveling in different directions and having different goals. The robots will be focused on the same goal of rescue. In the guided vehicles paper, the goal would be to have self-sufficient vehicles on the road. In a large scale production, a self-guided vehicle would have to be able to maneuver throughout highway traffic. The algorithm has a built in time collision detector that would avoid conflict. If a conflict can be avoided by waiting, then the vehicle waits.

Think of this in a real world situation. A car is being driven down the road and two pedestrians walk across a crosswalk. The car cannot run over the pedestrians. The guided car would have to make adjustments. With a wait time implemented, the guided vehicle would take time to make sure that the crosswalk is clear and then it would proceed across. For this to be effective, there would have to be very sensitive sensors built into the car. The car would have to be able to detect walking people or animals in a split second.

If a deer or other animal jumped out in front of the vehicle, it would have to make a quick decision to avoid the collision. It could still implement the wait time, but it would have to be signaled to the vehicle quickly. This sensor would have to be able to tell the difference between an actual collision and not a perceived one. What if there was a heavy rain outside that immediately started around the car? Would the car pick up the motion of the rain and calculate that as a possible collision? It would be pretty silly for a vehicle to stop as soon as it started to rain or even snow. There are quite a few different situations where these features would have to be tested extensively before they are road ready.

In a search and rescue environment, the robot does not have to worry about oncoming collisions and unexpected objects like a vehicle would. Sure, there might be falling debris in a mine after an explosion, but a disaster area would be more of an isolated environment when compared to the guided vehicle. The guided vehicle would have a more controlled environment when compared to the robot as well. The vehicle will always have at least a minimum width of road and stable surface to drive on, while a robot may not have either in its working environment.

When looking at the algorithms used by both, they are very similar. Since both are using an A* searching algorithm the results are going to be consistent across both uses. The vehicles tried using a Dijkstra algorithm for comparison while the search and rescue did not. It did not make sense for the search and rescue to use that algorithm because there would never be a scenario where every location would be visited when time is of the essence. The guided vehicle used an improved A* algorithm while it was clear from the simulations that the best A* algorithm used by the search and rescue was the A* (2) algorithm.

The A* (2) algorithm is “f2(i) = g2(i) + h2(i) + h2(j).” In this algorithm, f is the estimated cost of the course the node will take. The i used throughout is the current location of the robot in the search. The g is the cost from the start point and h is the estimated cost to the end point. The j is the father point that is used along with that estimate. The father point improves the search speed because there are fewer nodes for the search to look through while it determines its path. Since we are going depth-first, the father node helps create a sort of tree even though we are using an undirected graph. We do not want to backtrack or spread out, so this point helps create an increase in speed, which is why it runs faster than the other A* algorithms in the simulation.

This might help increase the speed of the car, but I do not think it is necessary. For the guided vehicle, the destination is the goal with the least amount of turning. A much simpler algorithm can achieve this goal without the extra processing power needed to traverse a more involved graph with another node to consider. The vehicle algorithm initializes the i value to 1 and works through the shortest path. It also compares the goal location against the estimated distance. This is something that does not need to occur in the search and rescue, because it is not trying to limit moves the way the car is trying to limit turns.

Overall, both papers use a similar idea to solve a very similar goal. The slight differences in the overall scope of each project leads to different algorithm outcomes even though both are based on the A* search algorithm.

References

  • Farmer, Michael. CSC546 — Artificial Intelligence: Classroom Slides. 2015.
  • Liu, Xiang. A Comparative Study of A-Star Algorithms for Search and Rescue in a Perfect Maze. 2011.
  • Wu, Zhengzhi. Path Planning of Automated Guided Vehicles Based on Improved A-Star Algorithm. 2015.

Smartphones as Micro-Computers

A smartphone is a micro-computer that falls under the class of desktop computers. Each smartphone has its own operating system that uses a native development language to run applications. A smartphone uses a variety of input devices to interact with these applications. Applications can be installed through a market that is included in the operating system or by transferring the application package to the phone directly.

For input, each device either includes an onboard or onscreen keyboard to interact with applications. Other input devices, such as a full keyboard or mouse, can be connected via Bluetooth or possibly a USB port depending on the smartphone. Smartphones also have an onboard camera that can take pictures and capture video. They also take advantage of GPS technology and are able to use GeoLocation to interact with map applications (Allen).

A smartphone transfers data by using a mobile network, mainly 3G or 4G, but most include a Wi-Fi or Bluetooth option as well. To use the 3G or 4G mobile network, the smartphone has to be on a phone network plan. Some of these plans include unlimited data transfer but others make you pay extra whenever you go over a certain data limit. This data limit can be increased by paying a higher monthly fee (Kamea).

Storage on a smartphone is included inside the device with an option for MicroSD expansion. Smartphones also take advantage of cloud storage, allowing users to back up information on a remote server and then download the information when it is needed. By taking advantage of cloud storage, users are able to keep local storage free for more applications. Users are also able to restore information if their smartphone is damaged or lost by downloading it again.

Smartphones are desktop computers that can be kept in the pocket of a user. They make data transfer easy and are on the cutting edge of technological advances. The mobile market is constantly changing so developers are always implementing new technology in their devices.

Smartphone Chips

A smartphone uses some of the most advanced processing chips on the market. The chip has to be able to run multiple applications and processes at the same time. At the core of a mobile operating system, applications are handled quite differently than on a desktop operating system. Applications are processed as tasks and called without a user option to close. When a user switches to a new application, the other continues to run in the background unless the user kills the task.

This behavior was created so that if a user received a phone call while using an application their work would not be lost. If a user doesn’t close tasks, a processor could run multiple applications at all times. If the chip cannot process tasks effectively, it would put a drain on the whole system and waste battery life. The processor has to be able to process data quickly and through a variety of different means as a smartphone has the capabilities to receive data through Bluetooth, Wi-Fi, 3G/4G, and GPS.

When a user provides input to a smartphone, the information is transferred through the processor and is handled in different ways depending on what the user is trying to do. Touchscreen input is handled by the CPU and transferred to internal memory. The information is processed in memory and then transferred to the video interface to show the response to the user. If an output is connected through an HDMI port, the video interface then transfers the response to the HDMI device (ARM).

Depending on the application being used, the user also has the option to use Bluetooth, Wi-Fi, 3G/4G, and GPS to process information. The camera is also handled by the processor in the same way. If a user takes a picture, the picture is held in memory until it is saved to the storage location the user has chosen. The user can then use the touchscreen to recall the picture, which is transferred through the processor to the video interface so that it is displayed on the device (ARM).

When a smartphone has a fast chipset, the user will not feel any lag between the input device and the display. With newer chipsets, users can expect their devices to process information at extremely fast speeds. Qualcomm has a new Snapdragon chip that offers new features instead of being just a speed upgrade.

Named the Snapdragon 800, Qualcomm’s new chip supports higher-resolution video capture along with faster speeds. Pictures can be taken with a 21-megapixel camera, and the user can capture video at 4,000 x 2,000 pixel resolution (Tofel). The highest-resolution television at the time of research displayed 1080p, or 1,920 x 1,080 pixels. So a smartphone running the Snapdragon 800 can create high-definition video beyond what the highest-resolution television could display.
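
As a rough, back-of-the-envelope comparison of the pixel counts involved (a sketch using the rounded figures quoted above, not a benchmark of the chip itself):

```python
# Rough pixel-count comparison between the Snapdragon 800's video capture
# resolution and a 1080p television display, using the rounded figures
# quoted in the text above.
capture_w, capture_h = 4000, 2000      # ~4K video capture
display_w, display_h = 1920, 1080      # 1080p television

capture_pixels = capture_w * capture_h   # 8,000,000 pixels per frame
display_pixels = display_w * display_h   # 2,073,600 pixels per frame

print(f"Captured:  {capture_pixels:,} pixels per frame")
print(f"Displayed: {display_pixels:,} pixels per frame")
print(f"Ratio:     {capture_pixels / display_pixels:.1f}x more detail captured than shown")
```

The captured frame holds roughly four times as many pixels as the 1080p display can actually show.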

Intel recently announced a next-generation quad-core processor that is due out this year. The processor is low-power and smaller than older models, so smartphones built around it could be smaller and last longer on a single battery charge. Intel is also releasing a low-power, multimode-multiband 4G LTE global modem that, in conjunction with the new quad-core processor, could let a single phone charge last for days (Poeter).

The newer chipset speeds will not be fully utilized if developers do not take advantage of them when creating applications. If an application is not coded efficiently, a faster chipset will not matter because the application could still run slowly. Any time new technology is released, it takes a while before devices are fully capable of taking advantage of the upgrades.

Smartphones as a New Sub-Market

Smartphones fit perfectly into the desktop computer sub-market. I do not believe that smartphones offer enough of a difference to justify the creation of another sub-market; a new sub-market would demand a different type of use for the device.

Servers as a sub-market handle the transfer of data. Once they are set up, users input data from a remote source instead of using an input device connected directly to the server. Even though a server is composed of the same parts as machines in the other sub-markets, it is the way those parts are used that makes it different.

Desktop computers are all about user input and modifications. Users are able to find applications to install that provide a variety of different uses. This is the category that smartphones fit into. Users are able to install applications and use input to directly control the device.

Embedded computers have their chipsets installed and are created with one purpose in mind. An example of an embedded computer is a DVD player: the computer in the DVD player was created with one purpose, to play back DVD discs.

Since each sub-market is defined by how users provide input, I do not think that smartphones demand their own. If a new sub-market were to be created, I think it would be for cloud computing (although technically that could fall under servers) or virtual reality devices. A virtual reality device would change input handling to an extent that would help create a new sub-market.

References

Computer Processing in Guitar Amplification

There has always been a debate over which type of amplifier produces superior sound. With computer processors being produced that emulate vintage amplifiers, solid state and vacuum tube amplifiers are sounding more alike than ever.

I. Introduction

Over the years, many types of amplifiers and speakers were produced. This created a mixture of sounds that musicians wanted to imitate. When guitarists go into the studio to record, they can spend hours tweaking their settings to produce the best tone. It is best to mix the guitar tracks using a variety of different tones to help create a thick mixture when they are played together. This creates a full guitar sound and it is accomplished by overdubbing the same guitar track played through multiple amplifiers. If one amplifier produces a bright tone, it will complement an amplifier that has a punchy bass sound when both tracks are mixed together.

As computer technology grew over time, so did recording capability. Recording studios that were set up using analog tapes were slowly replaced with digital recorders. With analog tapes, any time a guitar part was to be overdubbed, tapes had to be cut and aligned so that they could be mixed together. This was a time consuming and monotonous process. As tapes were cut and rerecorded on, they also lost some of their sound quality. Digital studios do not have the same limitations as analog ones as a recording can be aligned using software, and tracks can be cut and deleted with just a few clicks of a mouse.

Some musicians argue that vacuum tube amplifiers and analog recordings sound better than their digital counterparts. Complaints against digital amplifiers say that the sound they produce is too “tinny” or machine-like, and that it is not as full and does not have the same characteristics as their vacuum tube counterparts. As solid state amplifiers became more popular, vacuum tube amplifiers became harder to find because fewer models were produced; it is cheaper to produce a solid state amplifier, and they are easier to maintain than tube amplifiers.

The debate over whether tube or solid state amplifiers sound better is not new. In 1972, researchers at Sears Sound Studios investigated whether there was a difference in output between the two types of amplifiers. They found that the harmonic distortion output of solid state amplifiers is less than that of tube amplifiers [1]. This creates a cleaner tone, but it means that when a guitarist does want distortion, it cannot be produced as efficiently.

As guitarists recently began to show interest in vacuum tube amplifiers again, another concept was developed: the virtual, or modeling, amplifier. The purpose of the virtual amplifier is to emulate the sounds of amplifiers that are no longer manufactured at a fraction of the cost. This allows a guitarist to switch sounds on the fly without having to carry around a ton of gear, some of which is impossible to find because it is no longer produced; reproductions are often made using cheaper components, so the sound they produce is not of the quality the musician requires.

A virtual amplifier can also take advantage of vacuum tubes by mixing a solid state circuit with a tube gain channel. Since solid state circuits can amplify the guitar signal cleanly at high volumes, a tube gain circuit complements them well in the gain department. The distortion sound of solid state amplifiers is usually the area where they receive the most criticism.

The future of virtual amplification has started to take shape in the form of software plugins. Computer sound cards are being produced that include an embedded digital signal processor to change analog signals from an instrument into a digital form that can be interpreted by a computer. Such a sound card works in conjunction with digital audio workstations to produce virtualized recreations of vintage amplifiers and effects. The virtualized amplifiers and effects are stored in virtual studio technology files, often referred to as plugins.

Plugins come from a variety of different sources and can perform one or many functions. A single plugin may serve only one purpose, such as adding reverb to a signal, or it may emulate the entire circuit board of a discontinued effects pre-amplifier model. Some of the higher-quality plugins are sold by workstation manufacturers, while others are offered as free downloads on the internet. To install a plugin, a digital audio workstation must already be installed on the computer. The installation differs depending on the workstation being used, but most involve extracting the virtual studio technology file into the workstation's plugins folder.
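
As a minimal sketch of that last step, the example below copies an extracted plugin file into a workstation's plugins folder; both paths and the file name are hypothetical and will differ by workstation and operating system:

```python
# Minimal sketch of a plugin "install": copy the extracted virtual studio
# technology file into the folder the workstation scans on startup.
# Both paths below are hypothetical examples, not any specific DAW's layout.
import shutil
from pathlib import Path

plugin_file = Path("downloads/VintageAmpEmulator.dll")      # extracted plugin (hypothetical)
plugins_dir = Path("C:/Program Files/MyDAW/VstPlugins")     # workstation's plugin folder (hypothetical)

plugins_dir.mkdir(parents=True, exist_ok=True)              # make sure the folder exists
shutil.copy2(plugin_file, plugins_dir / plugin_file.name)   # copy the plugin into place
print(f"Copied {plugin_file.name}; rescan plugins inside the workstation to load it.")
```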

II. Details of Architecture

Universal Audio has produced a PCI accelerator card that takes an analog signal and uses a variety of plugins to recreate classic amplifier sounds. The idea behind the card is that a musician can choose from a variety of different set-ups to create whatever sound they want without having to spend more money buying other pieces of gear. The plugins run within the workstation itself so that once the audio signal is recorded, the musician has the freedom to change plugins and thereby change the sound of the recording on the fly.

This offers many advantages; before, if a guitarist decided to change tone in the middle of a recording, they would have to connect a different amplifier and rerecord their part. If the current recording clashes with another instrument (say it is too bass heavy and interferes with the bass guitar), the guitarist can switch plugins to emulate a different amplifier, perhaps one with a punchy mid tone instead. In turn, this saves time, as the guitarist only has to worry about nailing the part once and can worry about the tone later.

The accelerator card also supports up to eight analog devices [2] at one time, giving other musicians the freedom to record at the same time as the guitarist. It would also be possible to set up microphones on each piece of a drum set and record a complete drum session at the same time. Each track would record to its own channel so that effects can be added later. Compared to analog tape recording techniques of the past, digital recording has made the mixing and tracking of recorded parts easier and less time consuming.

The UAD-2 uses four SHARC ADSP-21369 processor chips. The ADSP-21369 is a 32/40 bit floating-point processor that was created for use with high performance audio processing. It features 2M bits of on-chip SRAM and 6M bits of on-chip mask programmable ROM. The code is compatible with all other members of the SHARC processor family. It also features onboard audio decoders in ROM and pulse-width modulation [2].

The ADSP-21369 allows two data transfers from the core and one from the I/O processor in a single cycle. Data from the processor is formatted as a 64-bit frame and divided into 32-bit words. Each of the eight inputs has its own clock, frame sync, and data inputs as well [2]. This allows for multiple recordings at the same time with no loss of quality and no synchronization errors.
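
To make that framing concrete, here is an illustrative sketch of packing two 32-bit words into a single 64-bit frame and splitting it back apart; this only demonstrates the layout described above and is not driver code for the ADSP-21369:

```python
# Illustrative sketch: pack two 32-bit words into one 64-bit (8-byte) frame,
# the layout described for the ADSP-21369's serial data above, then split
# the frame back into its words. Demonstration only, not chip driver code.
import struct

def pack_frame(word_a: int, word_b: int) -> bytes:
    """Pack two unsigned 32-bit words into a single 64-bit frame."""
    return struct.pack(">II", word_a & 0xFFFFFFFF, word_b & 0xFFFFFFFF)

def unpack_frame(frame: bytes) -> tuple:
    """Split a 64-bit frame back into its two 32-bit words."""
    return struct.unpack(">II", frame)

frame = pack_frame(0x12345678, 0x9ABCDEF0)
print(len(frame), "bytes per frame")          # 8 bytes = 64 bits
print([hex(w) for w in unpack_frame(frame)])  # ['0x12345678', '0x9abcdef0']
```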

In comparison, many MIDI control devices from the eighties used a Zilog Z80 microprocessor; the ADA MP-1 I owned had an internal Z80 itself. The Z80 is an 8-bit processor that has been widely used over the years. It features a two-bit control mode that can be written into the output register by the CPU or read back to the CPU from the input register at any time [3]. Using the Z80 with a MIDI controller allowed users to access and store preset data from their musical instruments.

Data from the Z80 CPU goes into the bus I/O, flows to the internal control logic, and continues until it is interrupted or the counter logic stops the process. The Z80 has 8-bit data paths and 16-bit addresses; it can search whole data words and directly address any byte in memory. Its data lines run from D0 down to D7, and some values are reserved for special functions; for example, C3 is reserved for reset. For the Z80 to function properly, a single 5 V supply and a single-phase clock are required [4].

III. Features

The main feature of the ADSP-21369 is that it can handle eight inputs at the same time. Used in a musical recording environment, a producer would be able to record sessions from up to eight different instruments at the same time. In a typical rock band, this would allow input from a guitar, bass guitar, a microphone for a singer, and five microphones to be placed around a drum kit. Since all of them would be recording at the same time, it would allow the producer to catch the excitement of a song by using a live take. Then using a digital workstation, if there are any errors in a take, the producer would be able to isolate them and overdub fixes later on.

The ADSP-21369 also features an asynchronous memory controller that provides a configurable interface for up to four separate banks of memory. Each memory bank can be independently programmed with different timing parameters, which enables connections to a wide variety of memory devices. This makes it a very versatile processor, as it can handle SRAM, ROM, flash, and EPROM as easily as it can handle I/O devices that interface with standard memory [2].

The features of the Z80 are quite different, as there is a sizable gap in power between the two processors. Since the Z80 only handles 8-bit operations, there must be no delay from the hardware when it controls devices in a live setting. The Z80's assembly language is designed to minimize the number of different opcodes that correspond to the basic machine instruction set, which provides consistent operation from the processor [5].

The Z80 uses a buffered I/O technique for handling assembly language source files; the assembler automatically determines the available workspace and allocates the correct buffer sizes [5]. Using buffers, the Z80 can handle input and output without any lag or delay in the processing signal. The Z80 also uses the same instruction set as other processors, which is a bonus, considering that some manufacturers break original functionality when they make chipset revisions [6, pp. 203–208].

According to Shima [7], the Z80 was created with the intent of achieving higher performance than Intel's 8080 and Motorola's 6800 processors. It took two months to create the architecture design of the Z80, and when the design was finished, the Z80 was able to update both registers in use concurrently. So even though the Z80 was created for a variety of different uses, it is still useful in the recording world, while the ADSP-21369 was created with the distinct purpose of high-performance audio processing and is therefore better suited to multi-track recording.

IV. Application

Older amplifier models needed to be modified to make changes to their tone. I purchased an ADA MP-1 guitar pre-amplifier and completed the modifications myself to see if I could notice a change in the sound. The MP-1 is a guitar pre-amplifier that uses both solid state and vacuum tube components. The modification I performed, called 3.666, swaps the tubes originally installed in the MP-1 for ones that produce a higher-gain sound.

The MP-1 was originally produced in the 1980s and has remained popular with hobbyists because it is relatively simple to modify. There are instructions on the internet for a variety of different modifications using readily available parts, but I chose the 3.666 mod to increase the gain while keeping the low-end sound for playing rhythm guitar; the 3.666 mod makes the MP-1 sound like higher-priced Mesa Boogie amplifiers. Another reason I chose to mod the MP-1 was its price and the flexibility it offers in choosing the amplification stage.

To start the modification, the tubes need to be removed from the circuit so there is no chance that they can be damaged while soldering the new components to the circuit board. It is important when changing any hardware parts on any circuit board to take your time and make sure all of the parts are aligned correctly so there are no mistakes. It is also important to clean any excess solder off of the board so that all new parts are connected cleanly.

Replacing resistors and capacitors on the board is the next step of the modification. Once again, check that all connections are made cleanly; the circuit must be simple and fast to create a transparent, clean signal [8]. The new resistors and capacitors have different impedance values to help maximize the distortion of the tubes. This accomplishes the task effectively, but it comes at a price: once the modification is complete, the pre-amplifier produces an excess amount of unwanted noise. I handled this excess noise by running my MP-1 through a noise gate before the signal reached the actual amplifier.

The next step of the modification is to change the trim pots in the equalization section of the amplifier. There are more resistor and capacitor changes that need to be made in this section as well. The changes in the equalization section create a larger range of values for the bass and mid tones, allowing the mids to be scooped out almost completely so that the output consists of only bass and high tones. This creates a thumping guitar sound that complements the kick drum and is best utilized in heavy metal music. If the signal comes in too hot, there is a chance of high amounts of feedback that can cause clipping within the transistors [9].

The downside to the trim pot modification is that the pots must be set manually on the circuit board. Once the values are set, the only way to change them beyond the built-in preset values is to open the MP-1's case back up and physically rotate the trim pot knob itself. This doesn't matter much once the MP-1 is set to match the speaker cabinet that will be used. It is best to adjust the trim pots while playing the guitar with a heavy palm-muting technique to bring out a full bass tone, since the setting needs to match the speaker: too much bass will create a muddy tone, and not enough will leave the guitar with a weak sound. When the trim pots are successfully set, the guitar tone will sound tight and punchy.

The benefit of using a pre-amplifier over a complete guitar head amplifier is the same as the benefit of a head amplifier over a speaker combo amplifier: the pre-amplifier is separate from the amplifier itself, so the musician has freedom when choosing the amplification section. Plugging a pre-amplifier into a solid state amplifier allows the guitarist to achieve the full tone of the guitar and pre-amplifier without extra distortion added by a tube amplifier; if a guitarist wants even more distortion, they can run the pre-amplifier straight into a tube amplifier. Adding too much distortion can introduce unwanted white noise into the signal, so it is important to match the speaker impedance to that of the amplifier [10].

Once I had completed the modifications, I was unable to get a clean sound from my guitar when running the pre-amplifier into a tube amplifier. The tube amplifier added its own distortion on top of the distortion created by the modified MP-1, which worked great for solos but really muddied up the tone of the guitar when playing rhythm parts. A major factor contributing to the extra distortion is the extra waveforms created by the overloaded modulating signal; these waveforms conflict with each other, making individual notes harder to distinguish and creating a muddy sound [11].

To clean up the sound for rhythm guitar, I had to use an A/B switch after the pre-amplifier to send the signal to either a tube amplifier or a solid state amplifier. The solid state amplifier allowed a full recreation of the guitar and pre-amplifier sound without adding any distortion. To test how much extra distortion the tube amplifier created compared to the solid state one, I plugged my guitar straight into the A/B switch and tested each separately, cranking both up and switching back and forth between the two amplifiers. With the gain knob around six, the tube amplifier began to distort; at full volume the distortion was the kind you would hear on an AC/DC album.

The modification to the pre-amplifier took away a lot of the mid tones, creating a very punchy bass sound and a bright high end that really cuts through any recording. Running clean gain settings on the MP-1 still distorted the signal, even at the lowest volume. To prevent any distortion, I had to use another A/B switch and bypass the pre-amplifier altogether. When I did, I sent my signal straight into a separate effects pre-amplifier to add a chorus effect, and from there it continued into the solid state amplifier.

For my rhythm guitar sound I ran my guitar through the MP-1 and the solid state amplifier. To get the extra distortion that I needed for solos I ran the MP-1 through the tube amplifier. I also ran some other varieties of my signal through the amplifiers and the effect pre-amplifier to achieve other effects such as delay and reverb which I wanted to be able to use in some songs.

To accomplish the complex switching that was needed, I had to use a MIDI foot switch. I placed all of my pre-amplifiers and amplifiers together in a rack mount and wired them all together. I also placed a sound normalizer and equalization unit at the end of the chain to help keep my signal consistent. I found through my earlier A/B test that the tube amplifier signal came out louder than the solid state signal even though both units were rated at 100-watt output; through tests of other amplifier units I found that tube amplifiers generally sound louder than solid state units rated at the same output.

The Z80 handled channel input from the MP-1, and the controller was used to store and recall data from the built-in equalizer. It is possible to have channels that match the same amount of gain but have different equalization levels, to scoop out the mid tones or brighten the highs. This is useful for a musician, as the MP-1 can store up to 128 channel settings in internal memory and another 128 in external memory using the bypass setting, which lets the user create another 128 settings using another processor, as I did with the effects pre-amplifier.
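
As a sketch of how one of those stored channels can be recalled from a computer rather than a foot switch, the example below sends a MIDI program change message; it assumes the third-party mido library, and the port name is a hypothetical placeholder:

```python
# Sketch: recall a stored pre-amplifier channel from a computer by sending a
# MIDI program change, the same kind of message a MIDI foot switch sends.
# Assumes the third-party "mido" library; the port name is a placeholder.
import mido

PORT_NAME = "USB MIDI Interface"   # hypothetical; list real names with mido.get_output_names()

def recall_preset(preset_number: int, midi_channel: int = 0) -> None:
    """Send a program change to select one of the 128 stored channel settings."""
    if not 0 <= preset_number <= 127:
        raise ValueError("MIDI program numbers run from 0 to 127")
    with mido.open_output(PORT_NAME) as port:
        port.send(mido.Message("program_change",
                               channel=midi_channel,
                               program=preset_number))

recall_preset(5)   # switch the pre-amplifier to stored channel 5
```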

The settings created can also be backed up to a local computer. To accomplish this with modern equipment, I had to purchase a MIDI-to-USB cable to connect the MP-1 to my computer; I was then able to store the MIDI file settings locally. It was also easier to change the values in the files themselves and re-sync them with the MP-1, since I could edit every value with the keyboard instead of navigating the up and down arrows on the front panel to find each setting. The only downside to changing the values on my computer was that I wasn't able to hear my changes as they were occurring.

The MP-1's values are stored at hexadecimal addresses, one byte per value, with each address denoting a separate setting. For example, the value of the first overdrive channel is stored at hexadecimal address 00, and the channel switch itself, between the solid state and tube circuitry, is stored at hexadecimal address 0A.

The other parameters handled by the MP-1 are controlled by MIDI values stored at their own hexadecimal addresses as well. Depending on the setting, each MIDI value step changes the actual setting by a different amount; for example, the overdrive values increase by two tenths of a percent per MIDI step, while the bass and treble settings increase by three percent per step.
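
A small sketch of that layout, using only the addresses and step sizes quoted above (the lookup helper itself, and the assumption that the stored MIDI value times the step size gives the displayed percentage, are mine and not ADA's documentation):

```python
# Sketch of the parameter layout described above: each MP-1 setting lives at
# a one-byte hexadecimal address, and each MIDI value step changes the
# setting by a fixed amount. Only the addresses and step sizes mentioned in
# the text are included; the helper is illustrative, not ADA firmware.
PARAMETERS = {
    0x00: ("overdrive 1", 0.2),     # percent per MIDI step (from the text)
    0x0A: ("channel switch", None), # solid state vs. tube circuitry
    # bass and treble step by 3 percent per MIDI value; their addresses are
    # not given in the text, so they are omitted here
}

def describe(address: int, midi_value: int) -> str:
    name, step = PARAMETERS[address]
    if step is None:
        # which value maps to tube vs. solid state is an assumption here
        return f"{name}: {'tube' if midi_value else 'solid state'}"
    return f"{name}: {midi_value * step:.1f}%"

print(describe(0x00, 25))   # overdrive 1: 5.0%
print(describe(0x0A, 1))    # channel switch: tube
```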

There are twenty-nine preset values included with the MP-1 when it is first booted or reset. The presets are set to emulate well-known amplifiers or styles that are highly sought after by guitarists; the first setting emulates a Marshall brand tube head amplifier. After modifications have taken place, the presets will not work as intended and will need to be adjusted based on the modifications performed. When emulating other amplifiers, the waveforms produced are close, but there are slight differences between the two: the variance created by an actual tube circuit may add to the richness of the distortion, while a solid state or emulated amplifier does not have that variance [12].

V. Conclusion

Over the years, musical styles have changed drastically. From the instruments that musicians favor to the overall approach to composition, music has become louder and more aggressive. One of the most drastic changes in a single instrument has taken place in the sound of the guitar. Originally, the guitar was an instrument with strings that vibrated over a hollow wooden body to amplify the sound. The concept and design of the guitar have changed throughout the years.

In competition with louder instruments, such as drums, guitarists looked for ways to increase the volume output of their instrument when playing live. Guitar designs were changed to accommodate the needs of musicians by altering the structure of the instrument and adding electronic components to help amplify the sounds the guitar produced. Originally, the guitar body was changed from a hollow one to a semi-hollow body with a magnetic pickup attached above the bridge to help amplify the sound; this kept the warm tone naturally produced by fully hollow-bodied acoustic guitars.

As live music continued to evolve and become louder with the invention of amplifiers and speakers, so did the design of the guitar. Manufacturers stopped hollowing out any part of the body, instead using solid blocks of wood, which created a more aggressive sound. Guitars could also be produced more cheaply, as manufacturers could now bolt the neck to the body. This allowed them to craft guitars out of smaller blocks of wood instead of the single large piece needed to create full acoustic guitars.

Along with the physical changes to the guitar, there were vast changes in the way the signal from the electric guitar was amplified. A guitar pickup is basically a magnet wrapped with wire; the vibrating strings induce an electromagnetic signal that travels through the wiring to the guitar's output jack. Some guitars have knobs and switches hardwired in to change the tone or volume of the signal between the pickup and the output jack. Some pickups are considered “active” and use battery power to help boost the electromagnetic signal; pickups that do not use battery power are considered passive.

Once the signal reaches the output jack of the guitar, it travels through a wired instrument cable into an amplifier. Amplifiers come in a variety of different designs, but they all serve the same purpose — to boost the signal of the guitar. The components of an amplifier have changed along with the guitar and the technology of the times. The original intent of the amplifier was to boost the signal of the guitar so that it could compete in a live setting with instruments that are naturally louder such as drums. The amplifiers were created using vacuum tubes which then increased the volume of the guitar output.

Amplifiers have two main designs. Combo amplifiers combine the circuit that boosts the signal with one or more speakers wired together in a single cabinet. Tube combo amplifiers can be cumbersome to carry around, as the tube circuitry is quite heavy. This design also limits the sound a particular amplifier can make, since the speaker and the amplifier circuit are wired together to produce the same tone.

The second amplifier design is to create a “head” which is just the amplifier components themselves. This design includes a separate output jack which gives the musician the freedom to choose which speakers they want the amplifier-produced sound to run through. This allows the musician to change the tone of their amplifier by changing the type of speaker that produces the sound. This also breaks the amp into separate parts allowing easier transportation depending on how large the speaker cabinet is.

Both design types have their pros and cons, but the most drastic advancement in amplifier development has been in the circuit components. When the transistor radio was produced, the idea of using transistors in amplifiers followed, allowing manufacturers to replace the vacuum tube circuitry of an amplifier with solid state components. The benefits of using a solid state amplifier instead of a vacuum tube amplifier include lighter weight, cheaper cost, and a nearly unlimited number of different tones produced by the amplifier.

By placing a computer processor in an amplifier, the amplifier can be used to create a variety of different sounds. The amplifier can be controlled remotely using programmable switches to help the musician change tone during performances. An amplifier can also take advantage of memory banks so the musician can preset tones and then recall them as needed.

I have built guitars and amplifiers consisting of a variety of different parts trying to recreate the sounds of different musicians. The composition of a guitar itself can change the sound of a player, along with the technique used to play the guitar. But over the years, I have found that the greatest change in tone occurs in the amplifier itself. Speaker change has a slight effect on the sound as well, but again, I have found that the amplifier itself produces the greatest change.

An amplifier consists of some combination of a circuit board, capacitors, resistors, transistors, vacuum tubes, or computer chips; it does not have to use all of these components, and I have seen amplifiers that consist only of transistors. The intention of amplifiers changed over time as musicians started to incorporate distorted sounds into their music. Originally, an amplifier was only used to boost the signal from the guitar, and distortion was an unwanted byproduct of using vacuum tubes: when the gain of an amplifier is turned up past the maximum level the vacuum tubes can handle, the sound begins to distort.
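
A minimal sketch of that effect, using a tanh curve as a generic stand-in for a tube's gradual limiting rather than a model of any particular amplifier, shows how raising the gain flattens the peaks of the waveform:

```python
# Minimal sketch of how turning up the gain produces distortion: a sine wave
# pushed through a soft limiter (tanh here, a common stand-in for a tube's
# gradual clipping, not a model of any specific amplifier) flattens at the
# peaks once the gain exceeds what the limiter can pass cleanly.
import math

def clip(sample: float, gain: float) -> float:
    """Boost the sample by `gain`, then soft-limit it to the range [-1, 1]."""
    return math.tanh(gain * sample)

samples = [math.sin(2 * math.pi * t / 32) for t in range(32)]   # one cycle of a sine wave

clean = [clip(s, gain=0.5) for s in samples]    # low gain: output still looks like a sine
dirty = [clip(s, gain=10.0) for s in samples]   # high gain: peaks flatten, i.e. distortion

print("clean peak:", round(max(clean), 3))   # well below the limit
print("dirty peak:", round(max(dirty), 3))   # pinned near 1.0 (clipped)
```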

Guitarists started to take advantage of the distortion by using it to boost their signals while soloing. This was most prevalent in the blues but was soon incorporated into rock and roll. Guitarists began looking for ways to add distortion to their sound and to control the amount of it; the distortion can be controlled by boosting the signal and/or by varying the player's attack. Manufacturers, in turn, started looking for ways to let musicians control their tone while playing.

They accomplished this by creating amplifiers with multiple channels that were able to be switched by using a foot pedal. The foot pedal sends a signal to the amplifier telling it to switch channels on the circuit. Some amplifier manufacturers used different types of tubes and capacitors for each channel to increase or decrease the distortion. The benefit of this approach was that guitarists could play rhythm guitar on the clean channel and switch to the distortion channel when soloing. This would allow the solo guitar to cut through the mix.

In the future, I believe that digital recording will overtake all forms of analog signal processing and recording. There will still be demand for tape recording gear and tube amplifiers, but in the end the cost of that equipment will vastly outweigh its benefits compared to digital equipment. The best-sounding recordings I have ever made were created using a variety of gear with different tonal qualities. With virtual signal emulation, different sounds can be implemented on the fly, so it is easy to create great-sounding recordings by mixing emulated signals.

VI. References

  1. R. O. Hamm, “Tubes Versus Transistors - Is There an Audible Difference?,” Sears Sound Studios, vol. 21, no. 4, May 1973.
  2. SHARC Processor ADSP-21369 User Manual, Analog Devices, 2013.
  3. Zilog Z80 PIO User’s Manual, ZiLOG Worldwide Headquarters, 2004.
  4. Z80 Family: CPU Peripherals User Manual, ZiLOG Worldwide Headquarters, 2004.
  5. Z80 Assembly Language Programming Manual, Zilog, Inc., 1977.
  6. R. B. Thompson and B. F. Thompson, PC Hardware in a Nutshell: A Desktop Quick Reference, 3rd ed., Cambridge: O’Reilly, 2003.
  7. M. Slater, F. Faggin, M. Shima, and R. Ungermann, “Zilog Oral History Panel on the Founding of the Company and the Development of the Z80 Microprocessor,” Computer History Museum, 2007.
  8. E. Sanchez-Sinencio and J. Silva-Martinez, “CMOS Transconductance Amplifiers, Architectures and Active Filters: A Tutorial,” IEE Proc.-Circuits Devices Syst., vol. 147, no. 1, February 2000.
  9. P. Quilter, “Amplifier Anatomy - Part 1,” Sound & Video Contractor, February 20, 1993.
  10. J. Portilla and R. Jauregui, “Studies on Small- and Large-Signal Noise in Solid-State Amplifiers,” UPV/EHU, 2013.
  11. T. E. Rutt, “Vacuum Tube Triode Nonlinearity as Part of the Electric Guitar Sound,” Convention of the Audio Engineering Society, October 8–11, 1984.
  12. M. Karjalainen and J. Pakarinen, “Wave Digital Simulation of a Vacuum-Tube Amplifier,” Laboratory of Acoustics and Audio Signal Processing, pp. V-153–V-156, 2006.