Friday, April 20, 2018

Twitter as a data source

Data is everywhere. That may seem to be purely a benefit, but at times there is simply too much of it. The sheer volume can make it difficult, when attempting an analysis, to understand what is really there and how it may be useful. Whether the subject is InfoSec or the latest system upgrades, there should be methods and tools in place to alleviate the issue.



One potential source of this data is Twitter. People and businesses tweet about nearly everything: food, dinner, their present mood, politics, and any number of other topics. One field that has made use of this aspect of Twitter is machine learning (ML). Twitter is a great source for data mining on virtually any subject. It is also a free outlet for people to express their opinions or thoughts. This lack of a barrier to entry allows anyone to contribute their thoughts, which other venues have not done. At times, the results may be slightly skewed by trolls. Given the overall number of entries, the skew should not be significant, and most of it could be removed with a script.
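
As a rough idea of what such a script might look like, here is a minimal sketch. The two heuristics (dropping accounts that post far more often than others in the sample, and dropping repeated messages) and the threshold are assumptions made for illustration; they are not a validated way to detect trolls.

```python
from collections import Counter


def filter_suspected_trolls(tweets, max_per_user=3):
    """Drop repeated texts and tweets from unusually prolific accounts.

    `tweets` is a list of (user, text) pairs. Both heuristics and the
    `max_per_user` threshold are illustrative assumptions.
    """
    per_user = Counter(user for user, _ in tweets)
    seen_texts = set()
    kept = []
    for user, text in tweets:
        # Normalize case and whitespace so near-identical copies match.
        normalized = " ".join(text.lower().split())
        if per_user[user] > max_per_user:
            continue  # account posts more than the allowed number of tweets
        if normalized in seen_texts:
            continue  # the same message was already kept once
        seen_texts.add(normalized)
        kept.append((user, text))
    return kept


# Made-up sample data, for illustration only.
sample = [
    ("alice", "Great dinner tonight"),
    ("bob", "New patch released for the router"),
    ("spam1", "BUY NOW"),
    ("spam1", "BUY NOW"),
    ("spam1", "buy now"),
    ("spam1", "Buy  now"),
    ("carol", "great dinner tonight"),
]
cleaned = filter_suspected_trolls(sample)
print(cleaned)
```

A real filter would need to be tuned against the data at hand, since a busy legitimate account can look much like a troll under rules this simple.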



One such application occurred recently in a study on opioid abuse. Tim Mackey, Janani Kalyanam, and Takeo Katsuki published their research on detecting prescription opioid abuse promotion and access using Twitter in the American Journal of Public Health (http://alphapublications.org/doi/pdfplus/10.2105/AJPH.2017.303994). The researchers’ methodology included collecting tweets from Twitter, limited to the publicly accessible items. Their search filter was for terms associated with opioid prescriptions. The researchers then applied unsupervised machine learning in the form of topic modeling.
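
To make the search-filter step concrete, here is a minimal sketch of keyword matching. It is not the researchers' code: the function name and the sample tweets are made up for illustration, and the term list is taken from this post's description of the sample. The topic modeling step is not shown.

```python
import re

# Terms taken from this post's description of the study sample.
OPIOID_TERMS = [
    "codeine", "percocet", "fentanyl", "vicodin",
    "oxycontin", "oxycodone", "hydrocodone",
]

# Match any term as a whole word, ignoring case.
TERM_PATTERN = re.compile(
    r"\b(" + "|".join(OPIOID_TERMS) + r")\b", re.IGNORECASE
)


def matches_opioid_terms(text):
    """Return True if the text mentions at least one of the terms."""
    return TERM_PATTERN.search(text) is not None


# Made-up tweets, for illustration only.
tweets = [
    "Doctor prescribed Percocet after surgery",
    "Lovely weather today",
    "Is #fentanyl the next crisis?",
]
matched = [t for t in tweets if matches_opioid_terms(t)]
print(matched)
```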



The sample analyzed was 619,937 tweets containing the terms codeine, Percocet, fentanyl, Vicodin, OxyContin, oxycodone, and hydrocodone. The sample period ran from June to November 2015. Of these, 1,778 tweets, or less than 1%, were identified as marketing the sale of controlled substances online. Of those 1,778 tweets, 90% had embedded links.
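
The proportions are easy to check from the figures quoted above. The sketch below only redoes the arithmetic; the count of tweets with links is an approximation derived from the rounded 90% figure, not a number reported here.

```python
total_tweets = 619_937      # tweets in the sample
marketing_tweets = 1_778    # tweets marketing controlled substances
share_with_links = 0.90     # rounded share quoted in this post

# Share of the whole sample that marketed controlled substances.
marketing_share = marketing_tweets / total_tweets

# Approximate count with embedded links, derived from the rounded 90%.
approx_with_links = round(marketing_tweets * share_with_links)

print(f"Marketing share of sample: {marketing_share:.3%}")
print(f"Approximate tweets with links: {approx_with_links}")
```

The marketing share works out to roughly 0.29% of the sample, which is consistent with the "less than 1%" description.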



While no research methodology is perfect, this one falls within the realm of acceptable protocols. ML has taken this kind of data collection and greatly increased its potential. The continued use and application of ML will further research not only at the base level but also in the understanding and comprehension of the data itself, along with its implications. This was only one of many examples where ML would be exceptional in its application. As applied to InfoSec, the same approach could be used to research compromises, data loss, or other subjects.

Monday, July 25, 2016

Noise

With descriptive and predictive analytics, the data set is the foundation of the client engagement; it is what the report is based on. During the analysis, outliers may appear in the data set. These may be due to errors in recording or to exemplary performers.

These data points should be removed from the analysis. Including them has the distinct potential to skew the results positively or negatively, depending on the data and the measurement. Removing the noise provides a truer measurement within the scope.
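
One common way to remove such points is the interquartile range (IQR) rule. The sketch below is only one possible approach, and the 1.5 multiplier and the sample readings are assumptions for illustration; the right cutoff depends on the data and the engagement.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up measurements; 95 stands in for a recording error.
readings = [10, 12, 11, 13, 12, 11, 95, 10, 12]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

It is worth keeping a record of what was removed, so the client can see which points were treated as noise and why.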

Sharing is Caring

Data analytics has many benefits. It allows big data to be analyzed to help others. The analysis could be predictive (letting the consumer know the best or worst time to do something, such as purchase a car or more inventory) or descriptive (how long, on average, a disk drive should work). The more data that is available, the better, more focused, and more applicable the analysis is. This is why the sharing of data is helpful.

Recently a court case was filed against Myriad Genetics Inc. alleging that the company is refusing to share its data. The entity has data on persons with rare conditions. This data may help with the treatment of patients who are not in the database. The requests for data are acceptable under HIPAA. The company is claiming the data is proprietary.

The data would be useful for the patients and for the future of medicine as it relates to their specific illnesses. It is understandable that the company would want to keep the data secret, as it spent money to collect it, but the data needs to be released. Its analysis provides indications of treatment that could save lives.



Airline Efficiency

Historically, industries have seen slim margins during certain economic times. This may be due to forces outside the industry's control. At other times, the margins may be fine. In either scenario, management monitors the margins and profits. Not doing so would be a glaring oversight.

Data analytics has been applied in the airline industry to secure the best possible margins for the business. Optimization modeling and analytics have been applied for this purpose and also in the decision-making process.

For instance, fuel is a large expense for airlines, as you can imagine. The modeling was applied to fuel spot pricing. This was a complex algorithm analyzing the cost of fuel along the airplane's stops, the type of aircraft, the weight of the aircraft, the cost of the extra fuel, the time spent fueling, and other attributes.
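
To give a feel for the kind of trade-off involved, here is a heavily simplified sketch of one decision: whether to carry extra fuel bought at a cheaper stop (sometimes called tankering) or to buy it at the next stop. This is not the airlines' algorithm. The function name, the prices, and the 4% burn penalty are assumptions for illustration, and aircraft type, fueling time, and the other attributes mentioned above are left out.

```python
def should_tanker(price_origin, price_dest, fuel_needed_kg, burn_penalty=0.04):
    """Compare buying fuel at the destination with carrying it from the origin.

    burn_penalty is the assumed fraction of carried fuel that is burned
    just to transport its extra weight. Returns (decision, savings).
    """
    if not 0 <= burn_penalty < 1:
        raise ValueError("burn_penalty must be in [0, 1)")
    cost_buy_at_dest = price_dest * fuel_needed_kg
    # Load enough extra so that fuel_needed_kg remains after the penalty.
    fuel_to_load = fuel_needed_kg / (1 - burn_penalty)
    cost_tanker = price_origin * fuel_to_load
    savings = cost_buy_at_dest - cost_tanker
    return savings > 0, savings


# Made-up prices per kg, for illustration only.
tanker, savings = should_tanker(0.80, 0.95, 5000)
print(tanker, round(savings, 2))

# With a small price gap the burn penalty outweighs the cheaper fuel.
tanker_small_gap, _ = should_tanker(0.93, 0.95, 5000)
print(tanker_small_gap)
```

Even this toy version shows why the real model is complex: the answer flips depending on the price gap and on how much it costs to carry the extra weight.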

This is applicable not merely to the airline industry, but to nearly all industries.


Saturday, July 23, 2016

Timelines


            Life is ruled by deadlines. The payroll has to be entered by Friday at noon. The workers have to be back at their desks by 1pm. The quarterly reports have to be in by next Thursday.

            Data analytics is no different. Time continues to be of the essence. If there is a golf tournament at a private country club over the weekend, and the organizers need the predictive report by Wednesday, a Friday delivery is of no use or consequence to the client.

Hope for the best, plan for the worst

            As civilization marches on, growth continues. Part of this growth is the construction of buildings to live and work in. In certain geographic areas, this tends to be problematic. Specifically, some areas are susceptible to disasters, whether from an earthquake, tornado, or other force outside anyone’s control.

            To work towards safer buildings for people to live or work in, research has focused on how buildings react to these forces. Examples of this research occurred in California, Oregon, and other states. The buildings constructed for it provided data from shake tests, strain gauges, and accelerometers. This mountain of data has been and will continue to be analyzed along with new data. The analysis, descriptive and predictive, will indicate the safest and most tolerant materials and designs to build with. This will provide better, safer structures for people to be in.

Insurance

            Any business or industry with data has a positive application for data analytics. The owner or C-level executives may want to examine certain aspects of the business at length.
            The insurance industry is well known for using statistics to analyze clients who are purchasing or have purchased life insurance. The algorithms in use are updated regularly, based on mortality data.
            Another aspect, not normally explored at length, is reviewing claims in order to flag them for potentially fraudulent activity. The statistics behind this would provide a baseline from which to gauge the potential for a claim to be fraudulent. For both examples, the more data involved, the more robust the algorithm and analysis. In working with a motivated, focused vendor, the client will receive a better user experience.
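
As a rough illustration of working from a baseline, the sketch below flags new claims whose amount sits far above the historical average, measured in standard deviations (a z-score). The function name, the threshold of 3, and the claim figures are assumptions for illustration; a real fraud model would use many more attributes than the claim amount, and a flag is only a prompt for review.

```python
import statistics


def flag_claims(history, new_claims, z_threshold=3.0):
    """Return the ids of new claims that are unusually large.

    history is a list of past claim amounts used as the baseline;
    new_claims is a list of (claim_id, amount) pairs.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return []  # no spread in the baseline, nothing to compare against
    flagged = []
    for claim_id, amount in new_claims:
        z = (amount - mean) / stdev
        if z > z_threshold:
            flagged.append(claim_id)
    return flagged


# Made-up figures, for illustration only.
history = [1200, 950, 1100, 1300, 1050, 1000, 1150, 1250]
new_claims = [("C-101", 1180), ("C-102", 4800)]
flagged = flag_claims(history, new_claims)
print(flagged)
```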